extendr / rextendr

An R package that helps scaffolding extendr-enabled packages or compiling Rust code dynamically
https://extendr.github.io/rextendr/

A function or script to generate `LICENSE.note` #236

Closed eitsupi closed 1 year ago

eitsupi commented 1 year ago

As detailed in this article by @yutannihilation (https://yutani.rbind.io/post/rust-and-cran-repository-policy/#doesnt-describe-the-authorship-and-copyright), it may be necessary to generate a LICENSE.note file listing the authors and licenses of the Rust crates a package depends on in order to comply with the CRAN policy. Even setting the CRAN policy aside, it is sometimes important to be explicit about the source code contained in the binaries if we intend to distribute them somewhere.

So can we include a function or script in rextendr to generate the LICENSE.note?

I generated a LICENSE.note in prqlr with the following script, and it was accepted by CRAN. https://github.com/eitsupi/prqlr/blob/380b092c100cdf6a9590aceadd388d677cd62502/dev/generate-license-note.R

manifest_path <- file.path("src", "rust", "Cargo.toml")
license_note_path <- file.path("LICENSE.note")

note_header <- paste0(
  "The binary compiled from the source code of this package contains the following Rust packages.\n",
  "\n",
  "\n",
  "-------------------------------------------------------------"
)

package_name <- RcppTOML::parseTOML(manifest_path)$package$name

df_license <- processx::run(
  "cargo",
  c("license", "--authors", "--tsv", "--avoid-build-deps", "--manifest-path", manifest_path)
)$stdout |>
  data.table::fread() |>
  dplyr::select(name, repository, authors, license) |>
  dplyr::filter(name != package_name) |>
  dplyr::mutate(
    authors = dplyr::case_when(
      authors == "" ~ paste0(name, " authors"),
      TRUE ~ authors
    ) |>
      stringr::str_remove_all(r"(\ <.+?>)") |>
      stringr::str_replace_all(r"(\|)", ", ")
  )

license_note <- df_license |>
  purrr::pmap_chr(
    \(name, repository, authors, license) {
      paste0(
        "\n",
        "Name:        ", name, "\n",
        "Repository:  ", repository, "\n",
        "Authors:     ", authors, "\n",
        "License:     ", license, "\n",
        "\n",
        "-------------------------------------------------------------"
      )
    }
  )

c(note_header, license_note) |>
  writeLines(license_note_path)

I used cargo-license to make this easier, but I think @yutannihilation's string2path uses a Rust-independent script.

Ilia-Kosenkov commented 1 year ago

A couple of questions:

  1. What is the exact format of the LICENSE.note file?
  2. Do we want to generate this file by default upon repository setup?
  3. Do we want to automatically re-generate this file if dependencies change (if yes, how do you detect new dependencies?)?

eitsupi commented 1 year ago

Thanks for your response.

1. What is the exact format of the LICENSE.note file?

I believe that there is no format.

2. Do we want to generate this file by default upon repository setup?

No, I think the user needs to explicitly execute a function or script to generate it.

3. Do we want to automatically re-generate this file if dependencies change (if yes, how do you detect new dependencies?)?

To do this, I think we would need a mechanism to cache Cargo.lock and detect changes, so I don't think rextendr needs to take care of that. For example, generating README.md from README.Rmd is the user's responsibility, and pkgdown does not update README.md automatically.
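
As a rough illustration of the caching idea mentioned above, one could record a checksum of Cargo.lock and compare it on later runs. This is only a sketch: the helper name `lockfile_changed()` and the cache file are hypothetical and not part of rextendr, and temporary files stand in for the real `src/rust/Cargo.lock`.

```r
# Hypothetical helper (not part of rextendr): report whether the lock file
# differs from the checksum recorded on the previous call.
lockfile_changed <- function(lock_path, cache_path) {
  stopifnot(file.exists(lock_path))
  current <- unname(tools::md5sum(lock_path))
  previous <- if (file.exists(cache_path)) {
    readLines(cache_path, n = 1L)
  } else {
    NA_character_
  }
  changed <- is.na(previous) || !identical(current, previous)
  if (changed) writeLines(current, cache_path) # remember the new state
  changed
}

# Demonstration with temporary files standing in for the real paths.
lock <- tempfile(fileext = ".lock")
cache <- tempfile()
writeLines("version = 3", lock)
first_run <- lockfile_changed(lock, cache)  # no checksum recorded yet
second_run <- lockfile_changed(lock, cache) # lock file untouched
writeLines(c("version = 3", "[[package]]"), lock)
third_run <- lockfile_changed(lock, cache)  # lock file edited
```

A user-side script could call such a helper before regenerating LICENSE.note, which keeps the bookkeeping outside rextendr as suggested.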

Ilia-Kosenkov commented 1 year ago

Maybe we can store license info as some sort of structured output like JSON/YAML? In case this needs to be indexed on CRAN/elsewhere.
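
A minimal sketch of what a structured output could look like, assuming JSON written with jsonlite. The crate names and values are made-up placeholders shaped like the fields used in the script above; no particular schema is implied.

```r
# Placeholder records with the same four fields as the script above.
crates <- data.frame(
  name = c("crate-a", "crate-b"),
  repository = c("https://example.com/crate-a", "https://example.com/crate-b"),
  authors = c("crate-a authors", "Jane Doe"),
  license = c("MIT", "MIT OR Apache-2.0")
)

# Write one JSON object per crate, then read the file back to check that
# the structure survives a round trip.
json_path <- tempfile(fileext = ".json")
jsonlite::write_json(crates, json_path, pretty = TRUE)
round_trip <- jsonlite::fromJSON(json_path)
```

The plain-text LICENSE.note could still be rendered from the same records, so the two outputs would not need to be maintained separately.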

yutannihilation commented 1 year ago

OTOH, r-rust/faq (i.e., Jeroen) says:

I don't think the package description should name authors of 3rd party things which your package depends on, or links to, or downloads, if that source code is not included with the source package that is on hosted CRAN.

https://github.com/r-rust/faq#should-i-mention-authors-of-3rd-party-cargo-crates-in-the-descriptionlicense

I use LICENSE.note because I bundle the source code, but I think it's not the case with ordinary rextendr users.

(I don't mean I necessarily agree with this take.)

eitsupi commented 1 year ago

Certainly we who submit source code do not bundle binaries, but in fact CRAN and others distribute binaries, so I don't think it is strange to include notes on the source code needed to generate those binaries. (Of course, I am not a legal expert and do not want to enforce this.)

clauswilke commented 1 year ago

I think it's important to maintain some common-sense perspective (which the CRAN people don't always have, one of the reasons I have been less active in the entire R space recently). It's certainly a good rule to say "name all the people whose source code you distribute in your package". It's not a good rule to say "name all the people whose code gets somehow pulled in once you compile the binary." Otherwise, authors of C or C++ code would have to acknowledge all the authors of the standard libraries that get pulled in when you compile the code, and nobody does that. (People who use Rcpp don't list the Rcpp authors as authors of their own packages, but Rcpp template code definitely makes it into compiled packages that depend on Rcpp.)

yutannihilation commented 1 year ago

In defense of eitsupi, LICENSE.note is about license, not about authorship, and license actually matters to some extent. We do not need to manually care about licenses as long as a package manager tracks it, but when we go across package managers (i.e. R and cargo), we might need some workaround, and LICENSE.note is the one, in my understanding.

I was just afraid that the discussion on this would be bikeshed-y as no one knows the absolute answer.

eitsupi commented 1 year ago

Sorry, I think I did a bad job of creating my issue.

My understanding is that it is common in the Rust community to pay attention to the licenses of dependent Rust packages, which is why tools like cargo-deny, etc. are used.

I am not trying to force it on others, I just thought that rextendr could provide an appropriate tool for such people. Users are free to use it or not.

eitsupi commented 1 year ago

FYI, I noticed that https://github.com/apache/arrow-datafusion-python has a similar script: https://github.com/apache/arrow-datafusion-python/blob/5cab64eb2ee186d501ab60c640f995ebd492f6d2/dev/create_license.py It seems to be used when building binary packages: https://github.com/apache/arrow-datafusion-python/blob/5cab64eb2ee186d501ab60c640f995ebd492f6d2/.github/workflows/build.yml#L36-L37

eitsupi commented 1 year ago

A version with fewer dependencies. (If you add this to rextendr, only RcppTOML is an additional dependency)

manifest_path <- file.path("src", "rust", "Cargo.toml")
license_note_path <- file.path("LICENSE.note")

note_header <- paste0(
  "The binary compiled from the source code of this package contains the following Rust packages.\n",
  "\n",
  "\n",
  "-------------------------------------------------------------"
)

package_name <- RcppTOML::parseTOML(manifest_path)$package$name

list_license <- processx::run(
  "cargo",
  c(
    "license",
    "--authors",
    "--json",
    "--avoid-build-deps",
    "--avoid-dev-deps",
    "--manifest-path", manifest_path
  )
)$stdout |>
  jsonlite::parse_json()

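# Fall back to "<package> authors" when no authors are listed, then strip
# e-mail addresses and replace the "|" separators with commas.
.prep_authors <- function(authors, package) {
  authors <- if (is.null(authors)) paste0(package, " authors") else authors
  authors <- gsub(" <.+?>", "", authors)
  gsub("|", ", ", authors, fixed = TRUE)
}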

license_note <- list_license |>
  purrr::keep(\(x) x$name != package_name) |>
  purrr::map_chr(
    \(x) {
      paste0(
        "\n",
        "Name:        ", x$name, "\n",
        "Repository:  ", x$repository, "\n",
        "Authors:     ", .prep_authors(x$authors, x$name), "\n",
        "License:     ", x$license, "\n",
        "\n",
        "-------------------------------------------------------------"
      )
    }
  )

c(note_header, license_note) |>
  writeLines(license_note_path)
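
To make the author clean-up in the scripts above concrete, here is a small worked example on a made-up value. The input string is an assumption about what a `|`-separated authors field with e-mail addresses might look like; it is not real cargo-license output.

```r
# Made-up authors field: two names with e-mail addresses, separated by "|".
raw_authors <- "Jane Doe <jane@example.com>|John Roe <john@example.com>"

# Step 1: drop each " <...>" e-mail part (non-greedy match).
no_email <- gsub(" <.+?>", "", raw_authors)

# Step 2: turn the "|" separators into ", ".
cleaned <- gsub("|", ", ", no_email, fixed = TRUE)
```

Here `cleaned` ends up as "Jane Doe, John Roe", which is the form written to the "Authors:" line of the note.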

JosiahParry commented 1 year ago

FWIW, I'm strongly in favor of automating a LICENSE.note. There are a few parts to this, I think.

  1. It is the duty of the R developer to include, in a LICENSE.note file, the licenses of the bundled compiled software that is used. See chapter 13.4.1 in R Packages. See example file.
  2. extendr doesn't add a license to Cargo.toml by default. However, the Rust library crate inside an R package could be licensed differently from the R package itself. That should be documented.
  3. It would be beneficial to have a compatibility check but that could be out of scope.
  4. If a LICENSE.note file exists it should not be overwritten.
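
The last point could be sketched as a guarded write. The function name `write_license_note()` and its `overwrite` argument are hypothetical, chosen only for illustration; nothing here is an existing rextendr API.

```r
# Hypothetical helper: write the note unless a file is already present and
# the caller has not explicitly asked to replace it.
write_license_note <- function(lines, path = "LICENSE.note", overwrite = FALSE) {
  if (file.exists(path) && !overwrite) {
    message("`", path, "` already exists; leaving it untouched.")
    return(invisible(FALSE))
  }
  writeLines(lines, path)
  invisible(TRUE)
}

# Demonstration in a temporary directory.
note_path <- file.path(tempdir(), "LICENSE.note")
unlink(note_path)
created <- write_license_note("first version", note_path)
skipped <- write_license_note("second version", note_path)
replaced <- write_license_note("third version", note_path, overwrite = TRUE)
```

Returning a logical flag invisibly lets a caller report whether the file was written without forcing output on interactive users.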

JosiahParry commented 1 year ago

It looks like cargo-license (https://github.com/onur/cargo-license) would be a great tool to use here, though rextendr appears to be a pure R package that doesn't use Rust itself, so that may be out of the picture.

eitsupi commented 1 year ago

4. If a LICENSE.note file exists it should not be overwritten.

If Cargo.toml is updated, LICENSE.note must also be updated. Perhaps the automatic update can be controlled by a field in the DESCRIPTION file (something like Config/rextendr/LicensenoteUpdate: true).
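
A minimal sketch of how such an opt-in field might be read, using base R's `read.dcf()`. The field name is only the one floated in this comment, not an existing rextendr setting, and the helper name `license_note_update_enabled()` is hypothetical.

```r
# Hypothetical reader for the proposed opt-in field in DESCRIPTION.
license_note_update_enabled <- function(description_path) {
  field <- "Config/rextendr/LicensenoteUpdate"
  # read.dcf() returns NA for a requested field that is absent.
  value <- read.dcf(description_path, fields = field)[1, 1]
  !is.na(value) && tolower(trimws(value)) == "true"
}

# Two throwaway DESCRIPTION files: one opts in, one omits the field.
desc_on <- tempfile()
writeLines(
  c("Package: demo", "Version: 0.1.0", "Config/rextendr/LicensenoteUpdate: true"),
  desc_on
)
desc_off <- tempfile()
writeLines(c("Package: demo", "Version: 0.1.0"), desc_off)

enabled <- license_note_update_enabled(desc_on)
disabled <- license_note_update_enabled(desc_off)
```

Treating a missing field as "off" keeps the behaviour opt-in, so existing packages would not have their LICENSE.note regenerated unexpectedly.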

JosiahParry commented 1 year ago

@eitsupi would this be part of rextendr::document()?

eitsupi commented 1 year ago

would this be part of rextendr::document()?

I think it makes sense since that is the only function the user uses on a routine basis.