extendr / rextendr

An R package that helps scaffolding extendr-enabled packages or compiling Rust code dynamically
https://extendr.github.io/rextendr/
Other
186 stars 27 forks source link

Provide configure and configure.win to check the Rust toolchain #134

Closed yutannihilation closed 1 year ago

yutannihilation commented 3 years ago

c.f. https://github.com/cran/gifski/blob/master/configure

yutannihilation commented 2 years ago

Now I think I've figured out what these should be. But..., it feels a bit too massive.

https://github.com/yutannihilation/string2path/blob/main/configure.win https://github.com/yutannihilation/string2path/blob/main/configure

clauswilke commented 2 years ago

Yes, it's absurd that every package using cargo is supposed to ship and maintain this amount of code. It would b nice if R/CRAN could in some form support Rust extension packages. Not sure what needs to happen for that to become a possibility.

yutannihilation commented 2 years ago

Some lines can be omitted, but I guess this is the reality... I'll write a blog post about this so that we can discuss what options might be acceptable to us. Hopefully soon.

yutannihilation commented 2 years ago

I'll write a blog post about this so that we can discuss

I wrote, but it seems I still need some more time to figure out how to move this issue forward... This is unnecessarily lengthy so I don't really recommend to read, but l post here for reference only.

https://yutani.rbind.io/post/2021-09-21-writing-a-configure-script-for-an-r-package-using-rust/

Ilia-Kosenkov commented 2 years ago

If we are to keep this, we need to provide tooling for developers for upgrading this script if the new binary is released.

Another thing is I'd prefer to have various configurations in DESCRIPTION file (like roxygen), but I doubt we can read values from there during compilation.

yutannihilation commented 2 years ago

upgrading this script if the new binary is released.

A simple solution would be defining the tag and the expected hash in a separate file e.g. tools/hashes.sh. The configure scripts can source it so that the developers don't need to update the whole configure and configure.win.

Another thing is I'd prefer to have various configurations in DESCRIPTION file (like roxygen), but I doubt we can read values from there during compilation.

Interesting idea, thanks. I think "${RSCRIPT}" -e 'read.dcf("./DESCRIPTION")' should work. What kind of configuration do you want?

yutannihilation commented 2 years ago

I managed to squash most of the actual logics into one file and settings.

Now configure is only this long, and the developers don't need to update this usually. This is not yet very portable because of the undocumented convention on the locations of precompiled binary, but this feels a bit better now.

# In configure scripts, Rscript cannot be used without path specified
RSCRIPT="${R_HOME}/bin/Rscript"

# Load the variables (e.g. github repo, tag, and checksum)
. ./tools/vars.sh

# Load common functions (e.g. check_cargo)
. ./tools/common_functions.sh

echo "NOT_CRAN: ${NOT_CRAN}"

# Even when `cargo` is on `PATH`, `rustc` might not. This should ensure PATH is
# configured correctly (c.f. https://github.com/yutannihilation/string2path/issues/4).
if [ -e "${HOME}/.cargo/env" ]; then
  . "${HOME}/.cargo/env"
fi

# Check if `cargo` command exists ----------------------------------------------

check_cargo

# If cargo is confirmed fine, exit here. But, even if the cargo is not available
# or too old, it's not the end of the world. There might be a pre-compiled
# binary for the platform.
if [ $? -eq 0 ]; then
    sed -e 's|@CLEAN_EXTRA@|$(STATLIB)|' src/Makevars.in > src/Makevars
    exit 0
fi

# Download the precompiled binary ----------------------------------------------

download_binaries

sed -e 's|@CLEAN_EXTRA@||' src/Makevars.in > src/Makevars
exit 0
Ilia-Kosenkov commented 2 years ago
Interesting idea, thanks. I think "${RSCRIPT}" -e 'read.dcf("./DESCRIPTION")' should work. What kind of configuration do you want?

Sorry I forgot to answer. I was thinking if it is possible to put all the metadata about repository name, version, checksums in the description file? I understand we can have several binaries per platform and it may bloat the file, if all of the checksums are in the DESCRIPTION file.

Ilia-Kosenkov commented 2 years ago

read.dcf seems to work fine. One possibility is that we list all the variables that need to be set (e.g., your vars.sh). DESCRIPTION of a package can be easily accessed, so consumers can verify what are they actually installing. Your common_functions.sh looks insane, so many things have to be done. I wonder if we can somehow reduce it in size.

A wild idea: AFAIK dependencies are installed before package compilation, what if we make a package or add code to {rextendr} that does all of this? Then the only thing needed is just calling a function from the package, like {RSCRIPT} -e "rextendr::assist_compilation()", plus perhaps an optional parameter (like a path to DESCRIPTION or package root).

yutannihilation commented 2 years ago

what if we make a package or add code to {rextendr} that does all of this?

Yeah, I too wondered if we can go in this direction. It sounds good to me. A bonus is that the functions can be tested in usual testthat tests.

One side-effect is that we are going to be responsible for the dependent packages. We might need to be more careful to release new versions. I think it's a matter of time, though.

Ilia-Kosenkov commented 2 years ago

Well, we are responsible for maintaining these build scripts that you use in your package and suggest implementing here. My issue is it is much, much harder to maintain and test shell-ish scripts (especially cross-platform) than working with R code.

What I propose is if someone scaffolds a new package using rextendr, we put rextendr as dependency, write additional info into Rextendr field of DESCRIPTION and then use Rscript to call a method from rextendr, which should be available on the system before compilation (because it is a dependency). I am 95% sure CRAN will scream at us for doing such tricks, but perhaps it is worth trying out. Also, we may even save on scripts and inject this step into Makevars -- it will be just one line.

yutannihilation commented 2 years ago

Ah, yes, we are. Sorry, I wasn't very clear. I'll later clarify about my concern about maintaining the backward-compatibility.

For using DEDCRIPTION, I don't worry much. Do you have any particular CRAN policy violation in your mind?

Ilia-Kosenkov commented 2 years ago

No, but it is CRAN. Anything unorthodox can trigger a bad response in my opinion. As an argument -- I honestly have only seen Roxygen using its own field in the DESCRIPTION. I would definitely check if we can pass R CMD check --as-cran with additional fields first.

yutannihilation commented 2 years ago

I agree to some extent, but, this time, I believe this usage of DESCRIPTION is legal. Writing R Extensions says:

There is no restriction on the use of other fields not mentioned here (but using other capitalizations of these field names would cause confusion). Fields Note, Contact (for contacting the authors/developers) and MailingList are in common use. Some repositories (including CRAN and R-forge) add their own fields.

So there's nothing that stops us to add our own fields, at least. For example, ggplot2 has uncommon Config/Needs/website, which is a convention of pkgdown.

It's still unclear if there's some rule on referring to the package's own file in configure script, but I think there's nothing.

yutannihilation commented 2 years ago

Regarding our responsibility to maintain extendr, I think there are some possible degrees of guaranty that we can provide.

  1. At the very least, we make sure the latest release version of extendr compiles and works without issues. If some major issue is found there, we'll fix it and release the new version.
  2. Also, we should be able to guarantee the code that currently compiles and works fine will be fine in future, as long as the user don't change the extendr's version. This should be fine as far as Cargo.lock can guarantee.
  3. If we would go further, we could guarantee the backward-compatibility. i.e. we'll never change any of our APIs so that the code currently compiles and works will be fine forever even if the user update their version of extendr.

I think we all agree with option 1, and disagree that we are at the stage of option 3.

How about option 2? As I wrote above, I think it's possible if the user's package solely depends on Rust crates extendr-*. The crate versions can be pinned in Cargo.toml. But, in reality, an R package depends on

We are almost powerless on the change of the R API, but we can do better with the build scripts. We should not break the scripts that currently work fine. This means, if we make the scripts depend on rextendr package, it might be very difficult to make breaking changes on such functions like rextendr::assist_compilation(). I'm fine either it's a shell script or an R script, but, for now, it should be vendored to avoid the frequent breakage.

yutannihilation commented 2 years ago

But, for now, such a package should be only my package. So, I'm not that strongly against the idea.

yutannihilation commented 2 years ago

I found it a bit difficult to put everything in R script, partly because I stick with zero-dependencies and I wanted an extra check of verifying the checksums (and of course my poor skill of R code writing), but I guess the main reason is the absence of nice way of communicating errors between R and shell script. If we want to handle the errors differently and nicely, we'll have a hard time, like I did...

settings: https://github.com/yutannihilation/string2path/blob/af28a0b89234cc5ae7ec91855015a3cc141fab6d/DESCRIPTION#L29-L41 code: https://github.com/yutannihilation/string2path/blob/main/tools/configure.R

Anyway, while the code is messy, I think I'm ready to submit my package to CRAN soon to see if this is accepted by them. Are there anything we want to try?

Ilia-Kosenkov commented 2 years ago

Looks better to me. I am not entirely sure about the structure of Fields in DESCRIPTION (especially naming), but we can review it later after your submission passes all the checks (good luck btw).

yutannihilation commented 2 years ago

Thanks. For naming, it seems some r-lib packages use Config/foo/bar fields like testthat's Config/testthat/edition, so I followed this manner. It will probably never conflicts with other package's convention if the field contains the package name.