extendr / rextendr

An R package that helps scaffolding extendr-enabled packages or compiling Rust code dynamically
https://extendr.github.io/rextendr/
Other
184 stars 27 forks source link

rextendr::document() #59

Closed yutannihilation closed 3 years ago

yutannihilation commented 3 years ago

(The original discussion: https://github.com/extendr/rextendr/issues/56#issuecomment-792186856)

Currently, we need to execute 3 lines of codes to reflect the changes on the Rust code. We might need some shortcut for this.

devtools::install(quick = TRUE)
rextendr::register_extendr()
devtools::document()
clauswilke commented 3 years ago

The most elegant way to develop a shortcut is to call roxygen2::roxygenise() directly, with a custom load function that updates the wrappers and sources them after compiling and before documenting.

clauswilke commented 3 years ago

If we copy the code from devtools::document() we run into a licensing issue as it's GPL licensed.

@jimhester @hadley Do we have permission to use the body of devtools::document() under MIT license? Specifically these lines: https://github.com/r-lib/devtools/blob/4a55464dee9cb99b97f7e9bb6c098e533ab1761f/R/document.R#L13-L35

clauswilke commented 3 years ago

In theory, something like the following should work. In practice, though, I can't get this to use the newly compiled dll on macOS. It looks like the same problem we had previously with rust_source().

load_rextendr <- function(path) {
  pkgbuild::clean_dll(path = path)
  env <- pkgload::load_all(path, helpers = FALSE, attach_testthat = FALSE)$env
  eval(bquote(rextendr::register_extendr(.(path))), envir = env)
  wrappers <- rprojroot::find_package_root_file("R", "extendr-wrappers.R", path = path)
  source(wrappers, local = env)
  env
}

document <- function(pkg = ".", roclets = NULL, quiet = FALSE) {
  pkg <- devtools::as.package(pkg)
  if (!isTRUE(quiet)) {
    cli::cli_alert_info("Updating {.pkg {pkg$package}} documentation")
  }

  devtools:::save_all() # not currently exported

  if (quiet) {
    output <- tempfile()
    on.exit(unlink(output))
    withr::local_output_sink(output)
  }
  withr::local_envvar(devtools::r_env_vars())

  roxygen2::roxygenise(pkg$path, roclets, load_code = load_rextendr)
  pkgload::dev_topic_index_reset(pkg$package)
  invisible()
}
hadley commented 3 years ago

Yes, that's fine — I think you can consider the majority of devtools to be MIT licensed, we just haven't rewritten the one function we copied from base R yet, so we can't consider the whole package to be MIT.

jimhester commented 3 years ago

An alternative solution you might consider is to generate this code in a custom roxygen roclet. I am pretty sure you can automatically include custom roclets by adding something to your Roxygen: field in the DESCRIPTION. So after this initial setup everything would just work and users would not have to call any custom functions or require any changes in devtools and friends.

clauswilke commented 3 years ago

Thanks @hadley and @jimhester. I'll have to look into roclets. I know nothing about them. The main problem we have is that we need to compile the shared library, and then load it and call a function from it before running the actual documentation step. That's because the shared library contains meta information about the compiled code that needs to make its way into the R documentation. (Very different from how Rcpp or cpp11 work, but makes sense for Rust.)

Ilia-Kosenkov commented 3 years ago

@clauswilke, I made a prototype of a roclet, and it indeed allows to inject custom code into roxygenize() process. However, I faced a number of issues:

  1. If roclet is provided in Roxygen: field of DESCRIPTION, it overrides default roclets, which are responsible for e.g. modification of NAMESPACE. I don't know how to bypass this other than specifying them by hand.
  2. Roclets are called after library compilation, but this creates a problem: the very first time roxygenize() is called, NAMESPACE is not updated, so no useDynLib directive, so even if the developed package is loaded, its respective native library is not, so no .Call suceeds.

Solution:

  1. We need to manually write NAMESPACE for the very first time (if the file has no useDynLib)
  2. We need to manually write Roxygen: list(roclets = c(...)) and either include all default roxygen2 roclets or find a way to ask for defaults in addition to our custom roclet

With that in mind, here is package development routine I was able to test:

  1. devtools::create()
  2. rextendr::use_extendr()
  3. Temporary step: amend NAMESPACE and DESCRIPTION to include new roclet
  4. devtools::document() The last step produces everything, including new wrappers, loads the package so that hello_world() can be called. The problem is, the .Rd files are not updated correctly if roxygen2 tags were modified in Rust. This simply requires running devtools::document() twice in a row.

I am sure there is a lot to consider here (and a lot of testing to be done for sure), but it seems to be feasible. Thank you to @jimhester for introducing roclets to me, I had no idea roxygen2 can be extended almost effortlessly.

clauswilke commented 3 years ago

@Ilia-Kosenkov I don't think this approach is going to work, see here: https://github.com/extendr/rextendr/issues/56#issuecomment-794809998

In a nutshell, as far as I can tell, you can't ever be certain that the dynamic library has been updated after a load_all(). This is OS dependent, so it may work on your system but it doesn't work on macOS. I need to restart R to make sure I have the most recently compiled library version.

Therefore, @yutannihilation's idea of building the wrappers during compile by compiling and executing a tiny Rust binary seems the better approach.

Ilia-Kosenkov commented 3 years ago

@clauswilke, I have no objections to your idea, just was curious if roxygen2 can be (ab-)used to solve this problem. I find it confusing that a full restart of R session is required to update library -- I was under impression it can be solved by dyn.unload/devtools::unload(). Speaking of this, the Makevars we ship has no dependency on Rust code or its Cargo.toml, only on the product, libwhatever.a. If you change Rust code, nothing will be recompiled if a library already exists in the /src folder, which is also problematic for the developer.

Personally, I would suggest generating wrappers from R via calling into Rust library. The library can always be loaded/unloaded just for the sake of the call to make_wrappers.

clauswilke commented 3 years ago

The library can always be loaded/unloaded just for the sake of the call to make_wrappers.

I have not been able to do so reliably on macOS. It's possible somebody else can figure this out. But, the documentation of dyn.unload() contains this sentence: "Note that unloading a DLL and then re-loading a DLL of the same name may or may not work"

Ilia-Kosenkov commented 3 years ago

@clauswilke, I see. It's unfortunate we can't do it the easy way. Let's hope wrapper generation from Rust side will work as we want.

yutannihilation commented 3 years ago

@Ilia-Kosenkov

Speaking of this, the Makevars we ship has no dependency on Rust code or its Cargo.toml, only on the product, libwhatever.a. If you change Rust code, nothing will be recompiled if a library already exists in the /src folder, which is also problematic for the developer.

Regarding this point, I think pkgbuild can be improved. (c.f. https://github.com/r-lib/pkgbuild/issues/115), but do you think we need to improve Makevars as well?

Ilia-Kosenkov commented 3 years ago

@yutannihilation, I was unable to. I put Rust sources as dependencies in Makevars (assuming it beaves as a reaguar makefile), but it did not track changes. Perhaps I am doing something wrong. If we can force R build system to track changes in Rust files, that would be a substantial improvement. Otherwise, I forsee a lot of issues arising from the fact that a user has updated their Rust code but did not observe any changes on R side as devtools::document() did not rebuild the library.

yutannihilation commented 3 years ago

In my understanding, devtools has its own build system and mechanism to detect changes (i.e. pkgbuild:::needs_update()). Makevars matters when doing standard install, but in that case probably make clean is executed so recompilation happens.

Anyway, if this is really a problem, maybe we should switch to use cargo run? https://github.com/extendr/rextendr/issues/56#issuecomment-794848692

clauswilke commented 3 years ago

With the current setup, you need to minimally run pkgbuild::clean_dll() before compiling. It clears out the compiled library and any .o files, but it doesn't clear out Rust's target, so a recompile is fast.

Ilia-Kosenkov commented 3 years ago

@clauswilke, @yutannihilation, thank you for the explanation, I was unaware of these implementation details. Perhaps indeed Rust-based wrapper generation is the way to go.

yutannihilation commented 3 years ago

@clauswilke @Ilia-Kosenkov I slightly updated my PoC implementation; now Makevars has generate_wrappers so we can both specify it as a dependency of $(SHLIB), and run make generate_wrappers from rextendr::register_extendr() to ensure the wrapper is up-to-date. What do you think?

https://github.com/extendr/helloextendr/compare/main...yutannihilation:poc/generate-wrappers-in-Makevars

clauswilke commented 3 years ago

Can we actually call make generate_wrappers from an R function? Is there a standardized API for that?

One other issue: I wouldn't call the wrappers dir ./rust/wrappers. This requires the Rust source to be located in rust, and some projects may not want that. (I'm using sinab here and it makes sense for the project: https://github.com/clauswilke/sinab/tree/master/src) How about writing the wrappers into ./extendr-wrappers?

yutannihilation commented 3 years ago

I simply meant system2("make", c("-C", "src", "-f", "Makevars", "generate_wrappers")). Do you have any concern? Note that, I'm proposing this as a temporary workaround until pkgbuild become aware of Rust-related files. This might execute cargo run twice (on compile and on executing register_extendr()), but I think it's okay as Cargo is smart enough not to compile the same code twice.

For the location of the wrapper, sounds good to me.

yutannihilation commented 3 years ago

In this case, it seems we should use Makefile instead of Makevars.

A few packages use the src directory for purposes other than making a shared object (e.g. to create executables). Such packages should have files src/Makefile and src/Makefile.win (unless intended for only Unix-alikes or only Windows). (https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Package-subdirectories)

clauswilke commented 3 years ago

I think the text you quote is to be understood differently. If a project doesn't have a shared lib at all, then it shouldn't use Makevars. But we do have a shared lib, and we also have an additional target that sometimes we want to call separately. Therefore, I think it should be Makevars.

Also, one thing to keep in mind: We're not just writing one package here, we're developing a general framework that possibly will be copied by hundreds of packages. We should try to deviate as little as possible from standard build files in R.

clauswilke commented 3 years ago

Also, related to this thought: Not sure rextendr::register_extendr() should call make with a particular target. What if the package user has deleted that target? Will we get some weird, indecipherable error message? Maybe it's better to rebuild the shared library and expect that wrappers will be generated as a side effect.

yutannihilation commented 3 years ago

Aw, I misunderstood it, thanks... Let's use Makevars.

I get your point, but is that really that weird to require a separate target for generating wrappers, rather than considering it as a side effect?

clauswilke commented 3 years ago

Why don't you implement it the way you envision and then we see how it works and what happens when the target is deleted.

yutannihilation commented 3 years ago

Sure. I'll create a pull request. Let's discuss there later.

jimhester commented 3 years ago

@Ilia-Kosenkov you can get the default roclets with roxygen2::roxy_meta_get("roclets"), so something like Roxygen: list(roclets = c(rextendr::my_roclet, roxygen2::roxy_meta_get("roclets"))) should run the rextendr roclet along with the default roclets.

Ilia-Kosenkov commented 3 years ago

Current state

After experimenting with @yutannihilation in #63, I can try to summarize what we need (or what we are missing) to make devtools::document() work:

  1. The first issue is the built process, as devtools cares only about cpp/fortran files, as seen in pkgbuild:::sources and pkgbuild:::headers. https://github.com/r-lib/pkgbuild/blob/d9717aad6ff0534846b36a827e8444174dc83a1a/R/compile-dll.R#L113-L146 Any changes to *rs or *toml files, which should trigger recompilation, are not tracked.
  2. Then there is wrapper generation. In #63 @yutannihilation explored the possibility of invoking Rust binary to obtain wrappers. We found some problems with directly invoking make, including the fact that on Windows make may be absent from PATH and only added by the rtools during compilation, and that simple make of Makevars does not load any R specific configurations found in global Makevars file, which creates all sorts of problems. Even with this approach, we do not significantly simplify package prototyping and additional method calls are required.
  3. @clauswilke reports problems with loading/unloading the library on MacOS, which complicates the wrapper generation from R side. I am unable to reproduce it on other platforms, but perhaps I may be able to observe this issue on CI. This problem blocks any R-side wrapper generation.

What can be done

Here I assume we are unable to PR Rust-related stuff in other packages (like pkgbuild). I will once again attempt to argue in favor of roclets.

  1. We can write a roclet that does something similar to what pkgbuild does with sources (like make does). If Rust files have been modified, we can touch (as in e.g. fs::file_touch) something that is tracked by pkgbuild, e.g. Makevars. This will trigger recompilation.
  2. register_extendr() can also be called from the roclet, which I have prototyped. Here we face the library loading/unloading problem, which can be unreliable on some platforms (according to docs). My argument is that in any case we rely on unreliable. either with make or with library loading, so perhaps we should support at least 95% of scenarios instead of 100%. Just my opinion.
  3. The challenge here is that roxygen2 processes roclets independently from each other and the source compilation process. In a hypothetical scenario when someone modified Rust code in such a way that new documentation should be produced, first document() will touch source, second call to document() will trigger recompilation and new wrappers will be emitted and only the third call to document() will read new wrappers and update NAMESPACE/.Rd accordingly. On the bright side, we can track this changes and output messages that extra call to document() is requried, and all package prototyping is reduced to calls to document(), with no extra functions to generate wrappers by hand. The roclet import mechanism makes this approach entirely optional -- we can have rextendr::use_extendr(enable_roclets = FALSE) to support this scenario.

In the ideal world

We could at least try to PR into pkgbuild, asking to support Rust files tracking recompilation. This will allow us to retire any recompilation roclet that we (may) export (if this decision is ever made).

PS: @jimhester, thank you again, I was looking for this. Either I am really bad at googling this/reading pkg docs, or roxygen2 really lacks a detailed description of its configuration.

yutannihilation commented 3 years ago

Thanks for summarizing.

problems with loading/unloading the library

For this part, maybe we can just invoke another R process? I made a simple pull request here: #64

Ilia-Kosenkov commented 3 years ago

@yutannihilation, to make the discussion easier, I have a prototype here https://github.com/Ilia-Kosenkov/rextendr/tree/roxygen-integration The CI result is here https://github.com/Ilia-Kosenkov/rextendr/actions/runs/643917414

In short, using rextendr::use_extendr(use_roclets = TRUE) creates all the necessary infrastructure. Then, only devtools::document() is required to update everything.

I think I will prototype also file tracking roclet to bypass {pkgbuild} limitations.

clauswilke commented 3 years ago

@Ilia-Kosenkov Can you turn this into a draft PR, so we can more easily see the changes made?

Ilia-Kosenkov commented 3 years ago

I suggest closing this issue because #67 has been merged.