extendr / rextendr

An R package that helps scaffolding extendr-enabled packages or compiling Rust code dynamically
https://extendr.github.io/rextendr/
Other
184 stars 27 forks source link

Can we generate extendr-wrappers.R at the same time of compiling Rust code? #56

Closed yutannihilation closed 3 years ago

yutannihilation commented 3 years ago

As described on helloextendr's README, the current steps to produce the wrapper are:

# Compile the Rust code (If you are using RStudio, just run "Install and Restart")
devtools::install()

# Re-generate wrappers
rextendr::register_extendr()

# Re-generate documentation and NAMESPACE (If you are using RStudio, just run "Document")
devtools::document()

But, I come to wonder if explicit rextendr::register_extendr() is really needed. Can we generate it on compilation by tweaking some build.rs? (Or Makevars?)

yutannihilation commented 3 years ago

by tweaking some build.rs?

Apparently, it cannot because build.rs is what is executed before compilation.

But, Makevars works. It's just creating main.rs like this and do cargo run in Makevars. Can we switch to this?

fn main() {
    let metadata = helloextendr::get_helloextendr_metadata();
    print!(
        "{}",
        metadata.make_r_wrappers(true, "helloextendr").unwrap()
    );
}

full commit: https://github.com/yutannihilation/helloextendr/commit/e556d81e3dd846d9f51f3ed5636a1cb931632f8d

yutannihilation commented 3 years ago

@clauswilke @andy-thomason @Ilia-Kosenkov Sorry I think I didn't follow the discussion on the wrapper-generation completely, but are there any reason the wrapper generation needs to be called from R session? The Makevars-method I described above seems to work, and I'm wondering if there's something I misunderstand.

clauswilke commented 3 years ago

Yes, your Makevars method should work also. It's just a matter of preference. Do you want more boilerplate Rust code or do you want an extra R call?

Also, have you tested your method on Windows? Didn't we have problems with cargo run there or has that been solved? And on Windows, we'll generate the wrappers twice, but maybe that's not a big deal.

yutannihilation commented 3 years ago

It's just a matter of preference. Do you want more boilerplate Rust code or do you want an extra R call?

I don't think this is just my preference. This matters because if this works the current steps:

# Compile the Rust code (If you are using RStudio, just run "Install and Restart")
devtools::install()

# Re-generate wrappers
rextendr::register_extendr()

# Re-generate documentation and NAMESPACE (If you are using RStudio, just run "Document")
devtools::document()

can be reduced to just one step (currently, pkgbuild::needs_compile() doesn't consider .rs files, but I think it's possible as it's just add .rs here: https://github.com/r-lib/pkgbuild/blob/d9717aad6ff0534846b36a827e8444174dc83a1a/R/compile-dll.R#L115):

devtools::document()

The problem with cargo run is a good point, thanks. I guess we can just cargo build and execute the binary, but I need to test it.

yutannihilation commented 3 years ago

Ah, just remembered the problem with cargo run happens when it tries to cross-compile for 32 bit, so I guess it won't be a problem. Probably we can just skip it on 32 bit Windows because it's only needed on development, not on installation. But anyway I'll check it.

clauswilke commented 3 years ago

One other issue to think about: Do we have potential problems with the path where the Makevars approach is executed? register_extendr() works in the package source directory, by definition. The Makevars approach works in the build directory. Are there workflows where that is not the package directory? (E.g., if people don't use devtools()? Do such people exist?)

clauswilke commented 3 years ago

One last issue: We have had discussions about more complex wrapper code, see here: https://github.com/extendr/extendr/issues/184 If we move the generation of the wrapper code into Makevars then we need to generate this code on the Rust side. I'm still not sure I like this idea.

One other thought: Things are a bit awkward with the current devtools::document(), but is that in part because it doesn't know anything about rextendr? Could we modify devtools::document() so it calls register_extendr() when needed? Doesn't it do the equivalent for cpp11?

clauswilke commented 3 years ago

Would like to get @dfalbel's perspective also.

yutannihilation commented 3 years ago

Do we have potential problems with the path where the Makevars approach is executed?

I don't think there's any problem as it's simple as just redirecting stdout to R/extendr-wrappers.R. If the user is skilled enough to rearrange the package structure other than the one use_extendr() creates, they should figure out what to do.

(E.g., if people don't use devtools()? Do such people exist?)

Sorry, I don't get the point here. Why does the use of devtools() matter?

If we move the generation of the wrapper code into Makevars then we need to generate this code on the Rust side. I'm still not sure I like this idea.

Sorry, I don't understand this point either... In either way, the wrapper code is generated on Rust's side. The difference is how <Metadata>.make_r_wrappers() is called, directly by the executable or via the C API exposed to R. Am I wrong?

Things are a bit awkward with the current devtools::document(), but is that in part because it doesn't know anything about rextendr? Could we modify devtools::document() so it calls register_extendr() when needed? Doesn't it do the equivalent for cpp11?

No, I'd say they are different in that register_cpp11() works without installing or loading the package.

I would emphasize this point. The current rextendr's workflow has a chicken-egg problem; if R/extendr-wrapper.R is accidentally lost, the user cannot recover. To generate the wrapper, we need at least useDynLib declared, but it would be declared in R/extendr-wrapper.R. The Makevars approach can avoid this; what the user needs to do is just compiling.

If we want "the equivalent for cpp11", I guess we need to build another infrastructure for parsing the Rust code from R's side, just like decor package does for cpp11: https://github.com/r-lib/decor. This would be a very different approach than the current one.

clauswilke commented 3 years ago

Sorry, I don't get the point here. Why does the use of devtools() matter?

If you follow the instructions in the first section of this post, for example, I don't think the wrappers would end up in the right location: https://kbroman.org/pkg_primer/pages/build.html

In either way, the wrapper code is generated on Rust's side.

We're talking about more wrapper code that doesn't exist yet. It's not a given that it will be generated on the Rust side. There is no reason that it has to, I believe.

yutannihilation commented 3 years ago

R CMD build and R CMD INSTALL, or devtools::build() and devtools::install(), are the steps to install the package, not to develop it. The developer need to run devtools::document() to generate wrappers, anyway. (So, I think we cannot support those who don't use devtools. c.f. usethis::use_cpp11() fails if the user doesn't use roxygen to generate documents).

There is no reason that it has to, I believe.

Yes, I agree with you here. So, are you arguing that it might be easier to tweak the generated R code programmatically if we do it in register_extedr()? I feel we can do the similar thing by Makevars approach, but this seems a good point. Thanks.

clauswilke commented 3 years ago

are you arguing that it might be easier to tweak the generated R code programmatically if we do it in register_extendr()?

Yes, I've never been that excited about having compiled Rust code spit out R code.

I'm aware of the chicken-and-egg problem and I don't like it but I also think we shouldn't parse Rust code in R if we can avoid it.

Do you know why devtools::document() compiles the Rust source code? It seems to me that since this happens there should be a way to take advantage of it, possibly with some modification to that function.

I think more generally we should take a bigger picture perspective and not worry so much about how things work with the current devtools::document() but how they could work if we propose some modifications there. If we could somehow register a function that is run after the package is compiled but before the documentation is generated that is all we need.

clauswilke commented 3 years ago

I think this is the explanation for why the package gets compiled: https://roxygen2.r-lib.org/reference/load.html Maybe there's a way to piggy-back on this process.

yutannihilation commented 3 years ago

Okay, I think I'm getting your point...

If we could somehow register a function that is run after the package is compiled but before the documentation is generated that is all we need.

I think it's here, which is called inside load_all() after compile_dll(). Currently it only supports cpp11 and Rcpp, but I guess we can propose some option to insert arbitrary functions here.

https://github.com/r-lib/pkgbuild/blob/2aaecf838eb023b788b7a31e267ac644d616dae0/R/c-registration.R#L1-L12

clauswilke commented 3 years ago

We could certainly write a function rextendr::document() that makes sure extendr-wrapper.R exists or otherwise adds a minimal version, then compiles the package and generates the wrappers, and then documents. If we're clever we may need only one compilation of the source code, since the compiled code doesn't change with a new wrapper file.

I think all of this comes down to mostly writing a new load function that does the following three steps:

  1. pkgload::load_all()
  2. rextendr::register_extendr()
  3. source the newly generated extendr wrappers so they overwrite the old code loaded with load_all()
clauswilke commented 3 years ago

I typed out my comment while you typed yours. Yes, modifying update_registration() in pkgbuild may be the way to go.

clauswilke commented 3 years ago

Actually, not so fast. I think update_registration() is called before compilation, but we need something that gets called after.

https://github.com/r-lib/pkgbuild/blob/d9717aad6ff0534846b36a827e8444174dc83a1a/R/compile-dll.R#L37-L54

yutannihilation commented 3 years ago

Ah, true. Hmm...

clauswilke commented 3 years ago

Maybe it's my long-term use of LaTeX, but I'm totally comfortable with keeping it at this for now:

devtools::install(quick = TRUE)
rextendr::register_extendr()
devtools::document()

And register_extendr() could have an optional feature that forces a minimal extendr-wrappers.R if none exists, so that recovery of a broken package would work as follows:

rextendr::register_extendr(force_wrappers = TRUE)
devtools::install(quick = TRUE)
rextendr::register_extendr()
devtools::document()

Then, the next step might be to implement rextendr::document() that does the above more efficiently, and finally we could go to the devtools developers and ask if they're willing to modify their code to accommodate us.

yutannihilation commented 3 years ago

Hmm, okay. I feel I'll probably end up generating the wrappers in onLoad(), but I'll try to stick with it for a while.

c.f. https://stackoverflow.com/a/32035219

yutannihilation commented 3 years ago

Reflect quick = TRUE to helloextendr's README: https://github.com/extendr/helloextendr/commit/8fd66d983665ebfd24639e30d0340cd46e6efe37

yutannihilation commented 3 years ago

I'm not fully convinced, but I also feel I probably cannot convince other members with my idea. I'm closing this and will file a new one for rextendr::document(). Let's stick with the current steps with manual rextendr::register_extendr() for now. Sorry for the noise.

clauswilke commented 3 years ago

I'm going to reopen this issue. I'm now quite convinced that there is no reliable way that an R function can build a package, load the dynamic library, and then call a function from it. This sequence of steps is only reliable if R gets restarted in-between, and that's not something an R function can (or should) trigger. Without restart, there's always the risk that the shared library doesn't get updated.

So, we need to look for other alternatives. Building the wrappers during compile is attractive. However, I can also now better formulate what was bothering me about the proposed approach: I don't think a compile should overwrite the R package sources. That's just too much magic.

Therefore, how about the following: What if the compile just generates the wrappers and leaves them in target, using the technique you proposed. Then, register_extendr() can fish the wrappers out from there and place them into R/extendr-wrappers.R. And, if register_extendr() doesn't find any wrappers in target, it could alternatively try to call the shared library and get wrappers that way. This makes the whole sequence more robust in case somebody doesn't want to build the wrappers during compile, for whatever reason.

yutannihilation commented 3 years ago

I don't think a compile should overwrite the R package sources. That's just too much magic.

Thinking about this issue again, I now agree with this part.

Yet another alternative might be to call cargo run (or compiled executable binary) from register_extendr(). but fishing the wrapper is probably generally useful (e.g. when the developer needs to tweak the generated wrapper).

clauswilke commented 3 years ago

Yet another alternative might be to call cargo run (or compiled executable binary) from register_extendr().

I thought about it. In theory it's possible to generate the main.rs code on the fly and compile and run the binary when register_extendr() is called. In practice though it's probably too complicated, and may interfere with a main.rs the developer has placed there for some other reason. I think it's better to set this up as part of the template structure and have it run with every build.

yutannihilation commented 3 years ago

To be clear, I think it doesn't need to be main.rs. We can use bin/<whatever name>.rs and do cargo run --bin <whatever name>. But anyway, in that case, probably we need to place generated <whatever name>.rs into bin/ and then remove it, which sounds unsafe. So I have no objection.

c.f. https://doc.rust-lang.org/cargo/guide/project-layout.html#package-layout

.
├── Cargo.lock
├── Cargo.toml
├── src/
│   ├── lib.rs
│   ├── main.rs
│   └── bin/
│       ├── named-executable.rs
│       ├── another-executable.rs
│       └── multi-file-executable/
│           ├── main.rs
│           └── some_module.rs
...snip...
yutannihilation commented 3 years ago

Will you implement this?

clauswilke commented 3 years ago

It's unlikely I'd get to it this week. So if you have time and want to give it a try please go ahead.

yutannihilation commented 3 years ago

Sure, I'll try a PR.

yutannihilation commented 3 years ago

We explored this option on #63, but we found it doesn't look nicer than rextendr::document() considering the complexity.