extendr / rextendr

An R package that helps scaffolding extendr-enabled packages or compiling Rust code dynamically
https://extendr.github.io/rextendr/
Other
184 stars 27 forks source link

Generated function name collisions #21

Closed Ilia-Kosenkov closed 3 years ago

Ilia-Kosenkov commented 3 years ago

I would like to understand what is the intended behavior if multiple Rust functions with the same name are emitted. Consider the following scenario:

  1. Generate first function named rust_inc
    rust_function("fn rust_inc(x : f64) -> f64 { x + 1f64 }")
  2. Do the same but change the body:
    rust_function("fn rust_inc(x : f64) -> f64 { x + 2f64 }")

Expected behavior -- calling rust_inc(5) returns 7, actual behavior -- it returns 6. The reason is in the wrapper, which uses .Call("wrap__rust_inc", ...).

Assuming these two functions are the first compiled in R session, the corresponding lib names are rextendr1 and rextendr2. As a result,

.Call("wrap__rust_inc", 5, PACKAGE = "rextendr1")
# > 6
.Call("wrap__rust_inc", 5, PACKAGE = "rextendr2")
# > 7

Would it be possible to explicitly specify the library that exports the wrapper function to avoid such collisions? This behavior is counter-intuitive to what R does, where a new function body can be bound to an old name:

r_inc <- function(x) x + 1
r_inc(5)
# > 6
r_inc <- function(x) x + 2
r_inc(5)
# > 7

UPD: This seems to be Windows-related

Ilia-Kosenkov commented 3 years ago

For comparison, Rcpp does it right:

cppFunction("double cpp_inc(double x) { return x + 1; }")
cpp_inc(5)
# > 6
cppFunction("double cpp_inc(double x) { return x + 2; }")
cpp_inc(5)
# > 7
Ilia-Kosenkov commented 3 years ago

What about getNativeSymbolInfo? Can it be used instead to obtain the correct pointer to an exported function?

Ilia-Kosenkov commented 3 years ago

R generated code seems to be different on Windows and in WSL: For Windows, it is

function(...)
     .Call("wrap__rust_inc", ...)

For Ubuntu (WSL2)

function(x)
     .Call("wrap__rust_inc", x)

As a result, Windows functions can be abused:

rust_inc(5, PACKAGE = "rextendr2")
# > 7

On Windows, rextendr emits a separate .R file under temp\R\rextendr.R, which contains an exported method. On WSL, the respective folder is empty. The exports are in turn taken from temp\target\extendr_wrappers.R present both on Windows and on Ubuntu,

clauswilke commented 3 years ago

I have been struggling with the same problem on OS X, that R would always find the first symbol created, not the most recent one. Even unloading the library didn't help in some cases. That's why I started changing the package name ("rextendr1", "rextendr2", etc), that helped on OS X. The PACKAGE argument on the other hand doesn't work on my system.

There's something Rcpp does that I'm not quite understanding yet, they manage to pull functions out of specific packages.

FYI: The generated code changed yesterday when I merged the PR by @CGMossa. I assume you're running an older version on Windows.

library(rextendr)

rust_function("fn rust_inc(x : f64) -> f64 { x + 1f64 }")
#> build directory: /var/folders/b1/13gn4j655jddkfhxmtk5tsfm0000gn/T//RtmpJa6z6Y/file123785bb0301f
rust_inc(5)
#> [1] 6
rust_function("fn rust_inc(x : f64) -> f64 { x + 2f64 }")
#> build directory: /var/folders/b1/13gn4j655jddkfhxmtk5tsfm0000gn/T//RtmpJa6z6Y/file123785bb0301f
rust_inc(5)
#> [1] 7

.Call("wrap__rust_inc", 5, PACKAGE = "rextendr1")
#> Error in .Call("wrap__rust_inc", 5, PACKAGE = "rextendr1"): "wrap__rust_inc" not available for .Call() for package "rextendr1"
.Call("wrap__rust_inc", 5, PACKAGE = "rextendr2")
#> Error in .Call("wrap__rust_inc", 5, PACKAGE = "rextendr2"): "wrap__rust_inc" not available for .Call() for package "rextendr2"

Created on 2021-01-12 by the reprex package (v0.3.0)

Ilia-Kosenkov commented 3 years ago

@clauswilke, I have both Windows and WSL versions installed from local folder (commit c21e0d347a7b9886892f97dd79aa5642f73e459b). I use remotes::install_local(force = TRUE). I am still able to reproduce this behavior using .Call with different PACKAGE args. I also noticed that if after the first function rust_inc is not called but rather another rust function is created, than it hooks the second one, completely ignoring the first, i.e.

library(rextendr)
rust_function("fn rust_inc(x : f64) -> f64 { x + 1f64 }")
rust_function("fn rust_inc(x : f64) -> f64 { x + 2f64 }")
rust_inc(5)
# > 7

But if I continue, the result becomes incorrect

rust_function("fn rust_inc(x : f64) -> f64 { x + 3f64 }")
rust_inc(5)
# > 7

Yet

 .Call(getNativeSymbolInfo("wrap__rust_inc", PACKAGE = "rextendr3"), 5)
# > 8
clauswilke commented 3 years ago

The wrappers are now generated by Rust code in the extendr-macros crate. Maybe you have different versions of those on the two machines, due to a Cargo.lock file?

You can inspect the generated wrappers by going into the build directory and looking at extendr_wrappers.R:

clauswilke /var/folders/b1/13gn4j655jddkfhxmtk5tsfm0000gn/T/RtmpPmiKui/file122e78f1f6dd/target> cat extendr_wrappers.R 

#===========
#THIS FILE IS AUTOGENERATED VIA #[extendr] MACROS
#DO NOT ATTEMPT TO EDIT IT BY HAND
#===========

rust_inc <- function(x) {
    .Call("wrap__rust_inc", x)
}
Ilia-Kosenkov commented 3 years ago

@clauswilke, Indeed there was some misconfiguration, the wrappers are correct, however, it does not solve the original problem (for me).

What I dislike is that we rely on a string to identify what function to call. Perhaps we can deparse what is going on in Rcpp and adopt it as well? When I inspect Rcpp function, the internal .Call references a <pointer> rather than a string.

clauswilke commented 3 years ago

It's possible that Rcpp calls the R routine registration, similar to what the extendr module macro does. I haven't investigated this carefully, and I'm not entirely sure how one would call the registration function from R, but it may be possible.

Feel free to dig into this. I'm super busy the next two weeks so may go silent, but don't hesitate to post questions and I'll answer them if and when I can.

Ilia-Kosenkov commented 3 years ago

@clauswilke, Ok, I will ask around about extendr-macros to see if I can achieve something. Thanks.

Ilia-Kosenkov commented 3 years ago

This reprex represents a broader problem:

library(rextendr)
rust_source(code = "#[extendr]\n fn rust_fn1() -> i32 { 1i32 }\n#[extendr]\n fn rust_fn2() -> i32 { 2i32 }")
#> build directory: C:\Users\[redacted]\AppData\Local\Temp\RtmpiItydQ\file310013655b9f
rust_fn1()
#> [1] 1
rust_fn2()
#> [1] 2
rust_source(code = "#[extendr]\n fn rust_fn2() -> i32 { 20i32 }\n#[extendr]\n fn rust_fn3() -> i32 { 30i32 }")
#> build directory: C:\Users\[redacted]\AppData\Local\Temp\RtmpiItydQ\file310013655b9f
rust_fn2()
#> [1] 2
rust_fn3()
#> [1] 30

Created on 2021-01-12 by the reprex package (v0.3.0)

Sadly, it works on Linux, but not on Windows.

Ilia-Kosenkov commented 3 years ago
remotes::install_github("Ilia-Kosenkov/rextendr", ref = "dyn-unload", dependencies = FALSE, force = TRUE)
library(rextendr)
rust_source(code = "#[extendr]\n fn rust_fn1() -> i32 { 1i32 }\n#[extendr]\n fn rust_fn2() -> i32 { 2i32 }")
#> build directory: C:\Users\[redacted]\AppData\Local\Temp\RtmpgdeAUK\file7e042aa6060
rust_fn1()
#> [1] 1
rust_fn2()
#> [1] 2
rust_source(code = "#[extendr]\n fn rust_fn2() -> i32 { 20i32 }\n#[extendr]\n fn rust_fn3() -> i32 { 30i32 }")
#> build directory: C:\Users\[redacted]\AppData\Local\Temp\RtmpgdeAUK\file7e042aa6060
rust_fn2()
#> [1] 20
rust_fn3()
#> [1] 30

I overwrote extendr_wrappers.R in plain text, injecting PACKAGE = rextendr (see https://github.com/Ilia-Kosenkov/rextendr/blob/1b7df7b0b5750ea4f03806b458adf95584fc5514/R/source.R#L157-L162), to achieve something like the following:

rust_fn2
function()
     .Call("wrap__rust_fn2", PACKAGE = "rextendr2")

This is no solution but at least it shows what is required. It obviously relies heavily on the fact that .Call is a one-liner and arguments can be captured with .* pattern. However, such invasion on rextendr level allows avoiding reworking of other packages.

clauswilke commented 3 years ago

If you want to investigate how an existing package does this, I suggest looking into the cpp11 package, which is a little simpler than Rcpp. Here is the relevant code: https://github.com/r-lib/cpp11/blob/969c2d5b01e088cb38372f68900a2129251a1e1b/R/source.R#L115-L127

CGMossa commented 3 years ago

I've tried to investigate this thoroughly. I now firmly believe that both rcpp and cpp11 use the PACKAGE argument that @Ilia-Kosenkov is filling out manually. We should just do the same in rexendr.

clauswilke commented 3 years ago

I now firmly believe that both rcpp and cpp11 use the PACKAGE argument

That is correct. However, one more time: As the rextendr code is written right now, this does not work on macOS. Rcpp and cpp11 must be doing something else that rextendr is not currently doing. It's possible that it's routine registration. One way to test would be to build a shared library with routine registration code and see whether that code gets automatically called upon dyn.load().

Ilia-Kosenkov commented 3 years ago
cpp11::cpp_function("double cpp_inc(double x) {return x + 10; }")
print(cpp_inc)
#> function(x) {
#>   .Call("_code_0_cpp_inc", x, PACKAGE = "code_0")
#> }

Created on 2021-01-13 by the reprex package (v0.3.0)

Here is what I get from cpp11 -- the PACKAGE is set to code_0, the name of the generated dll. Also, notice dll-specific wrapper prefixes, which we currently cannot achieve.

@clauswilke, Could you test Ilia-Kosenkov/rextendr@dyn-unload branch on Mac? I checked that my approach works on Windows and (some) Linux, so there is a chance.

clauswilke commented 3 years ago

Here you go:

library(rextendr)

rust_function("fn rust_inc(x : f64) -> f64 { x + 1f64 }")
#> build directory: /var/folders/b1/13gn4j655jddkfhxmtk5tsfm0000gn/T//Rtmp65GVuR/file12fbe67610285
rust_inc(5)
#> Error in .Call("wrap__rust_inc", x, PACKAGE = "librextendr1"): "wrap__rust_inc" not available for .Call() for package "librextendr1"

Created on 2021-01-13 by the reprex package (v0.3.0)

Ilia-Kosenkov commented 3 years ago

@clauswilke, Thank you. I believe this is a naming issue. I will setup a CI and see if I can fix this for all OS's.

Ilia-Kosenkov commented 3 years ago

Ok I solved MacOS problem. The explanation is simple: each system expects different library names.

Instead of guessing the name, we can retrieve the output of dyn.load, which contains a correct library name in [["name"]] member. If this name is then injected into the wrapper, it works like a charm. https://github.com/Ilia-Kosenkov/rextendr/actions/runs/482737747

clauswilke commented 3 years ago

Glad you figured it out. I'm not sure how I feel about rewriting the wrapper scripts, but maybe that's the best solution.

Ilia-Kosenkov commented 3 years ago

@clauswilke, It is definitely not but at least it is a proof of concept suggesting the direction. In cpp11 a standalone package decor is used to extract cpp11 attributes from cpp source code and generate wrappers. We do not have such a possibility, nor do I think parsing Rust is a good idea.

clauswilke commented 3 years ago

It makes sense for the Rust compiler to do the Rust parsing, as we're currently doing. So I think that's the right design choice. But instead of writing the wrapper scripts directly from Rust, it may make more sense to write a .csv file or similar with function names, arguments, etc.

Ilia-Kosenkov commented 3 years ago

@clauswilke, This is out of my league, but if something like this will be produced by extendr-macros, we can easily consume it here. So far I would like to suggest to implement a bypass to enable rextendr on all platforms (which will allow integration tests on github CI).

clauswilke commented 3 years ago

The relevant code is here: https://github.com/extendr/extendr/blob/master/extendr-macros/src/output_r.rs It just requires some changes to the output.

But maybe @hobofan has time to tackle this, since he wrote the code originally. It would be good to make this change before the extendr 0.2.0 release.

clauswilke commented 3 years ago

In any way, please implement however it works for now.