reuse cargo build directory between runs of R #291

Open aavogt opened 1 year ago

aavogt commented 1 year ago

My code loaded with rust_source does not compile because I have not studied from_robj yet. This makes R exit:

Caused by error in `invoke_cargo()`:
! Rust code could not be compiled successfully. Aborting.
✖ error[E0599]: no function or associated item named `from_robj` found for reference `&ArrayBase<OwnedRepr<i32>, Dim<[usize; 2]>>` in the current scope
 --> src/lib.rs:2:1
2 | #[extendr]
  | ^^^^^^^^^^ function or associated item not found in `&ArrayBase<OwnedRepr<i32>, Dim<[usize; 2]>>`
  = note: this error originates in the attribute macro `extendr` (in Nightly builds, run with -Z macro-backtrace for more info)

✖ error: aborting due to previous error

This process takes 20s because all Rust dependencies are recompiled. If the cargo project stays in the same location, subsequent calls to rust_source take 2s. Here is an ugly way to make the cargo project always live in $PWD/connect_build:

  # duplicates most of rextendr:::get_build_dir
  rlang::env_bind(rextendr:::the, build_dir = {
    b <- file.path(getwd(), "connect_build")
    if (!dir.exists(b)) {
      # recursive = TRUE also creates b itself
      dir.create(file.path(b, "src"), recursive = TRUE)
      dir.create(file.path(b, "R"))
      dir.create(file.path(b, ".cargo"))
    }
    b  # the value of the block becomes the build directory
  })
  rust_source("connect.rs", features = "ndarray")

Could you change rust_source to make the above code shorter? Perhaps it could look like one of these:

rust_source("connect.rs", cache_build="connect_build")
rust_source("connect.rs") # connect_build comes from connect.rs

Ilia-Kosenkov commented 1 year ago

The build is cached when cache_build = TRUE, which is the default: https://github.com/extendr/rextendr/blob/1a43843b191536ab057949b1c40ca3e9e9949b90/R/source.R#L38C1-L39C31

We do not support different caches for different calls to rust_source() though, since it would be really hard to keep track of them.

Check what happens in your case if instead you read the contents of connect.rs into a string and then pass it as code to rust_source():

rextendr::rust_source(code = connect_rs_contents, ...)

aavogt commented 1 year ago

I need to use code= for #234. cache_build works within one R session only:

# a.R
rs <- function(file="lib.rs") {
  rextendr::rust_source(code=paste0(readLines(file), collapse="\n"), features="ndarray", env=parent.frame())
}
rs() # always slow
rs() # always fast

R --vanilla -q -s < a.R
ℹ build directory: /tmp/RtmpXeLmcd/file37b52d703be358
    Updating crates.io index
   Compiling autocfg v1.1.0
   Compiling proc-macro2 v1.0.63
   Compiling libR-sys v0.4.0
   Compiling unicode-ident v1.0.9
   Compiling quote v1.0.28
   Compiling num-traits v0.2.15
   Compiling matrixmultiply v0.3.7
   Compiling num-integer v0.1.45
   Compiling syn v1.0.109
   Compiling paste v1.0.12
   Compiling extendr-engine v0.4.0
   Compiling rawpointer v0.2.1
   Compiling num-complex v0.4.3
   Compiling extendr-api v0.4.0
   Compiling ndarray v0.15.6
   Compiling lazy_static v1.4.0
   Compiling extendr-macros v0.4.0
   Compiling rextendr1 v0.0.1 (/tmp/RtmpXeLmcd/file37b52d703be358)
    Finished dev [unoptimized + debuginfo] target(s) in 17.98s
✔ Writing /tmp/RtmpXeLmcd/file37b52d703be358/target/extendr_wrappers.R
ℹ build directory: /tmp/RtmpXeLmcd/file37b52d703be358
   Compiling rextendr2 v0.0.1 (/tmp/RtmpXeLmcd/file37b52d703be358)
    Finished dev [unoptimized + debuginfo] target(s) in 0.82s
✔ Writing /tmp/RtmpXeLmcd/file37b52d703be358/target/extendr_wrappers.R

# run it again and work is duplicated because there is a new build directory
R --vanilla -q -s < a.R
ℹ build directory: /tmp/RtmpANhNDD/file37c0c52dbd9ae8
    Updating crates.io index
   Compiling autocfg v1.1.0
   Compiling libR-sys v0.4.0
   Compiling proc-macro2 v1.0.63
   Compiling quote v1.0.28
   Compiling unicode-ident v1.0.9
   Compiling num-traits v0.2.15
   Compiling matrixmultiply v0.3.7
   Compiling num-integer v0.1.45
   Compiling syn v1.0.109
   Compiling rawpointer v0.2.1
   Compiling extendr-engine v0.4.0
   Compiling paste v1.0.12
   Compiling num-complex v0.4.3
   Compiling extendr-api v0.4.0
   Compiling ndarray v0.15.6
   Compiling lazy_static v1.4.0
   Compiling extendr-macros v0.4.0
   Compiling rextendr1 v0.0.1 (/tmp/RtmpANhNDD/file37c0c52dbd9ae8)
    Finished dev [unoptimized + debuginfo] target(s) in 19.48s
✔ Writing /tmp/RtmpANhNDD/file37c0c52dbd9ae8/target/extendr_wrappers.R
ℹ build directory: /tmp/RtmpANhNDD/file37c0c52dbd9ae8
   Compiling rextendr2 v0.0.1 (/tmp/RtmpANhNDD/file37c0c52dbd9ae8)
    Finished dev [unoptimized + debuginfo] target(s) in 0.85s
✔ Writing /tmp/RtmpANhNDD/file37c0c52dbd9ae8/target/extendr_wrappers.R

# b.R
rs <- function(file="lib.rs") {
  rlang::env_bind(rextendr:::the, build_dir = {
    # lib.rs -> $PWD/lib_build
    b <- file.path(getwd(), paste0(sub("\\.rs$", "", file), "_build"))
    if (!dir.exists(b)) {
      # recursive=TRUE also creates b itself
      dir.create(file.path(b, "src"), recursive=TRUE)
      dir.create(file.path(b, "R"))
      dir.create(file.path(b, ".cargo"))
    }
    b
  })
  rextendr::rust_source(code=paste0(readLines(file), collapse="\n"), features="ndarray", env=parent.frame())
}
rs() # usually fast
rs() # always fast

# b.R looks like a.R at first:
rm -rf lib_build
R --vanilla -q -s < b.R
ℹ build directory: /home/aavogt/wip/gregg-ocr/fitting/lib_build
    Updating crates.io index
   Compiling autocfg v1.1.0
   Compiling proc-macro2 v1.0.63
   Compiling libR-sys v0.4.0
   Compiling unicode-ident v1.0.9
   Compiling quote v1.0.28
   Compiling num-traits v0.2.15
   Compiling matrixmultiply v0.3.7
   Compiling num-integer v0.1.45
   Compiling syn v1.0.109
   Compiling extendr-engine v0.4.0
   Compiling paste v1.0.12
   Compiling rawpointer v0.2.1
   Compiling num-complex v0.4.3
   Compiling extendr-api v0.4.0
   Compiling ndarray v0.15.6
   Compiling lazy_static v1.4.0
   Compiling extendr-macros v0.4.0
   Compiling rextendr1 v0.0.1 (/home/aavogt/wip/gregg-ocr/fitting/lib_build)
    Finished dev [unoptimized + debuginfo] target(s) in 18.17s
✔ Writing /home/aavogt/wip/gregg-ocr/fitting/lib_build/target/extendr_wrappers.R
ℹ build directory: /home/aavogt/wip/gregg-ocr/fitting/lib_build
   Compiling rextendr2 v0.0.1 (/home/aavogt/wip/gregg-ocr/fitting/lib_build)
    Finished dev [unoptimized + debuginfo] target(s) in 0.81s
✔ Writing /home/aavogt/wip/gregg-ocr/fitting/lib_build/target/extendr_wrappers.R

# but subsequent runs are fast
R --vanilla -q -s < b.R
ℹ build directory: /home/aavogt/wip/gregg-ocr/fitting/lib_build
   Compiling rextendr1 v0.0.1 (/home/aavogt/wip/gregg-ocr/fitting/lib_build)
    Finished dev [unoptimized + debuginfo] target(s) in 0.64s
✔ Writing /home/aavogt/wip/gregg-ocr/fitting/lib_build/target/extendr_wrappers.R
ℹ build directory: /home/aavogt/wip/gregg-ocr/fitting/lib_build
   Compiling rextendr2 v0.0.1 (/home/aavogt/wip/gregg-ocr/fitting/lib_build)
    Finished dev [unoptimized + debuginfo] target(s) in 0.68s
✔ Writing /home/aavogt/wip/gregg-ocr/fitting/lib_build/target/extendr_wrappers.R

So far I use one rs file and one cache. I do not want different caches for different calls.

Ilia-Kosenkov commented 1 year ago

Most of the rextendr::rust_*() functions are designed for interactive experimentation; we did not aim to support cross-session caching of cargo artifacts. At this point, I am not sure we should.

What is your scenario, where you actually rely on using {rextendr} interactive compilation this way?

aavogt commented 1 year ago

In closest_pairs.R I source rust_source.R to replace rust_source(). In one terminal I have while true; do timeout 5 R --vanilla -s < closest_pairs.R || echo "timed out"; inotifywait -e modify lib.rs closest_pairs.R; done. After saving changes to either file, I wait a few seconds and then I either see the error in the terminal, or okular reloads the plots.
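
The watch loop above can be spelled out as a small script. This is only a sketch, not the exact command from the comment: run_once and watch_loop are hypothetical names, and unlike the one-liner it prints "timed out" only when GNU timeout actually hit the limit (exit status 124), reporting other failures separately. It assumes inotifywait from inotify-tools is installed.

```shell
#!/usr/bin/env bash
# Sketch of the edit-run loop; run_once and watch_loop are hypothetical names.

# Run a command under a time limit and report why it stopped, if it failed.
# GNU timeout exits with status 124 when the limit is reached.
run_once() {
  local limit="$1"; shift
  local status=0
  timeout "$limit" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "timed out"
  elif [ "$status" -ne 0 ]; then
    echo "failed with status $status"
  fi
  return 0
}

# Re-run the R script every time one of the watched files is modified.
# Assumes inotifywait (inotify-tools) and R are on PATH.
watch_loop() {
  local script="$1"; shift
  while true; do
    run_once 5 R --vanilla -s < "$script"
    inotifywait -e modify "$script" "$@"
  done
}
```

With these definitions, `watch_loop closest_pairs.R lib.rs` would behave like the loop described above, watching both files.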

Usually I use Nvim-R to edit a Rmd file. But after making a change to lib.rs, I save it, switch to the .Rmd file and then I have to send the right chunks in the right order to R.

The rs/Nvim-R/Rmd workflow is even worse if I have to restart R. An infinite loop on the Rust side can't be broken by a C-c sent to R. In that case I have to type C-a x in the window with R inside tmux. Then I switch back to nvim and send the commands ,rq ,rf gg gn ,cd which start a new R and send the first chunk. Then I wait for evaluation to finish. I repeat ,cd and wait until I get to the chunk that produces the plot. Compared with the inotifywait loop, there are many more keystrokes, and I have to pause between some of them. Therefore, with rs/Nvim-R/Rmd, I have less time, attention and short-term memory left over for changes to my code.

Once closest_pairs.R is complete, I move the code to a chunk in main.Rmd. The tibble from the closest_pairs chunk then makes many plots each in its own chunk. If I discover something that needs a change to lib.rs I move code back to a .R file and run it with the inotifywait loop above.

Ideally, Nvim-R should support {rextendr} Rmd chunks and have a command to recursively source a chunk's dependencies. That would reduce but not eliminate my need for the inotifywait method.

Ilia-Kosenkov commented 1 year ago

Oh, that is a very complex setup, and specific to your workflow, I believe. I am not entirely convinced we should explore something like this. {rextendr} serves two purposes:

Our experience is not tailored for constantly executing Rust chunks in fresh R sessions, and I expect to hit all sorts of weird issues if we try to implement this. I'll share this issue on our Discord to get more feedback from other maintainers.

sorhawell commented 1 year ago

I only have experience with extendr via package builds, e.g. helloextendr, using rextendr::document() or R CMD INSTALL ...

It is possible to symlink files and directories to get caching with temporary file structures.

If you look inside the temp folder after a build, you might find e.g. a rust/target or myobjectfile.a where all the compiled objects are. Before a temporary build you can symlink a previous file or folder into place.

r-polars uses symlinks to speed up compilation in GitHub Actions, and in development to work around R CMD check creating temporary folders.

I sometimes work on an extendr project in multiple forks which are cloned independently. Then I use symlinks to save disk space.
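
The symlink idea can be sketched in shell as follows. The layout is an assumption for illustration: the logs earlier in this thread show rust_source writing to a target directory inside the build directory, but other tools may place their artifacts elsewhere, so the name may need adjusting. link_cached_target is a hypothetical helper, not part of rextendr or r-polars.

```shell
#!/usr/bin/env bash
# Sketch: make a fresh build directory reuse a persistent cargo target directory.
# link_cached_target is a hypothetical helper; "target" as the subdirectory name
# is an assumption, so check where your build actually writes its artifacts.

link_cached_target() {
  local cache_dir="$1"   # persistent directory that survives between builds
  local build_dir="$2"   # freshly created (temporary) build directory
  mkdir -p "$cache_dir" "$build_dir"
  # Resolve to an absolute path so the symlink does not dangle.
  cache_dir="$(cd "$cache_dir" && pwd)"
  # Only link if the build has not created its own target directory yet.
  if [ ! -e "$build_dir/target" ]; then
    ln -s "$cache_dir" "$build_dir/target"
  fi
}

# Example with throwaway directories standing in for the real ones.
demo_root="$(mktemp -d)"
link_cached_target "$demo_root/cargo_cache" "$demo_root/build"
ls -l "$demo_root/build"
```

Anything cargo then writes under build/target ends up in cargo_cache, so a later build directory linked the same way can pick up the compiled dependencies.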