apsteinmetz closed this issue 4 months ago.
I can reproduce this, thanks for the reprex. I'll look into it.
Looks like this is exactly the issue described in this section of the future docs: https://future.futureverse.org/articles/future-4-issues.html#missing-packages-false-negatives
Using furrr_options() to specify the packages used in the function should work:
future_map(1:n, do_stuff, .options = furrr_options(packages = "tidypolars"))
However, there's another issue in polars that crashes R when handling a list of DataFrames created with future. I reported it here: https://github.com/pola-rs/r-polars/issues/851
There's nothing more I can do in tidypolars to fix this, so I'm closing. Thanks for the report!
Alas, when I explicitly attach the packages with future_map(1:n, do_stuff, .options = furrr_options(packages = c("tidypolars", "polars"))), things get even more interesting: R crashes with the "R Session Aborted" bomb. But when I omit library(tidyverse), so that the only dplyr-ish verbs come from tidypolars, I am back to the ! could not find function "arrange" error. I recognize this is probably a future package issue, but it's interesting.
> when I omit library(tidyverse), so the only dplyr-ish verbs come from tidypolars, I am back to the ! could not find function "arrange" error
This isn't related to future; it also happens with a simpler example:
library(tidypolars)
mtcars |>
as_polars_df() |>
arrange(mpg)
#> Error in arrange(as_polars_df(mtcars), mpg): could not find function "arrange"
tidypolars doesn't reexport the tidyverse functions (dplyr and tidyr verbs such as arrange()), so you need to load the tidyverse packages to use tidypolars. This is because when I started tidypolars I didn't necessarily want to import dplyr and tidyr. I've changed my mind about this, so I suppose I should reexport their functions so that users only need to load tidypolars, just like tidytable does, for instance.
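Until such reexports exist, the workaround implied above is to attach dplyr, which defines the arrange() generic, before tidypolars, which supplies the methods for polars DataFrames. This is a minimal sketch assuming the polars 0.x-era versions of tidypolars discussed in this thread; it is not an official recipe from the package docs.

```r
library(dplyr)       # defines the arrange() generic that tidypolars extends
library(tidypolars)  # registers arrange() methods for polars DataFrames

res <- mtcars |>
  as_polars_df() |>
  arrange(mpg)

# Convert back to a plain data.frame to inspect the sorted result
head(as.data.frame(res))
```

Loading only tidypolars, as in the reprex above, leaves arrange() undefined in the search path, which is exactly the "could not find function" error.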
@apsteinmetz It is actually expected that polars (and therefore tidypolars) does not work with future and plan(multisession). Basically, future creates multiple R sessions to run the computation and then exports the results from each session back to the "main" one (the one it was called from). However, future cannot export external pointers (see this section of the docs).
Since polars calls Rust code in the background and therefore relies on external pointers, it cannot work with future under this type of plan. It probably shouldn't crash in this case, but you can set options(future.globals.onReference = "error") at the top of your script to abort early when future detects external pointers in its output:
library(tidyverse)
library(furrr)
#> Loading required package: future
library(tidypolars)
plan(multisession)
### Without this, the session would crash
options(future.globals.onReference = "error")
do_stuff <- function(n){
cars <- cbind(model = row.names(mtcars),mtcars) |> as_polars_df()
cars |>
group_by(cyl) |>
summarize(mean_hp = mean(hp)) |>
arrange(cyl)
}
n <- 2
future_map(1:n, do_stuff, .options = furrr_options(packages = "tidypolars")) |>
bind_rows_polars()
#> Error: Detected a non-exportable reference ('externalptr' of class 'tidypolars') in the value (of class 'list') of the resolved future
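If parallelism is still wanted, one possible workaround that follows from the explanation above (it was not proposed in this thread) is to keep every polars object inside the worker and return an ordinary data.frame, which future can serialize. The function name do_stuff_df is hypothetical, and the sketch assumes that as.data.frame() works on the tidypolars result, as it does for polars DataFrames.

```r
library(tidyverse)
library(furrr)
library(tidypolars)

plan(multisession)

# Hypothetical variant of do_stuff(): same pipeline, but the result is
# converted to a plain data.frame *inside* the worker, so no external
# pointer has to be sent back to the main session.
do_stuff_df <- function(n) {
  cbind(model = row.names(mtcars), mtcars) |>
    as_polars_df() |>
    group_by(cyl) |>
    summarize(mean_hp = mean(hp)) |>
    arrange(cyl) |>
    as.data.frame()
}

n <- 2
res <- future_map(
  1:n, do_stuff_df,
  .options = furrr_options(packages = c("dplyr", "tidypolars"))
) |>
  bind_rows()  # dplyr::bind_rows() on ordinary data.frames

# Optionally rebuild a polars DataFrame in the main session
res_pl <- as_polars_df(res)

plan(sequential)  # shut down the background workers
```

The cost is one conversion per worker, so this only pays off when the work done inside each worker dominates the size of the returned data.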
Final update on this: starting with polars 0.15.0, this errors even if the option future.globals.onReference is not set:
library(tidyverse)
library(furrr)
#> Loading required package: future
library(tidypolars)
plan(multisession)
options(polars.do_not_repeat_call = TRUE)
do_stuff <- function(n){
cars <- cbind(model = row.names(mtcars),mtcars) |> as_polars_df()
cars |>
summarize(mean_hp = mean(hp))
}
n <- 2
future_map(1:n, do_stuff, .options = furrr_options(packages = "tidypolars")) |>
bind_rows_polars()
#> Error: Execution halted with the following contexts
#> 0: In R: in pl$concat()
#> 1: The argument [l] caused an error
#> 2: Possibly because element no. [1]
#> 3: Expected a value of type [r_polars::lazy::dataframe::RPolarsLazyFrame]
#> 4: Got value [ExternalPtr.set_class(["tidypolars", "RPolarsDataFrame"]]
#> 5: This Polars object is not valid. Execute `rm(<object>)` to remove the object or restart the R session.
If I omit group_by(), then it chokes on summarize().
If we change future::plan() to "sequential" (which, in effect, runs everything in the current R session, as base R would), there is no error.
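For reference, the sequential variant reported above can be sketched as follows. It reuses the pipeline from the earlier reprex and assumes the same package versions; because a sequential plan evaluates each future in the current session, the polars objects never have to cross a session boundary, at the price of losing any parallel speed-up.

```r
library(tidyverse)
library(furrr)
library(tidypolars)

# Every "future" is evaluated in the current R session, so the polars
# objects (and their external pointers) never leave it.
plan(sequential)

do_stuff <- function(n) {
  cbind(model = row.names(mtcars), mtcars) |>
    as_polars_df() |>
    group_by(cyl) |>
    summarize(mean_hp = mean(hp)) |>
    arrange(cyl)
}

res <- future_map(1:2, do_stuff) |>
  bind_rows_polars()

as.data.frame(res)
```

This is mainly useful for checking that the pipeline itself is correct before deciding how, or whether, to parallelize it.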