futureverse / future.apply

R package: future.apply - Apply Function to Elements in Parallel using Futures
https://future.apply.futureverse.org

future_lapply() aware of objects not listed in future.globals #53

Closed eszterb99 closed 4 years ago

eszterb99 commented 4 years ago

This is more of a question than a bug report.

I use future_lapply() to go through a large database and perform different calculations on each observation. I have to pass a long list of objects in the future.globals argument, but when I also add the database itself, the parallel calculation becomes so slow that it is not worth replacing the sequential solution.

However, I tried running future_lapply() with everything listed in the future.globals argument except the database that the calculations are performed on, and to my surprise it still works. Here is a small example demonstrating it:

library(future.apply)

# list of objects
data <- mtcars
other_obj <- 1:1000
other_obj2 <- iris

plan(multisession)  # plan(multiprocess) is deprecated in recent releases of the future package

# list every object in future.globals except the one that is actually needed (data);
# the call still works
future_lapply(data, mean, future.globals = c('other_obj', 'other_obj2'))

I'm actually pretty happy that it works like this (it makes the run 7-10x faster). However, it would be nice to know how future_lapply() finds objects that are not listed in the future.globals argument.

Thank you!

HenrikBengtsson commented 4 years ago

Turn on options(future.debug = TRUE) and look through the (long) output.
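
For anyone following along, here is a minimal sketch of how the debug option might be switched on around a single call and then restored. It assumes the objects from the example in the question; the sequential plan is my own choice to keep the log short, and the exact wording of the debug messages may differ between package versions.

```r
library(future.apply)

# Objects from the example in the question
data <- mtcars
other_obj <- 1:1000
other_obj2 <- iris

# Assumption: a sequential plan is enough to inspect the diagnostics
plan(sequential)

# Enable verbose diagnostics and remember the previous setting
old_opts <- options(future.debug = TRUE)

# Same call as in the question; the debug messages are emitted while it runs
res <- future_lapply(data, mean, future.globals = c("other_obj", "other_obj2"))

# Restore the previous setting so later calls are quiet again
options(old_opts)
```

The messages are printed as R messages, so they can be silenced or captured with the usual tools such as suppressMessages().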

HenrikBengtsson commented 4 years ago

Note also that the argument X (here, data) is always treated as a global, but it is split into chunks, and each worker receives only its own chunk.
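
A small sketch of what that means in practice, assuming two local multisession workers (the worker count and the chunk size of 1 are illustrative choices, not something from the original example). X is never named in future.globals, yet the results match a plain lapply(); future.chunk.size is the future_lapply() argument that controls how many elements go into each chunk.

```r
library(future.apply)

# Assumption: two background R sessions on the local machine
plan(multisession, workers = 2)

data <- mtcars
other_obj <- 1:1000

# 'data' is passed as X and is not listed in future.globals
res_default <- future_lapply(data, mean, future.globals = "other_obj")

# Illustrative: one element of X per chunk, so each future handles one column
res_chunked <- future_lapply(data, mean,
                             future.globals = "other_obj",
                             future.chunk.size = 1L)

# Sequential reference result for comparison
res_seq <- lapply(data, mean)

# Shut down the background workers
plan(sequential)
```

Smaller chunks give finer-grained load balancing at the cost of more per-future overhead, so the best value depends on how expensive each element is to process.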

HenrikBengtsson commented 4 years ago

Haven't heard back. I'm closing, but please let me know if you still have questions or this was not clear.