This revealed a separate bug, which I've captured in https://github.com/DavisVaughan/furrr/issues/113. You might try using `future.apply::future_lapply()` in the meantime to see if that fixes it.
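For reference, a minimal sketch of what that might look like with the names from this issue (`analyze_window()` is a hypothetical stand-in for the per-window test, and the subsetting should be adjusted to the actual structure of `large_data`):

```r
library(future.apply)  # attaches future as well, so plan() is available
plan(multicore, workers = 24)

# `window_list` holds index vectors into `large_data`; `analyze_window()`
# is a hypothetical per-window test function.
results <- future_lapply(window_list, function(idx) {
  analyze_window(large_data[idx, ])
})
```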
However, I generally think what you are describing here is expected behavior. `future.globals.maxSize` should be checked on each element of your `window_list`. If any one of those individual elements is larger than `future.globals.maxSize`, you should get an error even on multicore. I imagine this should happen for consistency across all future backends. Does this sound right @HenrikBengtsson?
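As a quick sanity check, the approximate size of each element that would be exported can be inspected directly (a sketch, using the object names from this issue):

```r
# Approximate in-memory size of each element of window_list, in bytes.
element_sizes <- vapply(window_list, function(x) as.numeric(object.size(x)), numeric(1))
max(element_sizes)                                  # largest single element
getOption("future.globals.maxSize", 500 * 1024^2)   # current limit (default ~500 MiB)
```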
Thank you for your response. You mention that `future.globals.maxSize` is checked on each element of `window_list` and that any one of these elements being larger than `future.globals.maxSize` could lead to this error. However, in my case `window_list` is only a few MB, and the large dataset is passed in the variable `large_data`; `window_list` only provides indices to extract from `large_data`. No extracted portion of `large_data` is larger than a few MB either. Please let me know if this changes your assessment. Anyway, I'll try using `future.apply::future_lapply()` and see if that works.
> However, I generally think what you are describing here is expected behavior. `future.globals.maxSize` should be checked on each element of your `window_list`. If any one of those individual elements is larger than `future.globals.maxSize`, you should get an error even on multicore. I imagine this should happen for consistency across all future backends. Does this sound right @HenrikBengtsson?
Yes, the protection against exporting too-large objects should be applied per processed element, i.e. scaled by the number of elements handled per worker (as in https://github.com/HenrikBengtsson/future.apply/blob/develop/R/future_xapply.R#L171-L181).
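In rough terms (a sketch of the idea, not the exact future.apply internals), that scaling looks like this:

```r
# Per-chunk scaling of the export-size guard: the configured limit is
# multiplied by the number of elements in the chunk sent to a worker.
maxsize_per_element <- getOption("future.globals.maxSize", 500 * 1024^2)
chunk <- window_list[1:10]   # hypothetical chunk of 10 elements handled by one worker
options(future.globals.maxSize = length(chunk) * maxsize_per_element)
```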
FYI, it's on my not-too-distant roadmap to create a 'future.{chunks,mapreduce,...}' package that will provide a common API to serve futurized map-reduce packages like future.apply, furrr, doFuture, and so on. That should help harmonize behaviors like this one.
Oh, very nice! I'll finally be working on furrr again in the nearish future as well, so I'll keep that in mind.
Nevertheless, @naglemi, I don't think that changes my answer. For consistency between backends, `large_data` being larger than `future.globals.maxSize` should throw an error (that is really a future question, not a furrr one).
Thank you, @DavisVaughan. In this case it sounds like my best option, if using furrr or future.apply, is to create a list containing the desired overlapping portions of `large_data` and use this as the first (`.x`/`X`) argument. I was hoping to avoid this because of the redundancy that will exist in this data structure due to overlap, but I suppose it may not be easily avoidable.
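Something along these lines, for example (a sketch using the names from this thread; `my_test()` is a hypothetical per-window function, and the subsetting should be adjusted to the actual structure of `large_data`):

```r
# Pre-extract each overlapping window so only the small pieces, not all of
# `large_data`, are exported to the workers.
window_data <- lapply(window_list, function(idx) large_data[idx, ])
results <- furrr::future_map(window_data, my_test)
```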
This memory overload is also happening with the equivalent `future.apply::future_lapply()` call, so I think I need to pre-allocate the data structure either way.
I think I ran into something similar: a huge initial dataset in which each nested item is small. Running `future_map` on each small dataset, I expected the memory requirement to be small, but it quickly used a lot of memory (seemingly multiplying the size of the huge dataset by the number of cores?). Is this expected behavior? If not, I am happy to provide a reprex to clarify the point.
`future.globals.maxSize` now scales according to the chunk size, which was the issue adjacent to this one.
I think you'll need to break up that large object so only the relevant pieces get exported to the workers. Exporting that large object is generally not a good thing to do anyway, because it is going to be extremely slow. That is part of the reason the `future.globals.maxSize` option is in place.
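For reference, that option can be raised explicitly if exporting a large object is truly unavoidable (a sketch; this only lifts the guard, it does not avoid the copy):

```r
# Raise the export-size limit to ~12 GiB (value in bytes). Each worker will
# still receive its own copy of any exported object.
options(future.globals.maxSize = 12 * 1024^3)
```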
Sorry for the trivial question, but now I have a doubt. If I have a tibble with 10 rows and one column, where each element is 100 MB, the tibble is therefore 1 GB. If I do
```r
plan(multisession, workers = 10)

my_tibble |>
  mutate(new_column = furrr::future_map(old_column, my_function))
```
will each worker load 100 MB, or will each worker load 1 GB? Would `.env_globals = environment()` help? Does the environment mainly import packages? Why would I need the local environment if all my variables are locally defined and don't depend on global variables?
First, thank you for making the furrr library available!
I am attempting a parallel task in which a large data object of 11 GB is broken down into pieces and different pieces are analyzed by different cores. I'm working with 24 cores, but not enough memory to make 24 copies of the object. My understanding was that since I specify `plan('multicore')`, the cores should use shared memory rather than copying the objects. Is this expected behavior, or is there something wrong? I apologize if it's the former and I misunderstand how multicore mode is supposed to work. How can I use `future_map` for my task without copying the large object?
I am running CentOS 7. Below is the relevant code and the resulting error, showing that the system is attempting to duplicate the object rather than keep it in shared memory.
To clarify how the above function works, the vector `window_list` is used to break `large_data` down into overlapping windows, which are tested individually.
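(The original code and error output are not reproduced in this thread. Purely as a hypothetical illustration of the setup described, with made-up names and window sizes, the pattern might look like:)

```r
library(future)
library(furrr)
plan(multicore, workers = 24)

# Hypothetical illustration only (not the original code): overlapping windows
# defined by index vectors into `large_data` (assumed row-indexable), each
# window tested separately by a hypothetical test_window() function.
window_size <- 1000L
step        <- 500L   # 50% overlap between consecutive windows
starts      <- seq(1L, nrow(large_data) - window_size + 1L, by = step)
window_list <- lapply(starts, function(s) s:(s + window_size - 1L))

results <- future_map(window_list, ~ test_window(large_data[.x, ]))
```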