Open twest820 opened 1 year ago
I'm not sure how big `simpleFeatureCollection` is, but one thing to keep in mind with your current approach is that you (probably) get 16 copies of it, one for each worker, and that could be expensive.
Are you sure `mediumSizeRaster = rast("twoGBraster.tif")` is actually resulting in an object? It looks like it uses a relative path, so the working directory on the worker may be different. You could try supplying an absolute path instead.
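A minimal sketch of the absolute-path suggestion, assuming the file sits in the main session's working directory. `checkRasterPath()` is a hypothetical helper (not part of terra or furrr) that uses only base R to report what a given process sees for a path:

```r
# Build the absolute path once in the main session, where the working
# directory is known, so workers do not depend on their own getwd().
# Assumption: the raster lives in the main session's working directory.
rasterPath <- file.path(getwd(), "twoGBraster.tif")

# Hypothetical helper: report the working directory and whether the
# file is visible from the process that calls it.
checkRasterPath <- function(path) {
  list(
    workingDirectory = getwd(),
    path = path,
    exists = file.exists(path)
  )
}

str(checkRasterPath(rasterPath))
```

Calling the helper from inside the mapped function and returning its result alongside the real output would show whether each worker resolves the same file as the main session.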
If the error message for `workers > 1` is to be believed, it appears the call to `rast()` is somehow getting skipped. Even if the statement was executed and `rast()` had some silent error leading it to return `NULL` instead of failing properly, that should still result in the parser adding `mediumSizeRaster` as a workspace variable. So it seems like something might be going pretty badly wrong, though given future's limitations for flowing diagnostics from workers back to their caller, we might be stuck. (I find myself often wishing for `plan(multithread)` but that's not on furrr.)
If it were a pathing issue, which it presumably isn't since there's no issue with `workers = 1`, I'd expect to see something like the usual
Error: [rast] file does not exist: twoGBraster.tif
In addition: Warning message:
twoGBraster.tif: No such file or directory (GDAL error 4)
come back. But future may not be able to route that from the workers to the caller.
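If conditions really aren't making it back from the workers, one option is to turn them into ordinary return values. The following is a sketch under that assumption, in base R only. `captureDiagnostics()` and `failingStep()` are hypothetical names, and `failingStep()` is just a stand-in for a call like `rast()`:

```r
# Hypothetical wrapper: run `f` and return any error or warning text as
# plain data rather than relying on conditions being relayed to the caller.
captureDiagnostics <- function(f, ...) {
  warningMessages <- character(0)
  errorMessage <- NA_character_
  value <- withCallingHandlers(
    tryCatch(
      f(...),
      error = function(e) {
        errorMessage <<- conditionMessage(e)
        NULL # no usable value when the call fails
      }
    ),
    warning = function(w) {
      # Record the warning, then let execution continue.
      warningMessages <<- c(warningMessages, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, error = errorMessage, warnings = warningMessages)
}

# Stand-in for a step that warns and then fails (not a real terra call).
failingStep <- function(path) {
  warning("stand-in warning for ", path)
  stop("stand-in error for ", path)
}

result <- captureDiagnostics(failingStep, "twoGBraster.tif")
str(result)
```

Wrapping the body of the mapped function this way means each element of the result carries its own diagnostics, whatever the backend does with conditions.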
> I'm not sure how big `simpleFeatureCollection` is
Good question! It's only a couple MB, so negligible in this context. 32 workers would be better but even 8 GB per worker is maybe asking too much (if this approach to the task had worked I was prepared to kill the `future_map()` and try with eight workers to get 16 GB DDR per worker if physical memory was going to be exceeded).
A reprex isn't feasible because multiple gigabytes (actually multiple terabytes in the full use case) of data are involved, but I have the following scenario
which fails with
Same code runs fine with `workers = 1`. While this approach isn't ideal (it would likely waste 60+ GB of memory in duplicate copies of a raster which is thread safe since it sees only read access), the preferred implementation of hoisting `rast()` out of the function body fails with #258. Since I've got 128 GB of DDR and can afford to waste some, is there a way to get `rast()` to construct an object under parallel execution?

From what I can see at the moment, the least undesirable workaround appears to be to refactor the code for single-threaded execution, manually chunk and balance the polygons, and then kick off 16 background jobs in RStudio using Code -> Run selection as background job. But, insofar as I understand furrr, that's the sort of task `future_map()` exists to automate.
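For reference, the manual "chunk and balance" step of that workaround could look roughly like the sketch below. It assumes each polygon has some numeric cost estimate (area, vertex count, or similar). `balanceChunks()` and the cost values are hypothetical, and the greedy largest-first assignment is just one simple balancing heuristic:

```r
# Hypothetical helper: greedily assign items to `chunks` groups, largest
# cost first, always adding to the group with the smallest running total.
balanceChunks <- function(cost, chunks) {
  assignment <- integer(length(cost))
  load <- numeric(chunks)
  for (index in order(cost, decreasing = TRUE)) {
    target <- which.min(load)
    assignment[index] <- target
    load[target] <- load[target] + cost[index]
  }
  assignment
}

# Made-up per-polygon costs, for illustration only.
polygonCost <- c(9, 7, 6, 5, 4, 3, 2, 1)
chunkOfPolygon <- balanceChunks(polygonCost, chunks = 3)

# Polygon indices grouped by chunk; each group would go to one background job.
polygonChunks <- split(seq_along(polygonCost), chunkOfPolygon)
print(polygonChunks)
```

With 16 jobs, `chunks = 16` would give one list element of polygon indices per job.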