LandSciTech / pfocal

Fast parallel convolution in R.
https://landscitech.github.io/pfocal/

Memory error when using pfocal on a large raster such as the ROF #28

Open see24 opened 2 years ago

see24 commented 2 years ago

I get this error message when I use the caribouMetrics package with pfocal for the ROF dataset:

Error: memory exhausted (limit reached?)

I assume this is because pfocal converts the raster to a matrix before passing it to the C++ code. The raster package handles this by using canProcessInMemory to decide when to process the raster in blocks. It is a bit more complicated for focal because the blocks need to overlap, but raster::focal does it like this: https://github.com/rspatial/raster/blob/6860faa17fd2f1e8b8b54b2a1bf5074930ff3795/R/focal.R#L125
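The overlapping-block logic that raster::focal uses can be sketched in base R. Everything below (the `make_blocks` helper and its argument names) is hypothetical, not raster's or pfocal's actual API; the point is that each block reads `floor(kernel_rows / 2)` extra rows on either side so the focal window is complete at block edges, but writes only its own rows:

```r
# Sketch of overlapping row-block computation for a blocked focal operation.
# make_blocks and its arguments are illustrative; see raster::focal for the real logic.
make_blocks <- function(nrow_total, kernel_rows, block_rows = 256) {
  pad <- kernel_rows %/% 2                       # rows of overlap needed on each side
  starts <- seq(1, nrow_total, by = block_rows)  # first output row of each block
  lapply(starts, function(s) {
    e <- min(s + block_rows - 1, nrow_total)     # last output row of the block
    list(
      read_start  = max(1, s - pad),             # read window includes the overlap
      read_end    = min(nrow_total, e + pad),
      write_start = s,                           # write window excludes the overlap
      write_end   = e
    )
  })
}

blocks <- make_blocks(nrow_total = 1000, kernel_rows = 5, block_rows = 300)
```

Each block's read window is then passed to the focal routine, and only the `write_start:write_end` rows of its output are kept, so adjacent blocks never produce edge artifacts where they meet.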

I have found before that canProcessInMemory is quite conservative about what can be processed by default so you would want to be sure it was only triggered when actually necessary.

Would it be worthwhile to have something similar for pfocal?

VLucet commented 2 years ago

For now, we could just allow canProcessInMemory to be passed to pfocal as an extra argument for the user to decide.

see24 commented 2 years ago

For the user to decide what? What would happen if canProcessInMemory = FALSE was supplied by the user?

VLucet commented 2 years ago

Well, I didn't word my answer very well, I apologize. We would add a canProcessInMemory argument which, if set to FALSE, would replicate the blocked processing that raster falls back to when raster::canProcessInMemory() returns FALSE. We'd need to figure out how to process chunks of data instead of handing the whole raster to the C++ backend.

see24 commented 2 years ago

Ya, I think that would be good. It is a bit tricky, but the link above shows how raster::focal goes about splitting the data into chunks.

xlirate commented 2 years ago

Sadly, the current implementation cannot handle chunks like this on its own. My recommendation is to chop the dataset up into chunks that are small enough to be worked on one at a time, up in the R code.
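Chopping the matrix up in R code could look like the sketch below. The `focal_mean3` function is a toy 3x3 focal mean standing in for pfocal (it is not pfocal's API); the sketch shows that processing overlapping row chunks and discarding the overlap rows reproduces the whole-matrix result:

```r
# Toy 3x3 focal mean (edge rows/cols left NA), a stand-in for the real backend.
focal_mean3 <- function(m) {
  out <- matrix(NA_real_, nrow(m), ncol(m))
  for (i in 2:(nrow(m) - 1)) {
    for (j in 2:(ncol(m) - 1)) {
      out[i, j] <- mean(m[(i - 1):(i + 1), (j - 1):(j + 1)])
    }
  }
  out
}

# Apply the focal operation to overlapping row chunks, then stitch the results.
chunked_focal_mean3 <- function(m, chunk_rows = 4) {
  pad <- 1  # a 3x3 kernel needs one extra row above and below each chunk
  out <- matrix(NA_real_, nrow(m), ncol(m))
  for (s in seq(1, nrow(m), by = chunk_rows)) {
    e  <- min(s + chunk_rows - 1, nrow(m))
    rs <- max(1, s - pad)
    re <- min(nrow(m), e + pad)
    sub  <- focal_mean3(m[rs:re, , drop = FALSE])
    keep <- (s - rs + 1):(e - rs + 1)            # drop the overlap rows
    out[s:e, ] <- sub[keep, , drop = FALSE]
  }
  out
}

set.seed(1)
m <- matrix(runif(10 * 8), nrow = 10)
whole   <- focal_mean3(m)
chunked <- chunked_focal_mean3(m, chunk_rows = 4)
```

The same pattern would apply with pfocal in place of `focal_mean3`, with `pad` set from the actual kernel height.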

A future implementation may add virtual memory, using a temporary file and only loading a few rows of the dataset at a time. We should keep this issue open until then.

see24 commented 2 years ago

Sounds good. For now the easiest workaround is to use terra::focal, which is faster than the old raster::focal and does not load the whole raster into memory. For the future implementation it might be worth looking at how terra handles large rasters, since I think it does something like the virtual memory you mention.
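For reference, the workaround can be as short as this. The snippet is a sketch assuming the terra package is installed (the `requireNamespace` guard keeps it runnable without it); the raster dimensions and kernel size are arbitrary placeholders:

```r
# Workaround sketch: terra::focal streams large rasters from disk in blocks,
# so it avoids the memory exhaustion seen with an all-in-memory matrix.
if (requireNamespace("terra", quietly = TRUE)) {
  library(terra)
  r <- rast(nrows = 100, ncols = 100, vals = runif(100 * 100))
  smoothed <- focal(r, w = 3, fun = mean)  # 3x3 moving-window mean
}
```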