appelmar / gdalcubes

Creating and analyzing Earth observation data cubes in R
https://gdalcubes.github.io

consider a compute() method to trigger lazy evaluation to execute? #74

Closed cboettig closed 1 year ago

cboettig commented 1 year ago

I love the use of lazy evaluation in gdalcubes. However, there are some workflows where I think it makes sense to branch the computation: for example, two analyses that share the same initial processing steps (creating a view, applying a mask, maybe an apply_pixel() for an initial computation), after which I want to run two different downstream analyses. A basic example might look something like:

# shared preprocessing: build the cube from the collection, view, and mask, then derive LAI
my_view <- raster_cube(stac_cube, view, mask) |>
  apply_pixel("log(Lai_500m)", "LAI")

# analysis 1: animate the LAI time series
my_view |>
  animate(col = viridisLite::mako, save_as = "anim.gif", fps = 4)

# analysis 2: plot the average over time
my_view |>
  reduce_time(c("mean(LAI)")) |>
  plot(zlim = c(0, 3), col = viridisLite::mako)

IIUC, since my_view is lazy, this means that the calculations in the first command are re-executed to generate both the animation and the temporal-average plot? Or are the methods already smart enough to somehow cache the intermediate object? You're probably familiar with the behavior of dplyr::compute() with remote databases, but maybe there's no good analog here and I should just use something more manual like gdalcubes::write_tif() if I plan to store the intermediate computation for later use?

appelmar commented 1 year ago

Thanks a lot! You are right: there is no internal caching in this case, so the first command is simply recomputed. Storing intermediate results is easiest with write_ncdf() and then reading the file back with ncdf_cube(). I am not really familiar with dplyr::compute(), but it sounds rather difficult to find an equivalent of the remote temporary table in this case.
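
A minimal sketch of that workflow, assuming the my_view pipeline from the example above and a hypothetical output path "intermediate.nc":

# materialize the shared intermediate result once
write_ncdf(my_view, "intermediate.nc")

# re-open the stored cube; both analyses now read from disk instead of recomputing
cached <- ncdf_cube("intermediate.nc")

cached |>
  animate(col = viridisLite::mako, save_as = "anim.gif", fps = 4)

cached |>
  reduce_time(c("mean(LAI)")) |>
  plot(zlim = c(0, 3), col = viridisLite::mako)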

There is one exception to the above: If you repeatedly call plot() on an identical cube (or identical chain of operations on the same cube), there is some internal caching. This avoids recomputations if only visualization parameters are changed.

cboettig commented 1 year ago

Sounds good. Yes, I noticed the caching on plot(), that's actually a really nice touch. I think we can close this.