geosolutions-it / jai-ext

Java Advanced Imaging Open Source Replacement Wannabe
Apache License 2.0
90 stars 37 forks source link

Statistics operation load the entire raster in memory #187

Open aaime opened 6 years ago

aaime commented 6 years ago

Both simple and zonal states compute the statistics by aggregating them on tiles computation, and the property calculation calls getTiles() to make that happen. It's a concise way to compute all tiles, however, it also means all rastes are computed and retained as getTiles() returns a Raster[], in other words, the entire raster gets loaded into memory.

aaime commented 6 years ago

The original stats ops from JAI do compute statistics on a tile by tile basis, in a loop, so are not affected by this issue. A drawback of the fix will be that the stats computation is going to switch from parallel (using all tile scheduler threads) to linear. On a busy server doing multiple requests that's a good thing, a batch job having all the machine to use might suffer slowdowns though.

simboss commented 6 years ago

Should we add a parameter to control this behavior like, lazy VS eager computation (default to lazy)?

aaime commented 6 years ago

It's not lazy vs eager, it's parallel vs sequential, but still done all in one shot in the getProperty call. I did try to keep the parallel behavior, but due to implementation details it seems one has to also accept to keep all the tiles in memory. In particular, the tile scheduler field from the base class is not accessible, and even if we did, there is no way to see how many threads it can use (knowing it's important, otherwise we end up calling a method that builds again all tiles before returning, making the thing memory bound again).

Long story short, it seems the operation would need its own local, temporary thread pool to have some control on the computation. So the new parameter could indeed be the number of threads used to compute the stats, defaulting to one.

simboss commented 6 years ago

let's leave this open and get back to it in time.

At least we patched this for the time being.