Open aaime opened 6 years ago
The original stats ops from JAI do compute statistics on a tile by tile basis, in a loop, so are not affected by this issue. A drawback of the fix will be that the stats computation is going to switch from parallel (using all tile scheduler threads) to linear. On a busy server doing multiple requests that's a good thing, a batch job having all the machine to use might suffer slowdowns though.
Should we add a parameter to control this behavior like, lazy VS eager computation (default to lazy)?
It's not lazy vs eager, it's parallel vs sequential, but still done all in one shot in the getProperty call. I did try to keep the parallel behavior, but due to implementation details it seems one has to also accept to keep all the tiles in memory. In particular, the tile scheduler field from the base class is not accessible, and even if we did, there is no way to see how many threads it can use (knowing it's important, otherwise we end up calling a method that builds again all tiles before returning, making the thing memory bound again).
Long story short, it seems the operation would need its own local, temporary thread pool to have some control on the computation. So the new parameter could indeed be the number of threads used to compute the stats, defaulting to one.
let's leave this open and get back to it in time.
At least we patched this for the time being.
Both simple and zonal states compute the statistics by aggregating them on tiles computation, and the property calculation calls getTiles() to make that happen. It's a concise way to compute all tiles, however, it also means all rastes are computed and retained as getTiles() returns a Raster[], in other words, the entire raster gets loaded into memory.