Closed scriptator closed 8 months ago
I created an issue now on the Geotools/Geoserver Jira as well because we will need them as well for a fix: https://osgeo-org.atlassian.net/browse/GEOS-11177
Here is some feedback, mind, I can only dedicate a few minutes so mind I could not think deeply about the issue.
The mechanism is proposed seems fragile, as it's based on some convention:
This will work for single GeoTIFF files in GeoServer, because the reader is kept open and thus the header can be re-used, but it gets cleared on store removal. That works for a small set of files, as each file maps to a single layer (and an OGC service should not have too many of them).
When the set of files grows (think, handling all Sentinel2 scenes collected so far in a single GeoServer, pointing at S3 storage), single files do not work, and mosaic needs to be used. The image mosaic uses the low-level imageio-ext reader is used directly, without going through the GeoTools high-level one (the one you patched in your repo)... which would break the cleanup. We cannot rely on the low level reader to clean up, as the low-level reader is created and destroyed for each request. I guess a "mosaic-wide" clean mechanism would be needed in that case, to ditch caches for all files that the mosaic has opened so far? I cannot dedicate enough time to study this case, but it seems we'd have to accumulate all the resource ids used by the mosaic in its lifetime.
Side issue, the listeners would accumulate in their own cache, which may cause mis-alignments:
Seems like the two would have to live in the same cached object, to force a proper alignment.
Taking a step back, the idea of using a SoftCache for the cache was meant to be a self-managing cache, without the need to accumulate per-source listeners. In your case that's not working because the referenced contents of the COG are changing on GeoServer's back, creating the need for a single file cleanup mechanism instead. Thinking out loud, couldn't you create a new COG, at a different position, and update the GeoServer store URL instead?
An alternative that I can think of, would be to have a direct API to manage the cache, exposed by GeoServer for the admin to clean the caches for a specific reader, rather than trying to rely on the datastore disposals, which as written above, seem like it would work only for single tiff stores. Try to think, what would you do if the TIFF files you're managing were used in a mosaic instead?
Thanks for the detailed feedback @aaime!
The image mosaic uses the low-level imageio-ext reader is used directly, without going through the GeoTools high-level one (the one you patched in your repo)... which would break the cleanup.
I never used the image mosaics feature so far and therefore didn't foresee that my solution would not work in that case.
Seems like the two would have to live in the same cached object, to force a proper alignment.
That is a very good point as well which is unfortunately not easy to implement.
Thinking out loud, couldn't you create a new COG, at a different position, and update the GeoServer store URL instead?
Thx for the suggestion. This is exactly the fallback solution that we implemented in the meantime in order to quickfix the issue on our side. I hoped to be able to contribute a proper fix for the underlying problem in the Geoserver but it appears to be not so easy. Therefore I guess we can close this PR. I'll keep the issue in the geoserver open nevertheless.
The current implementation of the
AbstractRangeReader
in the COG plugin does not provide any means of cleaning u cached COG headers after they are not used any more. The only means of clearing the cache is through thereset
method ofExtCaches
, which cleans all COG headers (besides other resources from other classes).This functionality is relevant for me in the following use cases:
The COG headers accumulate in memory and lead to a waste of resources if there are many COGs which are not frequently used.
I need to update COGs on the S3 storage, because my application creates the COGs based on user-defined rules. After the user changes a COG, I want to show them the updated result in a map view (via the Geoserver). Removing the coveragestore on the Geoserver currently does not clean up the COG header cache, which leads to errors after re-creating the coveragestore because the cached headers do not belong to the real content of the COG any more.
Due to these problems I drafted a solution, where the
AbstractRangeReader
creates a separate listener in theExtCache
for every COG that is serves and users of imageio-ext can call a reset function for a specific COG on theExtCache
.I intend to use this new feature in the geotools
GeoTiffReader
as follows: https://github.com/scriptator/geotools/commit/dbe2d61b5ea75d3805a6691967e9895864f37c17So far I tested the changes manually for my use case but nothing more than that. Please give me some feedback on the idea.