ZGIS / semantique

Semantic Querying in Earth Observation Data Cubes
https://zgis.github.io/semantique/
Apache License 2.0

feat: Preview mode & Caching :gift: #40

Closed: fkroeber closed this PR 5 months ago

fkroeber commented 6 months ago

Description

This PR introduces caching for data layers. The need arises in particular from the new STACCube (#38). When several entities reference the same data layer (e.g. the SCL layer of Sentinel-2 data, which is used by entities such as water, vegetation and clouds), that layer would otherwise be fetched over the internet again for every reference, which is time-consuming. Assuming that in these and similar cases loading the data takes longer than the actual data manipulation during recipe execution, an optional caching parameter seems reasonable.
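To make the trade-off concrete, here is a rough cost model for a single layer. `loading_cost` is a hypothetical helper, and the numbers below are illustrative assumptions, not measurements from the test notebook. Without a cache, every reference triggers a fetch. With a cache, the layer is fetched once and each reference pays a small cache overhead.

```python
def loading_cost(n_refs: int, t_load: float, t_cache: float, cached: bool) -> float:
    """Rough time spent obtaining one data layer during a single recipe run.

    n_refs  -- how often the recipe references the layer
    t_load  -- time for one fetch from the data source
    t_cache -- overhead per reference for storing/reading the cached copy
    """
    if not cached:
        # Every reference triggers a fresh fetch from the source.
        return n_refs * t_load
    # The layer is fetched once; every reference then pays the cache overhead.
    return t_load + n_refs * t_cache


# Illustrative numbers only (seconds), not measurements:
# a slow remote fetch vs. a fast local read, each layer referenced 3 times.
remote = (loading_cost(3, t_load=10.0, t_cache=0.5, cached=False),
          loading_cost(3, t_load=10.0, t_cache=0.5, cached=True))
local = (loading_cost(3, t_load=0.4, t_cache=0.5, cached=False),
         loading_cost(3, t_load=0.4, t_cache=0.5, cached=True))
print(f"remote: uncached {remote[0]:.1f} s, cached {remote[1]:.1f} s")
print(f"local:  uncached {local[0]:.1f} s, cached {local[1]:.1f} s")
```

Under this model, caching pays off only when (n_refs - 1) * t_load > n_refs * t_cache. That happens when fetching is expensive relative to the cache overhead and the layer is actually reused. A layer referenced once never benefits.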

The difficulty with implementing caching elegantly is that references to data layers must first be resolved to know which layers need to be cached at all (namely only those that are used more than once during recipe evaluation). This requires a kind of dry run of the recipe. The current PR implements this as a preview run, which executes the recipe at a greatly reduced resolution (a mode that is also useful for purposes other than caching).
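A minimal sketch of this two-pass idea in plain Python follows. It assumes that a recipe can be modelled as a dict of entity functions that pull layers through a `load` callable. `SourceLoader`, `preview_references`, `execute` and the layer names are hypothetical illustrations, not semantique's API or the implementation in this PR. The preview pass runs at a tiny resolution and only records which layers are referenced. The full pass then caches exactly those layers referenced more than once.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, List, Set

Layer = Hashable
Loader = Callable[[Layer], List[float]]
Recipe = Dict[str, Callable[[Loader], object]]


class SourceLoader:
    """Hypothetical data source that records every fetch it performs."""

    def __init__(self, resolution: int):
        self.resolution = resolution
        self.fetches: List[Layer] = []

    def __call__(self, layer: Layer) -> List[float]:
        self.fetches.append(layer)
        return [0.0] * self.resolution  # placeholder pixel values


def preview_references(recipe: Recipe, preview_res: int = 1) -> Counter:
    """Dry run at a tiny resolution; only the recorded layer references matter."""
    source = SourceLoader(preview_res)
    for step in recipe.values():
        step(source)
    return Counter(source.fetches)


def execute(recipe: Recipe, source: SourceLoader, cache_layers: Set[Layer]) -> Dict[str, object]:
    """Full run that keeps in memory only the layers flagged by the preview."""
    cache: Dict[Layer, List[float]] = {}

    def load(layer: Layer) -> List[float]:
        if layer not in cache_layers:
            return source(layer)  # referenced once: fetch directly, keep nothing
        if layer not in cache:
            cache[layer] = source(layer)  # first use of a reused layer: fetch and store
        return cache[layer]

    return {name: step(load) for name, step in recipe.items()}


# Toy recipe: three entities, all reading the same (hypothetical) SCL layer.
recipe: Recipe = {
    "water": lambda load: len(load("s2_scl")),
    "vegetation": lambda load: len(load("s2_scl")) + len(load("s2_nir")),
    "cloud": lambda load: len(load("s2_scl")),
}

counts = preview_references(recipe)
reused = {layer for layer, n in counts.items() if n > 1}
source = SourceLoader(resolution=1000)
results = execute(recipe, source, reused)
print(counts, reused, source.fetches)
```

Layers referenced only once are never stored, which keeps the memory footprint below that of caching everything. The price is the extra preview run, which must be cheap enough, hence the reduced resolution.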

I am attaching a test notebook to this PR that describes the caching in more detail. In my tests so far, caching has improved performance, especially when retrieving remotely stored data via STAC. Conversely, caching locally stored data often adds unnecessary computational overhead. Feel free to evaluate the benefits of caching for your own use cases and to suggest improvements. One possible alternative to the current approach would be a simpler, naive caching without pre-caching, in which every data layer is cached as a precaution. This increases the memory footprint but would make the preview run unnecessary.
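For comparison, the naive alternative can be sketched with Python's standard `functools.lru_cache`. Every layer is cached on its first request, so no preview run is needed. `fetch_from_source`, `load_layer` and the layer names are hypothetical stand-ins for an expensive (e.g. remote) fetch, not part of semantique.

```python
from functools import lru_cache
from typing import List, Tuple

fetch_log: List[str] = []


def fetch_from_source(layer: str) -> Tuple[float, ...]:
    """Hypothetical expensive fetch (e.g. a remote asset); logs each call."""
    fetch_log.append(layer)
    return (0.0,) * 1000  # placeholder pixel values


@lru_cache(maxsize=None)  # unbounded: every layer ever requested stays in memory
def load_layer(layer: str) -> Tuple[float, ...]:
    return fetch_from_source(layer)


# Several entities referencing layers; repeated references hit the cache.
for layer in ["s2_scl", "s2_scl", "s2_nir", "s2_scl"]:
    load_layer(layer)

print(fetch_log)
print(load_layer.cache_info())
```

The unbounded cache also keeps single-use layers such as `s2_nir` in memory, which is where the larger footprint comes from. Returning an immutable tuple prevents one entity from accidentally modifying the copy that another entity later receives from the cache.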
