Given that we now have some datacube data (connectivity) in zarr format, we could consider an AWS lambda backend for the various searches which currently require data to be persisted in-memory. Memory requirements (and number of instances) for the datacube could then be reduced, and the searches could be faster and/or more scalable.
Consider the connectivity dataset, which is around 15GB uncompressed and about 3.1GB (losslessly) compressed via zarr. AWS's default limit of 1000 concurrent lambda invocations leaves plenty of headroom for breaking a search into on-the-order-of 100 chunks, in which case each invocation would have to download ~30MB from an s3 zarr store. The number of chunks could be tuned, but given a ~350ms latency for lambda spin-up, downloading ~30MB, and then running 1/100 of the computation, this is likely to perform as well as or better than the existing 4-core in-memory approach.
Integrating this into the architecture would be relatively straightforward:
The searches have already been swapped between different backends (currently numpy and dask) depending on the parallelization and data-representation strategy in use, so automatically choosing a backend based on the particular datacube would not complicate the code base.
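Purely as illustration, a minimal sketch of what per-datacube backend selection could look like (the function and backend names are hypothetical, not the project's actual API):

```python
def select_backend(datacube_location: str) -> str:
    # Assumption: a zarr store on s3 routes searches to the lambda backend,
    # while locally loaded datacubes keep the existing numpy/dask paths.
    if datacube_location.startswith("s3://"):
        return "lambda"
    return "in-memory"

print(select_backend("s3://my-bucket/connectivity.zarr"))  # -> "lambda"
```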
Convenient xarray-aware chunking is already implemented.
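For example, a minimal sketch of xarray-aware chunking (toy data and chunk sizes, not the real connectivity cube):

```python
import numpy as np
import xarray as xr

# Toy stand-in for a datacube; the real data would come from the zarr store.
ds = xr.Dataset(
    {"connectivity": (("source", "target"), np.random.rand(1000, 1000))}
)

# Rechunk lazily; each chunk becomes an independently computable dask block,
# which maps naturally onto one lambda invocation.
chunked = ds.chunk({"source": 100})
print(chunked["connectivity"].data.chunksize)  # -> (100, 1000)
```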
Use of zarr is working in the current release, along with convenience functions for copying between stores.
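As an illustration of copying between stores with plain zarr (paths are hypothetical and this assumes the zarr v2 API; the project's own convenience functions presumably build on the same primitives):

```python
import numpy as np
import zarr

# Create a small array in a local directory store (hypothetical path).
src = zarr.DirectoryStore("local_cube.zarr")
root = zarr.group(store=src)
root.create_dataset("connectivity", data=np.arange(100), chunks=10)

# Copy the whole hierarchy to a second store; in practice the destination
# could be an s3fs-backed mapping rather than another local directory.
dst = zarr.DirectoryStore("copy_of_cube.zarr")
zarr.copy_store(src, dst)
```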
xarray and zarr support s3 through a simple interface, and dask can also work with a zarr store located in s3 or any other backing store.
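A minimal sketch of reading a zarr store from s3 with s3fs/xarray and handing the same data to dask (bucket and array names are hypothetical):

```python
import dask.array as da
import s3fs
import xarray as xr
import zarr

# Hypothetical bucket/key.
fs = s3fs.S3FileSystem()
store = s3fs.S3Map(root="my-bucket/connectivity.zarr", s3=fs)

# xarray reads the store lazily; only the chunks actually indexed get downloaded.
ds = xr.open_zarr(store)

# dask can also wrap a zarr array from the same store directly.
z = zarr.open(store, mode="r")["connectivity"]
arr = da.from_zarr(z)
```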
Datacubes are not particularly large and have compressed well with zarr (connectivity from 15GB uncompressed to 3.1GB compressed), and lambda invocation limits are fairly high, allowing a high chunking factor and a small download per-invocation.
Invoking lambda functions from python is easy, and most arguments are small or easily representable in json (e.g. seed index, chunk indexes, cache key prefix, etc.).
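For illustration, invoking a hypothetical search-chunk lambda from python via boto3 with a small json payload (the function name and payload fields are assumptions):

```python
import json
import boto3

client = boto3.client("lambda")

# Everything in the payload is small and json-serializable.
payload = {
    "seed_index": 1234,
    "chunk_indexes": [0, 1, 2],
    "cache_key_prefix": "search-abc123",
}

response = client.invoke(
    FunctionName="datacube-search-chunk",  # hypothetical function name
    InvocationType="RequestResponse",      # or "Event" for fire-and-forget
    Payload=json.dumps(payload).encode(),
)
result = json.loads(response["Payload"].read())
```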
Chunked calculation of masks/filters has been tested in a multithreaded environment, where the filter expression is evaluated once in a single thread, performing all calculations lazily, to produce a dask array which contains everything needed to compute it. Dask supports pickling of these dask arrays, which it uses for its multiprocess and distributed execution environments. The datacube parallelize decorator will also lazily apply chunking to any dask arrays it is given. The mask evaluation code can therefore be kept local for simplicity, while the lambda caller will pass a pickled mask chunk to each invocation, to be un-pickled and evaluated using the dask local (threaded) scheduler. Lambdas will therefore depend on dask but only for local computation, and will not have to contain any of the masking code.
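A rough sketch of the lambda side (handler name and payload field are assumptions): the caller pickles the lazy dask array for one mask chunk, base64-encodes it into the json payload, and the lambda un-pickles and evaluates it with dask's local threaded scheduler:

```python
import base64
import pickle

# dask (and numpy/xarray/zarr) must be installed in the lambda so the pickled
# graph can be reconstructed, but none of the project's masking code is needed.

def handler(event, context):
    # Hypothetical payload field carrying the pickled, base64-encoded dask array.
    mask_chunk = pickle.loads(base64.b64decode(event["mask_chunk_pickle"]))

    # Evaluate locally with the threaded scheduler; no distributed cluster.
    mask = mask_chunk.compute(scheduler="threads")

    # ... apply the mask to this chunk of the search and return a small result.
    return {"true_count": int(mask.sum())}
```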
Alternatively, if some masks are small they can be computed on the client and sent to each lambda as an argument.
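For that case, a tiny sketch of sending a pre-computed mask in the json payload itself (field name hypothetical):

```python
import json
import numpy as np

small_mask = np.zeros(64, dtype=bool)
small_mask[:8] = True

# Booleans serialize cleanly to json; the lambda can rebuild the array with
# np.asarray(event["mask"], dtype=bool).
payload = json.dumps({"mask": small_mask.tolist()})
```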
The datacube already has caching which can prevent unnecessary invocations; #77 is a further improvement on this for searches.
A list of expected dependencies for the lambda functions: numpy, xarray, dask, zarr, redis (optional)
[optional] Use of the cache within lambdas could easily be disabled, but in a future architecture caching could be moved to AWS (e.g. ElastiCache redis) and used from lambdas. This would be an improvement over the current architecture where each datacube instance has its own local redis instance, as well as silos having their own separate caches. This could be separate from the cache used by the filtering logic, whose cache could remain local.
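A minimal sketch of shared caching from inside a lambda, assuming a hypothetical ElastiCache redis endpoint supplied via the lambda's environment:

```python
import json
import os
import redis

# Hypothetical endpoint configured on the lambda (e.g. an ElastiCache node).
cache = redis.Redis(host=os.environ["REDIS_HOST"], port=6379)

def handler(event, context):
    key = f"{event['cache_key_prefix']}:{event['chunk_index']}"  # assumed fields
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)

    result = {"placeholder": True}  # ... compute this chunk's result here
    cache.set(key, json.dumps(result))
    return result
```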