AllenInstitute / datacube

AWS lambda backend for searches or other reduced RAM solution #85

Open chrisbarber opened 6 years ago

chrisbarber commented 6 years ago

Given that we now have some datacube data (connectivity) in zarr format, we could consider an AWS Lambda backend for the various searches that currently require data to be persisted in memory. Memory requirements (and the number of instances) for the datacube could then be reduced, and the searches could be faster and/or more scalable.

Consider the connectivity dataset, which is around 15GB uncompressed and about 3.1GB when losslessly compressed via zarr. AWS's default Lambda concurrency limit is 1000, so breaking a search into on the order of 100 chunks would mean that each invocation has to download ~30MB (3.1GB / 100) from an S3 zarr store. The number of chunks could be tuned, but given ~350ms of latency for Lambda spin-up, followed by downloading ~30MB and running 1/100 of the computation, this is likely to perform as well as or better than the existing 4-core in-memory approach.
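A rough sketch of the chunking arithmetic above, in case it helps with tuning the number of chunks. The function names, the row count in the usage note, and the `bandwidth_mb_s` and `compute_s` parameters are all hypothetical and not part of datacube; only the 3.1GB, 100-chunk, and ~350ms figures come from the estimate above.

```python
def plan_chunks(n_rows, n_chunks):
    """Split range(n_rows) into n_chunks contiguous, near-equal (start, stop) slices."""
    base, extra = divmod(n_rows, n_chunks)
    bounds = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional row.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def download_mb_per_invocation(compressed_gb, n_chunks):
    """Compressed data each invocation would fetch, assuming evenly sized chunks."""
    return compressed_gb * 1000 / n_chunks


def estimated_wall_time_s(compressed_gb, n_chunks, bandwidth_mb_s, compute_s,
                          spinup_s=0.35):
    """Crude per-search estimate: spin-up + download + this chunk's share of compute.

    bandwidth_mb_s (S3 throughput per invocation) and compute_s (total
    single-worker compute time) are assumptions to be measured, not known values.
    All invocations are assumed to run fully in parallel.
    """
    download_s = download_mb_per_invocation(compressed_gb, n_chunks) / bandwidth_mb_s
    return spinup_s + download_s + compute_s / n_chunks
```

For example, `download_mb_per_invocation(3.1, 100)` gives about 31MB per invocation, matching the ~30MB figure, and `plan_chunks(n_rows, 100)` would give the row ranges to hand to each invocation. Plugging in measured bandwidth and compute time would show where adding more chunks stops paying off against the fixed spin-up cost.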

Integrating this into the architecture would be relatively straightforward: