CONABIO / antares3

Madmex with Open Data Cube and in Python 3

Upload/download netcdf's intermediary results to S3 #45

palmoreck opened this issue 6 years ago

palmoreck commented 6 years ago

The classification pipeline writes intermediary results to EFS, for instance:

/shared_volume/datacube/datacube_ingest
/shared_volume/datacube/serialized_objects

It would be better (regarding EFS costs) for EFS to hold only small-to-medium files that need to be shared between instances and Docker containers. Each instance should have its own volume to write its results to, and then upload them to S3; this implies that, in the next step of the classification pipeline, each instance would need to download from S3 to its own volume for reading. This would avoid read/write bottlenecks on EFS, thereby helping medium and large clusters scale, and would also save costs (S3 is cheaper than EFS).
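A minimal sketch of the upload step described above, assuming boto3 is available; the bucket name, key prefix, and local directory are placeholders, not values from this repo:

```python
import os

import boto3

# Hypothetical names; adjust to the real bucket and per-instance volume.
BUCKET = "antares3-intermediary-results"
LOCAL_DIR = "/local_volume/datacube/datacube_ingest"

s3 = boto3.client("s3")

def upload_results(local_dir=LOCAL_DIR, bucket=BUCKET, prefix="datacube_ingest"):
    """Upload every file under local_dir to s3://bucket/prefix/,
    preserving the relative directory layout."""
    for root, _, files in os.walk(local_dir):
        for name in files:
            path = os.path.join(root, name)
            key = "/".join([prefix, os.path.relpath(path, local_dir)])
            s3.upload_file(path, bucket, key)
```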

Note that nothing currently supports reading NetCDF directly from S3 with Open Data Cube.
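Until that changes, a possible workaround is to copy each object to local disk first and open it from there. A minimal sketch assuming boto3 and xarray, with placeholder bucket/key/path names:

```python
import boto3
import xarray as xr

def open_netcdf_from_s3(bucket, key, local_path):
    """Download a NetCDF object from S3 to the instance's local volume,
    then open it as an xarray Dataset (Open Data Cube and netCDF4 both
    expect a local file path)."""
    boto3.client("s3").download_file(bucket, key, local_path)
    return xr.open_dataset(local_path)

# Example usage (placeholder names):
# ds = open_netcdf_from_s3("antares3-intermediary-results",
#                          "datacube_ingest/tile_0001.nc",
#                          "/local_volume/tile_0001.nc")
```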

palmoreck commented 6 years ago

Another option is the dc.save feature; see:

https://github.com/opendatacube/datacube-core/issues/467