earthnet2021 / earthnet-minicuber

EarthNet Minicuber
MIT License

en22 production #4

Closed vitusbenson closed 1 year ago

vitusbenson commented 2 years ago

TODO for getting en22 done.

en22 Production pipeline

en22 Orchestrator

This is a piece of software that handles

en-minicuber

Handles download of 1 minicube....

workflow

  1. Orchestrator defines the locations & split of en22
  2. We run the cloud mask generation locally for all of them and save the resulting NetCDF files to our Nextcloud (or to AWS S3...)
  3. Upload ERA5, Geomorphons, Soilgrids to AWS S3
  4. On AWS: run minicube generation for many minicubes in parallel. Each run fetches the AWS data, merges it with our cloud mask NetCDF, and saves the result into the Radiant MLHub S3 bucket.
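
A rough sketch of how step 4 could fan out over many minicubes while recording failures per cube. Everything here is hypothetical: `generate_minicube` is only a placeholder for "fetch AWS data, merge cloud mask, save to S3", and the thread pool merely stands in for whatever ends up dispatching jobs (in practice probably one job per instance, given the RAM needed per minicube).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def generate_minicube(cube_id):
    """Hypothetical placeholder for one minicube run.

    A real worker would fetch the AWS data, merge it with the
    precomputed cloud mask and upload the result. Here it only
    returns the name of the file it would have written.
    """
    return f"{cube_id}.nc"


def run_all(cube_ids, worker=generate_minicube, max_workers=4):
    """Run `worker` for every cube id in parallel.

    Returns (results, failures): results maps cube id -> worker output,
    failures maps cube id -> error description, so that one broken
    minicube does not abort the whole batch.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, cid): cid for cid in cube_ids}
        for future in as_completed(futures):
            cid = futures[future]
            try:
                results[cid] = future.result()
            except Exception as exc:  # keep going, retry failed cubes later
                failures[cid] = repr(exc)
    return results, failures


if __name__ == "__main__":
    done, failed = run_all(["cube_0001", "cube_0002", "cube_0003"])
    print(f"{len(done)} generated, {len(failed)} failed")
```

Collecting failures separately makes it easy to re-run only the cubes that did not finish.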

Running on AWS - Steps

AWS Cost Estimate

Can run 1 train minicube on t3.medium, but that is close to the RAM limit (3.6/4 GB) -> so to be safe: t3.large (8 GB)
Takes ~10-15 min/minicube in us-west-2
or ~8-10 min/minicube in af-south-1 -> ~1750 compute hours per 10k train minicubes
-> $0.0542/h for t3.medium (4 GB), not super safe, but should be OK -> ~$100 per 10k train minicubes
1 minicube = ca. 20 MB disk space -> 10k minicubes = ca. 200 GB -> outgoing data transfer at $0.02/GB -> $4 per 10k train minicubes
1 test minicube = 3.5 train minicubes
Overall: -> $100 per 10k train minicubes -> $200 for 5k test minicubes -> $1000-$1500 in total??
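
The arithmetic above can be reproduced with a small script, which makes it easy to re-run the estimate for other cube counts. This is only a sketch: the constants are the figures quoted in this comment, the function name is made up, and it assumes that the 3.5x test-to-train factor applies to both compute time and storage, which the estimate does not state explicitly.

```python
# Figures quoted in the estimate above.
PRICE_PER_HOUR_USD = 0.0542     # t3.medium hourly price
COMPUTE_HOURS_PER_10K = 1750    # compute hours per 10k train minicubes
MB_PER_MINICUBE = 20            # disk space of one train minicube
EGRESS_USD_PER_GB = 0.02        # outgoing data transfer
TEST_TO_TRAIN_RATIO = 3.5       # 1 test minicube = 3.5 train minicubes


def estimate_cost(n_train, n_test=0):
    """Return a rough cost breakdown in USD (hypothetical helper).

    Assumption: a test minicube costs 3.5x a train minicube in both
    compute and storage.
    """
    train_equivalents = n_train + n_test * TEST_TO_TRAIN_RATIO
    compute_hours = train_equivalents / 10_000 * COMPUTE_HOURS_PER_10K
    compute_usd = compute_hours * PRICE_PER_HOUR_USD
    storage_gb = train_equivalents * MB_PER_MINICUBE / 1000
    egress_usd = storage_gb * EGRESS_USD_PER_GB
    return {
        "compute_hours": compute_hours,
        "compute_usd": compute_usd,
        "storage_gb": storage_gb,
        "egress_usd": egress_usd,
        "total_usd": compute_usd + egress_usd,
    }


if __name__ == "__main__":
    for label, args in [("10k train", (10_000, 0)), ("5k test", (0, 5_000))]:
        cost = estimate_cost(*args)
        print(f"{label}: ~${cost['total_usd']:.0f} "
              f"({cost['compute_hours']:.0f} h, {cost['storage_gb']:.0f} GB)")
```

Switching to t3.large for the extra RAM headroom only requires changing `PRICE_PER_HOUR_USD`.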

TODO