geoschem / geos-chem-cloud

Run GEOS-Chem easily on AWS cloud
http://cloud.geos-chem.org
MIT License

Tutorial on real research workflow (long-term simulations with large data) #4

Closed: JiaweiZhuang closed this issue 5 years ago

JiaweiZhuang commented 6 years ago

After #2 is done, we will be able to run any type of GEOS-Chem simulation over any period. We can go beyond proof-of-concept runs and do serious research projects.

Preliminary workflow design:

  1. Use small instances (r4.large) for model configuration and testing.
  2. Switch to more powerful instances (c5.4xlarge or larger) to perform long-term simulations. Use spot instances to bring down the computing cost.
  3. Move large output data to S3 to bring down the storage cost (also see #3).
  4. Use small instances (c5.large) for data analysis.
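The instance choices in the steps above could be encoded as a small plan, with the long-term simulation stage requested as a spot instance. This is a minimal sketch, not part of the posted tutorial: it only builds the keyword arguments for boto3's EC2 `run_instances` call, and the stage names, AMI ID, and key-pair name are hypothetical placeholders. S3 archiving (step 3) is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One workflow step: its purpose, the EC2 instance type, and whether to use spot."""
    name: str
    instance_type: str
    use_spot: bool


# Instance types follow the workflow above. c5.4xlarge is the suggested lower
# bound for long runs, so a larger type could be substituted.
# Stage names are hypothetical labels.
WORKFLOW = [
    Stage("configure-and-test", "r4.large", use_spot=False),
    Stage("long-term-simulation", "c5.4xlarge", use_spot=True),
    Stage("analysis", "c5.large", use_spot=False),
]


def run_instances_kwargs(stage: Stage, ami_id: str, key_name: str) -> dict:
    """Build keyword arguments for boto3's EC2 client run_instances()."""
    kwargs = {
        "ImageId": ami_id,
        "InstanceType": stage.instance_type,
        "KeyName": key_name,
        "MinCount": 1,
        "MaxCount": 1,
    }
    if stage.use_spot:
        # One-time spot request: AWS may reclaim the instance at any time,
        # so nothing that must survive should live only on its root volume.
        kwargs["InstanceMarketOptions"] = {
            "MarketType": "spot",
            "SpotOptions": {
                "SpotInstanceType": "one-time",
                "InstanceInterruptionBehavior": "terminate",
            },
        }
    return kwargs
```

With credentials configured, a call such as `boto3.client("ec2").run_instances(**run_instances_kwargs(WORKFLOW[1], "ami-...", "my-key"))` would launch the simulation stage. Keeping request construction separate from the API call makes the plan easy to review before anything is billed.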

The working directory should probably be on a standalone EBS volume, so that it can be quickly moved between instances and will not be affected by spot instance termination. The root volume should only hold software libraries and model source code.
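A standalone working-directory volume could be described and attached roughly as follows. This is a minimal sketch under stated assumptions: it only builds keyword arguments for boto3's EC2 `create_volume` and `attach_volume` calls, and the `DataVolume` class, the default `gp2` volume type, and the `/dev/sdf` device name are hypothetical choices, not settings from the tutorial. One real constraint it does check is that an EBS volume can only be attached to an instance in the same availability zone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataVolume:
    """Hypothetical description of the standalone EBS volume holding run directories."""
    availability_zone: str
    size_gib: int
    volume_type: str = "gp2"   # assumption; pick whatever type suits the I/O load
    device: str = "/dev/sdf"   # assumption; device name exposed to the instance


def create_volume_kwargs(vol: DataVolume) -> dict:
    """Keyword arguments for boto3's EC2 client create_volume()."""
    if vol.size_gib <= 0:
        raise ValueError("size_gib must be positive")
    return {
        "AvailabilityZone": vol.availability_zone,
        "Size": vol.size_gib,
        "VolumeType": vol.volume_type,
    }


def attach_volume_kwargs(vol: DataVolume, volume_id: str, instance_id: str,
                         instance_zone: str) -> dict:
    """Keyword arguments for attach_volume(); the instance must share the volume's AZ."""
    if instance_zone != vol.availability_zone:
        raise ValueError(
            f"volume is in {vol.availability_zone}, instance is in {instance_zone}"
        )
    return {"Device": vol.device, "InstanceId": instance_id, "VolumeId": volume_id}
```

To hand the working directory from a testing instance to a spot instance, one would call `detach_volume(VolumeId=...)`, wait for the volume to become available, attach it to the new instance, and mount it there. A volume attached after launch is, by default, not deleted when the instance terminates, which is what lets it outlive a reclaimed spot instance.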

JiaweiZhuang commented 6 years ago

Example workflow now posted:

http://cloud-gc.readthedocs.io/en/latest/chapter02_beginner-tutorial/research-workflow.html

JiaweiZhuang commented 6 years ago

Reopening this issue since we still need to investigate what the "best workflow" on the cloud is. The answer likely depends on the specific research project, including how long each simulation takes and how many simulations need to run.

It would be nice to have a "sample workflow" for a serious project, so that everyone at least has a baseline to follow.