geoschem / geos-chem-cloud

Run GEOS-Chem easily on AWS cloud
http://cloud.geos-chem.org
MIT License

[DISCUSSION] Download extra input data from Compute Canada to EC2 #30

Closed JiaweiZhuang closed 1 year ago

JiaweiZhuang commented 5 years ago

Overview

Currently our S3 bucket syncs only the data on Harvard Odyssey, not the extra data on Compute Canada, notably the global high-res metfields. Users can still download data from Compute Canada to EC2 (pulling data into the cloud is free), so we should measure how fast this is. If it is too slow, we should consider adding those data to S3.

The speed may depend heavily on the AWS region; I believe that the Canada region would be very fast.

Action items

JiaweiZhuang commented 5 years ago

I got a bandwidth of 10-20 MB/s in both us-east-1 and ca-central-1. Launching EC2 in the Canada region does not speed up the download, so 20 MB/s is probably the bandwidth limit enforced by Compute Canada. This is about 10x slower than downloading from S3 to EC2.

I used this command to download one month of GEOS-FP 0.25x0.3125 metfields:

wget -r -nH --cut-dirs=2 "http://geoschemdata.computecanada.ca/GEOS_0.25x0.3125/GEOS_FP/2017/07/"

One month of global 0.25x0.3125 metfields is ~300 GB. At 20 MB/s it would take about 4 hours to download from Compute Canada (about 8 hours at 10 MB/s), versus roughly 30 minutes from S3.
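
For anyone who wants to redo this estimate for a different directory size, here is a minimal sketch of the arithmetic. The function name `transfer_hours` is made up for illustration. The 200 MB/s S3 rate is not a measurement; it is an assumption derived from the "10x" comparison above. Decimal units (1 GB = 1000 MB) are also assumed.

```python
def transfer_hours(size_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move size_gb gigabytes at rate_mb_per_s megabytes/second.

    Assumes decimal units (1 GB = 1000 MB) and a constant transfer rate.
    """
    seconds = size_gb * 1000 / rate_mb_per_s
    return seconds / 3600


# ~300 GB: one month of global 0.25x0.3125 metfields (figure from this thread)
SIZE_GB = 300

scenarios = [
    ("Compute Canada, 10 MB/s (low end measured)", 10),
    ("Compute Canada, 20 MB/s (high end measured)", 20),
    ("S3, 200 MB/s (assumed: 10x the high end)", 200),
]

for label, rate in scenarios:
    print(f"{label}: {transfer_hours(SIZE_GB, rate):.1f} h")
```

With these inputs the three cases come out to roughly 8.3 h, 4.2 h, and 0.4 h (25 minutes), which matches the rounded numbers quoted above.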

JiaweiZhuang commented 5 years ago

We will consider adding those extra data to S3 based on user requests. Please leave a comment here if you want a specific directory to be made available in S3. You can browse all the files at http://geoschemdata.computecanada.ca/

yantosca commented 1 year ago

Closing this issue as the GEOS-Chem data is now at WashU (http://geoschemdata.wustl.edu).