ndkeen opened 1 year ago
https://github.com/E3SM-Project/E3SM/pull/5150 handled this for one case.
Specifically, for the case SMS_D_Ld1.TL319_EC30to60E2r2.DTESTM-JRA1p5 with Apr 27th master, if I download the data needed to a local dir, there are 178 files totaling 350.4 GB:
/pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/jjra/SMS_D_Ld1.TL319_EC30to60E2r2.DTESTM-JRA1p5.pm-cpu_gnu.20230427_093602_7hlylu/inputdata
If I use the proposed alternate test:
SMS_D_Ld1.TL319_EC30to60E2r2.DTESTM-JRA1p5.pm-cpu_gnu.mpassi-jra_1958
and download the data required from scratch, there are 119 files totaling 197.2 GB:
/pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/jjra/SMS_D_Ld1.TL319_EC30to60E2r2.DTESTM-JRA1p5.pm-cpu_gnu.mpassi-jra_1958.20230427_090012_is8zmt/inputdata
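For reference, a minimal sketch of how such a per-test footprint can be gathered, assuming a CIME version where create_test accepts --input-dir to override DIN_LOC_ROOT (the scratch path below is illustrative, not the actual one used above):
```
# Point the test at its own empty inputdata dir, let it download, then measure.
cd cime/scripts
./create_test SMS_D_Ld1.TL319_EC30to60E2r2.DTESTM-JRA1p5.pm-cpu_gnu.mpassi-jra_1958 \
  --input-dir /pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/jjra/inputdata    # illustrative path
du -sh /pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/jjra/inputdata           # total size actually fetched
find /pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/jjra/inputdata -type f | wc -l   # number of files
```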
This is in reference to https://github.com/E3SM-Project/E3SM/pull/5639
The fixes in #5639 still allow 63 JRA.v1.5.runoff* files to be downloaded. Each file is 3 GB, for a total of 192 GB just for the runoff files. @jonbob, all the data models are basically the same, so it should be possible to add changes to drof like you did for datm in #5639.
Noting that SMS_D_Ld1.TL319_EC30to60E2r2.DTESTM-JRA1p5.pm-cpu_gnu.mpassi-jra_1958
on current next still downloads a large amount of data (same as above -- 119 files totaling 197.2 GB).
I'm also trying to download all of the data needed for SMS_D_Ln3.TL319_EC30to60E2r2_wQU225EC30to60E2r2.GMPAS-JRA1p5-WW3
which is taking a while and will also be a large amount -- perhaps including some of the same data.
These will not be able to run on machines with limited space, such as GCP.
@rljacob - yes, it should be possible. Do we want a generic data model setting for start_ and stop_year, or just add something like what is in datm to drof?
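One hedged possibility is a testmod-style shell_commands file; the DROF_* variable names below are illustrative placeholders, not confirmed XML settings (the real names would come from whatever gets added to drof, mirroring what #5639 did for datm):
```
#!/bin/bash
# Hypothetical testmod shell_commands: restrict the runoff (drof) forcing stream to a
# single year so only one JRA.v1.5.runoff file needs to be downloaded.
# The DROF_* names are placeholders, not confirmed drof XML settings.
./xmlchange DROF_STRM_YR_START=1958
./xmlchange DROF_STRM_YR_END=1958
./xmlchange DROF_STRM_YR_ALIGN=1958
```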
PR https://github.com/E3SM-Project/E3SM/pull/5670 just made a change that might have reduced the number of required downloaded files (again). Update: I just realized my test had only the NERSC portal change and does NOT contain the changes in the PR noted, so I will retest. OK, now I only see 57 files downloaded, totaling 22.4 GB.
After https://github.com/E3SM-Project/E3SM/pull/5670, there are still (at least) 2 tests that will download a lot of data.
One is SMS_D_Ln3.TL319_EC30to60E2r2_wQU225EC30to60E2r2.GMPAS-JRA1p5-WW3.ww3-jra_2004
which may just need the same changes as the mentioned PR.
And the other is ERS.hcru_hcru.I20TRGSWCNPRDCTCBC.gcp12_gnu.elm-erosion
Certainly any machine running climate simulations will need a fair amount of space for the input data as well as the output. For all machines, we have a specified location for inputdata where any data needed for a case is downloaded and then can be used by multiple users. The total amount of data collected into inputdata slowly grows over time -- we rarely delete data that is no longer used. At NERSC, the total space used in
/global/cfs/cdirs/e3sm/inputdata
is currently 77.8 TB (68 TB in `atm/`). Sometimes there are situations where it would be beneficial to be more careful about what data is downloaded for a given case. One such situation is trying to maintain testing on machines where there is limited disk space for this data. For machines that are mostly (or only) used for testing, a minimum set of inputdata would be better (or required). And we are currently having an issue using a GCP (Google Cloud Platform) cluster that has a 2 TB disk (for all use, shared across all users), where disk space is rented at a premium.

For any case, we can change the location of inputdata with
xmlchange DIN_LOC_ROOT=/newi
where the new location would be populated with data required for each case (the data is downloaded via wget from the blues server at ANL). For a few common test suites, I made this change on pm-cpu as a test to see how much data is actually required to run the tests (as currently defined); a sketch of that workflow follows below. Now, of course, running e3sm_integration and e3sm_extra_coverage with the same inputdata location would be a savings, as there would be files needed by both that would only need to be downloaded once. And there are a few stand-outs -- i.e. tests that download a significant portion of the data. They are all cases that want forcing data, and while the test may only run a few days, multiple years of forcing data is downloaded.
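A sketch of that per-case measurement, assuming the standard CIME case tools (xmlchange, and check_input_data with its --download option) and an illustrative scratch path:
```
# From an existing case directory: point DIN_LOC_ROOT at an empty location,
# download only what this case needs, then measure what was fetched.
cd $CASEDIR                                                      # case created by create_test
./xmlchange DIN_LOC_ROOT=/pscratch/sd/n/ndk/inputdata_minimal    # illustrative path
./check_input_data --download                                    # fetches missing files via wget
du -sh /pscratch/sd/n/ndk/inputdata_minimal
find /pscratch/sd/n/ndk/inputdata_minimal -type f | wc -l        # number of files downloaded
```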
In the extra_coverage suite, the following directory is 1087 GB, which is 80% of the total needed.
And for the other suites, these forcing data dirs:
If it's easy and practical, it would be great if a case could be smarter about what data it will actually need. I think this is good in general, but it could specifically help on machines with limited disk space. For now, I'm going to try freeing up space on the GCP inputdata that is no longer being used by the current set of test cases, but wanted to document the issue.
I also have the required data downloads per test case. For example, the test
ERS.hcru_hcru.I20TRGSWCNPRDCTCBC.pm-cpu_gnu.elm-erosion
is downloading 445 GB alone. Note that during this testing, I found a few cases that are unable to download all the data they need without help. Several cases need data here: `atm/cam/chem/trop_mozart_aero/emis/DECK_ne30`, which is not automatically downloaded (a possible manual workaround is sketched below). The total space of this dir is only 33 GB. Also, any test using MPAS that requires 2 cases (i.e. ERS, PET, etc.) and has a case2run that needs a different PE layout will fail at runtime -- the test case is not realizing it may need different MPAS partition files for the case2run.
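As a hedged workaround for the DECK_ne30 directory mentioned above, it can be mirrored manually into DIN_LOC_ROOT. The server URL and directory layout below are assumptions (the public LCRC e3sm inputdata server); verify them and adjust the --cut-dirs count if the layout differs:
```
# Manually mirror the directory that is not fetched automatically.
# URL and layout are assumptions; verify before use.
cd $DIN_LOC_ROOT
wget -r -np -nH --cut-dirs=3 -R "index.html*" \
  https://web.lcrc.anl.gov/public/e3sm/inputdata/atm/cam/chem/trop_mozart_aero/emis/DECK_ne30/
```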