NASA-LIS / LISF

Land Information System Framework
Apache License 2.0

Writing temporary files into running directory can cause conflicts #559

Open dmocko opened 4 years ago

dmocko commented 4 years ago

Numerous routines in all three sub-systems (LDT, LIS, LVT) write temporary files into the main running directory. These files are often created by system calls in the code, and are required for processing of input datasets.

However, these temporary files can cause conflicts if more than one executable is running in the same directory. The runs will create and overwrite the same temporary files, so each run can clobber the other's intermediate data.

Proposal: Have all temporary files written into a sub-directory within the "output" directory identified in the respective config file.

Example lis.config entry:

Output directory: ./output

The temporary files would then be written into "./output/tmp_files/" (or similar name) instead of into the main running directory. That way, multiple runs could be going at the same time, with output and temporary files only in the respective directory.
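As a rough illustration of the proposal (not LISF code, which is Fortran), the path logic could look like the Python sketch below. The helper names `temp_dir_for` and `temp_file_path` are hypothetical, and `tmp_files` is simply the sub-directory name suggested above.

```python
import os


def temp_dir_for(output_dir, subdir="tmp_files"):
    """Return a scratch directory under the run's output directory.

    The directory is created if it does not exist yet, so a run can call
    this before writing its first temporary file.
    """
    path = os.path.join(output_dir, subdir)
    os.makedirs(path, exist_ok=True)
    return path


def temp_file_path(output_dir, name):
    """Build the full path of a temporary file for this run."""
    return os.path.join(temp_dir_for(output_dir), name)
```

Two runs launched from the same directory but configured with different "Output directory:" entries would then resolve the same temporary file name to different paths.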

sujayvkumar commented 4 years ago

David, Yes - this cleanup is necessary. I guess we need to start with an inventory of these instances. -SK

dmocko commented 4 years ago

We can add to this list as we go. So far, I'm aware of these that could be modified:

ldt/USAFSI/USAFSI_ssmisMod.F90
ldt/DAobs/Aquarius_L2sm/readAquariusL2smObs.F90
ldt/DAobs/ASCAT_TUW/readASCATTUWsmObs.F90
ldt/DAobs/NASA_SMAPsm/readNASASMAPsmObs.F90
ldt/DAobs/SMOS_L2sm/readSMOSL2smObs.F90

lvt/datastreams/ARM/readARMObs.F90
lvt/datastreams/GLDAS1/readGLDAS1obs.F90
lvt/datastreams/GOES_LST/readGOES_LSTObs.F90
lvt/datastreams/ISMN/readISMNObs.F90
lvt/datastreams/SMAPTB/readSMAPTBobs.F90
lvt/datastreams/SMOS_L1TB/readSMOSL1TBObs.F90
lvt/datastreams/SMOS_L2sm/readSMOSL2smObs.F90

lis/dataassim/obs/NASA_SMAPvod/read_NASASMAPvod.F90
lis/dataassim/obs/SMAP_NRTsm/read_SMAPNRTsm.F90
lis/dataassim/obs/SMOS_L2sm/read_SMOSL2sm.F90
lis/routing/HYMAP2_router/runoffdata/GLDAS1data/readGLDAS1runoffdata.F90
lis/routing/HYMAP2_router/runoffdata/GLDAS2data/readGLDAS2runoffdata.F90
lis/routing/HYMAP_router/runoffdata/GLDAS1data/readGLDAS1runoffdata.F90
lis/routing/HYMAP_router/runoffdata/GLDAS2data/readGLDAS2runoffdata.F90
lis/optUE/type/paramestim/obs/ARM/read_ARMdata.F90
lis/optUE/type/paramestim/obs/ISMNsm/read_ISMNsmobs.F90

David
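One possible way to extend the inventory is to scan the source tree for shell-outs. The Python sketch below is only a starting point and rests on an assumption: that the temporary files come from `call system(...)` (a common compiler extension) or `call execute_command_line(...)` (standard Fortran 2008). The function name `find_shell_calls` is hypothetical, and files written by plain `open` statements would not be found this way.

```python
import os
import re

# Assumed patterns for Fortran shell-outs; matching is case-insensitive
# because Fortran itself is.
SHELL_CALL = re.compile(
    r"\bcall\s+(system|execute_command_line)\s*\(", re.IGNORECASE
)


def find_shell_calls(root):
    """Return (path, line number, source line) for each shell-out under root."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in sorted(filenames):
            if not fname.lower().endswith(".f90"):
                continue
            path = os.path.join(dirpath, fname)
            with open(path, errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    # Crude comment stripping: ignore everything after the
                    # first '!', which skips commented-out calls.
                    code = line.split("!", 1)[0]
                    if SHELL_CALL.search(code):
                        hits.append((path, lineno, line.strip()))
    return hits
```

Running `find_shell_calls("ldt")`, `find_shell_calls("lis")`, and `find_shell_calls("lvt")` from the repository root would give a candidate list to review by hand.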

LIS-navari commented 1 year ago

@dmocko @emkemp @jvgeiger @sujayvkumar @cmclaug2 Hi all,

The simulation time for SMAP-E-OPL is a significant bottleneck, which requires us to divide the simulations into several jobs. As an example, for a selected Land Surface Model (LSM) covering one month (April 2015), LDT needs to process 871 SMAP_L1BTB files, taking approximately 24 wall-clock hours to complete. For the period from March 31, 2015, to March 3, 2022, there are a total of 72,180 SMAP_L1BTB files.
7 yr × 12 months × 1 day/month = ~84 days (for each LSM).
We need to run many LDT jobs at the same time, but managing a lot of directories is very difficult.
NCCS has a tool called a job array. (A job array represents a collection of sub-jobs, also referred to as job array "tasks", which differ only by a single index parameter. It is useful when users want to submit many similar jobs based on the same job script.) I can use this tool to submit 28 LDT jobs simultaneously from the same directory. However, an issue arises because all 28 LDT jobs generate the same SMAP file list and read from/write to that same file list. In my case, I believe I can resolve this by modifying the SMAP-E-OPL reader in LDT to generate different file names: ldt/SMAP_E_OPL/LDT_smap_e_oplMod.F90
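The idea of per-task file names can be sketched as follows, in Python rather than the reader's Fortran. This assumes the job array is run under Slurm, which exports the task index as the `SLURM_ARRAY_TASK_ID` environment variable. The function `unique_list_name` and the base name `smap_filelist` are hypothetical and are not the names used in LDT_smap_e_oplMod.F90.

```python
import os


def unique_list_name(base="smap_filelist", env=None):
    """Build a file-list name that differs between job array tasks.

    Uses the Slurm array index when present (assumption: Slurm scheduler);
    otherwise falls back to the process ID so standalone runs still get
    a name that is unlikely to collide.
    """
    env = os.environ if env is None else env
    task = env.get("SLURM_ARRAY_TASK_ID")
    suffix = f"task{task}" if task is not None else f"pid{os.getpid()}"
    return f"{base}.{suffix}.dat"
```

With 28 array tasks started from one directory, each task would then write and read its own list (`smap_filelist.task1.dat`, `smap_filelist.task2.dat`, and so on) instead of sharing a single file.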