NASA-LIS / LISF

Land Information System Framework
Apache License 2.0

Replace use of 'system' call in LISF #876

Open emkemp opened 3 years ago

emkemp commented 3 years ago

All three LISF components use the non-standard 'system' subroutine to farm out work to the shell. While this usually works on Discover, calling 'system' implicitly results in calling 'fork', which is expensive in terms of memory.

I completed an audit of master (as of 5 Aug 2021) and identified the following invocations of system.

Key:

mkdir: Create directory, including parent directories if necessary
ls: Create list of files based on regular expression
wc: Get number of files returned from ls
cp: Copy file
rm: Remove file
ls | xargs cat: Merge file lists together
find | rm: Delete lists of files
cat: Merge file lists together
tail: Not clear

ldt/core/LDT_ANNMod.F90: mkdir
ldt/core/LDT_climoRstProcMod.F90: mkdir
ldt/core/LDT_DAmetricsMod.F90: mkdir
ldt/core/LDT_fileIOMod.F90: mkdir
ldt/DAobs/Aquarius_L2sm/readAquariusL2smObs.F90: ls
ldt/DAobs/ASCAT_TUW/readASCATTUWsmObs.F90: ls, wc
ldt/DAobs/GRACEQL_tws/GRACEQLtws_obsMod.F90: mkdir
ldt/DAobs/GRACE_tws/GRACEtws_obsMod.F90: mkdir
ldt/DAobs/NASA_SMAPsm/readNASASMAPsmObs.F90: ls
ldt/DAobs/NASA_SMAPvod/readNASASMAPvodObs.F90: ls
ldt/DAobs/simGRACE_JPL/simGRACEJPL_obsMod.F90: mkdir
ldt/DAobs/SMOS_L2sm/readSMOSL2smObs.F90: ls
ldt/DAobs/SMOS_NRTNN_L2sm/readSMOSNRTNNL2smObs.F90: ls
ldt/DAobs/SMOS_NRTNN_L2sm/write_lookup_table.F90: mkdir
ldt/USAFSI/USAFSI_amsr2Mod.F90: ls, ls | xargs cat, find | rm
ldt/USAFSI/USAFSI_ssmisMod.F90: ls, ls | xargs cat, find | rm
ldt/USAFSI/USAFSI_xcalgmiMod.F90: ls, ls | xargs cat, find | rm
lis/core/LIS_fileIOMod.F90: mkdir
lis/core/LIS_forecastMod.F90: mkdir
lis/dataassim/obs/NASA_SMAPvod/read_NASASMAPvod.F90: ls, cp
lis/dataassim/obs/SMAP_NRTsm/read_SMAPNRTsm.F90: cp
lis/dataassim/obs/SMOS_L2sm/read_SMOSL2sm.F90: ls
lis/dataassim/obs/SMOS_NRTNN_L2sm/read_SMOSNRTNNL2sm.F90: cat, rm
lis/metforcing/gswp2/read_gswp2.F90: Various calls to shell scripts, which are missing; code only used if GSWP2_OPENDAP preprocessor symbol is defined
lis/metforcing/usaf/readcrd_agrmet.F90: COMMENTED OUT
lis/optUE/type/paramestim/obs/ARM/read_ARMdata.F90: ls, wc
lis/optUE/type/paramestim/obs/ISMNsm/read_ISMNsmobs.F90: ls, wc
lis/routing/HYMAP2_router/runoffdata/GLDAS1data/readGLDAS1runoffdata.F90: ls
lis/routing/HYMAP2_router/runoffdata/GLDAS2data/readGLDAS2runoffdata.F90: COMMENTED OUT
lis/routing/HYMAP_router/runoffdata/GLDAS1data/readGLDAS1runoffdata.F90: ls
lis/routing/HYMAP_router/runoffdata/GLDAS2data/readGLDAS2runoffdata.F90: ls
lis/surfacemodels/land/clm2/camclm_share/fileutils.F90: General shell wrapper, replace with execute_command_line?
lis/surfacemodels/land/clm2/camclm_share/ioFileMod.F90: General shell wrapper, replace with execute_command_line?
lis/surfacemodels/land/vic.4.1.1/vic411_writerst.F90: cp, tail
lvt/core/LVT_DAMod.F90: mkdir
lvt/core/LVT_fileIOMod.F90: mkdir
lvt/core/LVT_optUEMod.F90: mkdir
lvt/core/LVT_statsMod.F90: mkdir, mv
lvt/core/LVT_trainingMod.F90: mkdir
lvt/datastreams/ARM/readARMObs.F90: ls, wc
lvt/datastreams/GLDAS1/readGLDAS1obs.F90: ls
lvt/datastreams/GOES_LST/readGOES_LSTObs.F90: ls
lvt/datastreams/ISMN/readISMNObs.F90: ls, wc
lvt/datastreams/SMAP_L3TB/readSMAP_L3TB.F90: ls
lvt/datastreams/SMAPsm/readSMAPsmobs.F90: ls
lvt/datastreams/SMAPsm/SMAP_smobsMod.F90: mkdir
lvt/datastreams/SMAPTB/readSMAPTBobs.F90: ls, wc
lvt/datastreams/SMAPvod/readSMAPvodobs.F90: ls
lvt/datastreams/SMAPvod/SMAP_vodobsMod.F90: mkdir
lvt/datastreams/SMAPvwc/readSMAPvwcobs.F90: ls
lvt/datastreams/SMOS_L1TB/readSMOSL1TBObs.F90: ls
lvt/datastreams/SMOS_L2sm/readSMOSL2smObs.F90: ls
lvt/metrics/LVT_MetricEntropyMod.F90: calls matlab, requires USE_MATLAB_SUPPORT preprocessor symbol to be defined
lvt/metrics/LVT_percentileMod.F90: mkdir
lvt/metrics/LVT_SRIMod.F90: mkdir

We have a replacement for mkdir (called LIS_create_subdirs.c) and ls (called create_filelist.c). The wc functionality can be easily developed. Replacements for cp and mv can also be developed.

This need is strongest for LVT and LDT, since both are still single-threaded/single-process and can run out of memory for large domains.

emkemp commented 3 years ago

A replacement for rm can be developed using the POSIX unlink C function.
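A minimal sketch of what such a wrapper might look like. The name `remove_file` is hypothetical (it is not an existing LISF routine), and the Fortran-callable interface is left out.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical wrapper: remove one file without spawning a shell.
 * Returns 0 on success, or the errno value set by unlink() on failure. */
int remove_file(const char *path)
{
    if (unlink(path) != 0) {
        int err = errno; /* save before fprintf can change it */
        fprintf(stderr, "remove_file: cannot remove %s: %s\n",
                path, strerror(err));
        return err;
    }
    return 0;
}
```

Returning the errno value lets the caller distinguish a missing file (ENOENT) from a permissions problem.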

emkemp commented 3 years ago

Replacing usage of 'mkdir' is the easiest, lowest-hanging fruit, and can probably be done in a couple of days.
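For reference, the 'mkdir -p' behavior can be sketched in plain POSIX C as below. This is an illustration only, not the contents of LIS_create_subdirs.c; the function name `make_dirs` and the 4096-byte path limit are assumptions.

```c
#include <errno.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical equivalent of 'mkdir -p': create `path` and any missing
 * parent directories. Returns 0 on success or an errno value on failure. */
int make_dirs(const char *path)
{
    char buf[4096]; /* assumed upper bound on path length */
    size_t len = strlen(path);
    if (len == 0 || len >= sizeof buf) {
        return ENAMETOOLONG;
    }
    memcpy(buf, path, len + 1);

    /* Create each parent in turn by temporarily cutting the string
     * at every '/' after the first character. */
    for (char *p = buf + 1; *p != '\0'; p++) {
        if (*p == '/') {
            *p = '\0';
            if (mkdir(buf, 0777) != 0 && errno != EEXIST) {
                return errno;
            }
            *p = '/';
        }
    }
    if (mkdir(buf, 0777) != 0 && errno != EEXIST) {
        return errno;
    }

    /* EEXIST is also reported for a regular file, so confirm the type. */
    struct stat st;
    if (stat(buf, &st) != 0) {
        return errno;
    }
    return S_ISDIR(st.st_mode) ? 0 : ENOTDIR;
}
```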

Replacing 'ls' is also easy, since the current use of 'ls' and the replacement C function create_filelist both write filenames to text files that are subsequently opened and read in. In other words, little logic needs to be changed for the actual reading and processing of filenames by Fortran code.

Replacement of 'wc' can be done by modifying the 'create_filelist' function to return the number of files found.
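A sketch of that idea, assuming a glob-based implementation. The function `write_filelist` below is a hypothetical stand-in; the real create_filelist may have a different signature and behavior.

```c
#include <glob.h>
#include <stdio.h>

/* Hypothetical stand-in for create_filelist: write every path matching
 * `pattern` to `listfile`, one per line, and return the number of matches
 * (the value 'ls ... | wc -l' would give), or -1 on error. */
int write_filelist(const char *pattern, const char *listfile)
{
    glob_t matches;
    int rc = glob(pattern, 0, NULL, &matches);
    if (rc != 0 && rc != GLOB_NOMATCH) {
        globfree(&matches);
        return -1;
    }
    size_t count = (rc == 0) ? matches.gl_pathc : 0;

    FILE *out = fopen(listfile, "w");
    if (out == NULL) {
        globfree(&matches);
        return -1;
    }
    for (size_t i = 0; i < count; i++) {
        fprintf(out, "%s\n", matches.gl_pathv[i]);
    }
    fclose(out);
    globfree(&matches);
    return (int)count;
}
```

The Fortran caller would then get the file count from the return value and read the names from the list file as it does today.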

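Replacement of 'cp' can be done using the POSIX link function, but a wrapper function must be developed and tested. Note that link creates a hard link rather than an independent copy, so it only works when source and destination are on the same filesystem, and both names refer to the same data afterwards.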
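A possible shape for the link-based wrapper, with a hypothetical name `copy_file_by_link`. It does not fall back to a byte-for-byte copy; a caller that needs one would have to handle the EXDEV case separately.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical wrapper: make `dst` a hard link to `src`.
 * Returns 0 on success or the errno value from link(), e.g.
 * EEXIST if `dst` already exists, EXDEV if the two paths are on
 * different filesystems. */
int copy_file_by_link(const char *src, const char *dst)
{
    if (link(src, dst) != 0) {
        return errno;
    }
    return 0;
}
```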

Replacement of 'mv' can be done using the POSIX rename function, but a wrapper function must be developed and tested.
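A minimal sketch of the rename-based wrapper; the name `move_file` is hypothetical. Like 'mv' within one filesystem, rename replaces an existing destination file, but unlike 'mv' it fails with EXDEV across filesystems.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical wrapper: rename `src` to `dst` without spawning a shell.
 * Returns 0 on success or the errno value from rename() on failure. */
int move_file(const char *src, const char *dst)
{
    if (rename(src, dst) != 0) {
        return errno;
    }
    return 0;
}
```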

Replacement of 'cat' can be done using the POSIX 'glob' function and opening and reading each file and writing to a new one. Doable, but a wrapper function must be developed and tested.
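One way the glob-and-copy approach could look. The name `merge_files` is hypothetical, and the sketch assumes the output file does not itself match the pattern.

```c
#include <glob.h>
#include <stdio.h>

/* Hypothetical replacement for 'cat pattern > outfile': concatenate all
 * files matching `pattern`, in glob's sorted order, into `outfile`.
 * Returns the number of files merged, or -1 on error.
 * Assumes `outfile` does not match `pattern`. */
int merge_files(const char *pattern, const char *outfile)
{
    glob_t matches;
    int rc = glob(pattern, 0, NULL, &matches);
    if (rc != 0 && rc != GLOB_NOMATCH) {
        globfree(&matches);
        return -1;
    }
    size_t count = (rc == 0) ? matches.gl_pathc : 0;

    FILE *out = fopen(outfile, "w");
    if (out == NULL) {
        globfree(&matches);
        return -1;
    }

    int merged = 0;
    char buf[8192];
    for (size_t i = 0; i < count && merged >= 0; i++) {
        FILE *in = fopen(matches.gl_pathv[i], "r");
        if (in == NULL) {
            merged = -1;
            break;
        }
        size_t n;
        while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
            if (fwrite(buf, 1, n, out) != n) {
                merged = -1; /* short write: stop and report failure */
                break;
            }
        }
        fclose(in);
        if (merged >= 0) {
            merged++;
        }
    }

    fclose(out);
    globfree(&matches);
    return merged;
}
```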

The 'find | rm' combination can be replaced using POSIX functions 'glob' and 'unlink'. Again, a wrapper function must be written and tested.
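A sketch of the glob-plus-unlink combination, using a hypothetical function name `remove_matching`. Note that glob does not descend into subdirectories the way 'find' does, so this only fits call sites where the files sit in a single directory.

```c
#include <glob.h>
#include <unistd.h>

/* Hypothetical replacement for 'find ... | rm': delete every file whose
 * path matches `pattern`. Returns the number of files removed, or -1 if
 * glob() itself fails. Files that cannot be unlinked are skipped. */
int remove_matching(const char *pattern)
{
    glob_t matches;
    int rc = glob(pattern, 0, NULL, &matches);
    if (rc == GLOB_NOMATCH) {
        globfree(&matches);
        return 0;
    }
    if (rc != 0) {
        globfree(&matches);
        return -1;
    }

    int removed = 0;
    for (size_t i = 0; i < matches.gl_pathc; i++) {
        if (unlink(matches.gl_pathv[i]) == 0) {
            removed++;
        }
    }
    globfree(&matches);
    return removed;
}
```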

The 'tail' usage with VIC4.1.1 is specialized for combining restart files from each MPI process into a single file, starting from the third line of each process file. A replacement would involve the POSIX 'glob' function, reading each file (ignoring the first two lines), and writing in append mode to the output file. Again, this would need to be written and tested.
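A rough sketch of that logic. The function name `append_skipping_lines` and its arguments are hypothetical, and the actual VIC restart file naming is not assumed here; the caller supplies the pattern.

```c
#include <glob.h>
#include <stdio.h>

/* Hypothetical replacement for the 'tail' usage: for every file matching
 * `pattern`, skip the first `nskip` lines and append the rest to
 * `outfile`. Returns the number of files processed, or -1 on error. */
int append_skipping_lines(const char *pattern, const char *outfile, int nskip)
{
    glob_t matches;
    int rc = glob(pattern, 0, NULL, &matches);
    if (rc != 0 && rc != GLOB_NOMATCH) {
        globfree(&matches);
        return -1;
    }
    size_t count = (rc == 0) ? matches.gl_pathc : 0;

    FILE *out = fopen(outfile, "a"); /* append mode, as with '>>' */
    if (out == NULL) {
        globfree(&matches);
        return -1;
    }

    int processed = 0;
    for (size_t i = 0; i < count; i++) {
        FILE *in = fopen(matches.gl_pathv[i], "r");
        if (in == NULL) {
            processed = -1;
            break;
        }
        int c;
        int skipped = 0;
        /* Consume characters until `nskip` newlines have been seen. */
        while (skipped < nskip && (c = fgetc(in)) != EOF) {
            if (c == '\n') {
                skipped++;
            }
        }
        /* Copy the remainder of the file unchanged. */
        while ((c = fgetc(in)) != EOF) {
            fputc(c, out);
        }
        fclose(in);
        processed++;
    }

    fclose(out);
    globfree(&matches);
    return processed;
}
```

With `nskip` set to 2 this copies each file from its third line onward.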

LIS has code for invoking shell scripts for GSWP (lis/metforcing/gswp2/read_gswp2.F90), but the scripts are missing and the code is only compiled if the GSWP2_OPENDAP preprocessor symbol is defined at compile time. Since the shell scripts are missing, I would delete this code.

LVT has code to call MATLAB for some entropy calculations. However, one invocation requires use of a MATLAB script fill1ddata.m, which is not found in LISF. I would delete this code.

The general shell wrappers used with CLM2 are ultimately used to communicate with mass storage. However, the commands 'msread' and 'mswrite' are not available on Discover, and I'm assuming these wrappers are never actually used.

emkemp commented 2 months ago

We still need to do this.