geodesymiami / rsmas_insar

RSMAS InSAR code
https://rsmas-insar.readthedocs.io/
GNU General Public License v3.0

Troubleshooting Memory errors #537

Open falkamelung opened 9 months ago

Occasionally you may get an Out of Memory error in either minsar (rsmas_insar) or miaplpy. This happens when the estimated memory requirement is too low (the memory requirements for each run_step are in defaults/job_defaults.cfg). The job_submission.py script uses this file to estimate how many jobs can run simultaneously on one node and creates the run_files accordingly. In this example it assumes there is enough memory to run 15 jobs on this node:

wc -l run_05_miaplpy_unwrap_ifgram_0
15 run_05_miaplpy_unwrap_ifgram_0
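To see why an underestimate leads to an Out of Memory error, here is a minimal sketch of the kind of arithmetic involved. It is not the actual job_submission.py logic: the function name, the node size (192000 MB, 48 cores) and the per-task memory values are all hypothetical and chosen only for illustration.

```python
def tasks_per_node(node_memory_mb: int, task_memory_mb: int, cores_per_node: int) -> int:
    """Return how many tasks fit on one node (illustrative only).

    Assumption: tasks are limited both by memory and by the number of cores.
    """
    if task_memory_mb <= 0:
        raise ValueError("task_memory_mb must be positive")
    by_memory = node_memory_mb // task_memory_mb
    # Never pack more tasks than cores, and always schedule at least one task.
    return max(1, min(by_memory, cores_per_node))


# Hypothetical node: 192000 MB of memory, 48 cores.
estimated = tasks_per_node(192000, 12800, 48)  # estimate of 12800 MB per task
realistic = tasks_per_node(192000, 20000, 48)  # tasks that really need 20000 MB
print(estimated, realistic)
```

With these made-up numbers the estimate packs 15 tasks onto the node while only 9 would actually fit, so the node runs out of memory once all tasks are active.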

If you can't reduce the memory requirement by changing parameters in the *template file (e.g. more looks for isce processing, or a smaller miaplpy.subset area for MiaplPy), recreate the job files using a higher value for numMemoryUnits:

job_submission.py --template $TE/MiamiTsxSMDT36.template run_05_miaplpy_unwrap_ifgram --outdir run_files --numMemoryUnits 2 --writeonly
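After recreating the job files, it can help to confirm that each run file now holds fewer tasks. The sketch below is a Python equivalent of running wc -l on every run file for one step. The helper name count_tasks is hypothetical, and it assumes run files are named like run_05_miaplpy_unwrap_ifgram_0 (step name, underscore, number) with one task per line; files with other suffixes are ignored.

```python
import re
from pathlib import Path


def count_tasks(run_dir: str, step_prefix: str) -> dict:
    """Count non-empty lines (tasks) in each run file of one processing step.

    Assumption: run files are named '<step_prefix>_<number>' with no
    extension, and each non-empty line is one task.
    """
    pattern = re.compile(re.escape(step_prefix) + r"_\d+")
    counts = {}
    for path in sorted(Path(run_dir).iterdir()):
        if path.is_file() and pattern.fullmatch(path.name):
            with path.open() as handle:
                counts[path.name] = sum(1 for line in handle if line.strip())
    return counts


# Example usage (directory and step name taken from the command above):
# for name, n in count_tasks("run_files", "run_05_miaplpy_unwrap_ifgram").items():
#     print(n, name)
```

If the counts have not dropped compared with the earlier wc -l output, the new memory setting probably did not take effect.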

For minsar, the original job_submission.py command is recorded in the log file; for miaplpy, it is printed to the screen by miaplpyApp.py --jobfiles.

You can check memory usage on the compute node using free -h.
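If you would rather log memory from a script than read the free -h output by hand, the following sketch reads the same information from /proc/meminfo. It assumes a Linux compute node where /proc/meminfo reports values in kB; the function names are hypothetical.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: value}, skipping odd lines."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def kib_to_gib(kib: int) -> float:
    """Convert the kB values reported by /proc/meminfo to GiB."""
    return kib / 1024**2


def available_memory_gib(path: str = "/proc/meminfo") -> float:
    """Return MemAvailable in GiB (Linux only)."""
    with open(path) as handle:
        return kib_to_gib(parse_meminfo(handle.read())["MemAvailable"])


# Example usage on a compute node:
# print(f"{available_memory_gib():.1f} GiB available")
```

Calling available_memory_gib() periodically while a run file executes gives a rough picture of how close the node gets to its limit.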