geodesymiami / rsmas_insar

RSMAS InSAR code
https://rsmas-insar.readthedocs.io/
GNU General Public License v3.0

launcher segmentation fault and Bus error #486

Open falkamelung opened 3 years ago

falkamelung commented 3 years ago

Occasionally I get a seg fault. Re-running works fine. Should we just rerun when this happens?
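
One way the "just rerun" idea could look is sketched below. It is only an illustration: `run_with_retry`, `RETRY_SIGNALS` and `max_attempts` are hypothetical names, not part of rsmas_insar or the launcher. It relies on the documented behaviour of Python's `subprocess` on POSIX, where a return code of `-N` means the child was terminated by signal `N`.

```python
import signal
import subprocess
import sys

# Signals treated as transient crashes worth another attempt (an assumption).
RETRY_SIGNALS = {signal.SIGSEGV, signal.SIGBUS}


def run_with_retry(cmd, max_attempts=3):
    """Run cmd; rerun only if it was killed by SIGSEGV or SIGBUS.

    Returns (returncode, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        returncode = subprocess.run(cmd).returncode
        # On POSIX a negative return code -N means "terminated by signal N".
        crashed = returncode < 0 and -returncode in RETRY_SIGNALS
        if not crashed or attempt == max_attempts:
            return returncode, attempt
        name = signal.Signals(-returncode).name
        print(f"attempt {attempt} died with {name}, rerunning", file=sys.stderr)
```

An ordinary non-zero exit is returned immediately and is not retried, so real processing errors would still surface. If the command is started through a shell instead, bash reports a signal death as exit status 128+N (139 for a segmentation fault, and 135 for a bus error on Linux x86-64), so the check would have to look at those values.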

more run_07_merge_reference_secondary_slc_10_7820891.e run_07_merge_reference_secondary_slc_10_7821175.e
::::::::::::::
run_07_merge_reference_secondary_slc_10_7820891.e
::::::::::::::
/tmp/rsmas_insar/3rdparty/launcher/launcher: line 93: 282820 Segmentation fault      SentinelWrapper.py -c /scratch/05861/tg851601/KokoxiliChunk38SenDT150/configs/config_merge_20190311 > /scratch/05861/tg851601/KokoxiliChunk38SenDT150/run_files/run_07_merge_reference_secondary_slc_10_20190311_$LAUNCHER_JID.o 2> /scratch/05861/tg851601/KokoxiliChunk38SenDT150/run_files/run_07_merge_reference_secondary_slc_10_20190311_$LAUNCHER_JID.e
::::::::::::::
run_07_merge_reference_secondary_slc_10_7821175.e
::::::::::::::
using /tmp/launcher.7821175.hostlist.Extp5DXu to get hosts
starting job on c506-032

more run_07_merge_reference_secondary_slc_10_7820891.o


Filesystem                Size  Used Avail Use% Mounted on
/dev/mapper/rootvg01-tmp  144G   57M  144G   1% /tmp
sourcing ~/accounts/platforms_defaults.bash ...
sourcing /tmp/rsmas_insar/setup/environment.bash ...
/scratch/05861/tg851601/KokoxiliChunk38SenDT150
Going to distribute directory /scratch/05861/tg851601/KokoxiliChunk38SenDT150/stack to /tmp
Running distribution functions now...
/scratch/05861/tg851601/KokoxiliChunk38SenDT150/stack has been copied to /tmp/stack on each compute node.
Please remember to copy the requried files out of /tmp before the job finishes!
Done.
Going to distribute directory /scratch/05861/tg851601/KokoxiliChunk38SenDT150/reference to /tmp
Running distribution functions now...
/scratch/05861/tg851601/KokoxiliChunk38SenDT150/reference has been copied to /tmp/reference on each compute node.
Please remember to copy the requried files out of /tmp before the job finishes!
Done.
Going to distribute directory /scratch/05861/tg851601/KokoxiliChunk38SenDT150/coreg_secondarys/20181229 to /tmp
Running distribution functions now...
/scratch/05861/tg851601/KokoxiliChunk38SenDT150/coreg_secondarys/20181229 has been copied to /tmp/20181229 on each compute node.
Please remember to copy the requried files out of /tmp before the job finishes!
Done.
Going to distribute directory /scratch/05861/tg851601/KokoxiliChunk38SenDT150/coreg_secondarys/20190110 to /tmp
Running distribution functions now...
/scratch/05861/tg851601/KokoxiliChunk38SenDT150/coreg_secondarys/20190110 has been copied to /tmp/20190110 on each compute node.
Please remember to copy the requried files out of /tmp before the job finishes!
Done.
Going to distribute directory /scratch/05861/tg851601/KokoxiliChunk38SenDT150/coreg_secondarys/20190122 to /tmp
Running distribution functions now...
/scratch/05861/tg851601/KokoxiliChunk38SenDT150/coreg_secondarys/20190122 has been copied to /tmp/20190122 on each compute node.
Please remember to copy the requried files out of /tmp before the job finishes!
Done.
Going to distribute directory /scratch/05861/tg851601/KokoxiliChunk38SenDT150/coreg_secondarys/20190203 to /tmp
Running distribution functions now...
/scratch/05861/tg851601/KokoxiliChunk38SenDT150/coreg_secondarys/20190203 has been copied to /tmp/20190203 on each compute node.
Please remember to copy the requried files out of /tmp before the job finishes!
Done.
Going to distribute directory /scratch/05861/tg851601/KokoxiliChunk38SenDT150/coreg_secondarys/20190215 to /tmp
Running distribution functions now...
/scratch/05861/tg851601/KokoxiliChunk38SenDT150/coreg_secondarys/20190215 has been copied to /tmp/20190215 on each compute node.
Please remember to copy the requried files out of /tmp before the job finishes!
Done.
Going to distribute directory /scratch/05861/tg851601/KokoxiliChunk38SenDT150/coreg_secondarys/20190227 to /tmp
Running distribution functions now...
/scratch/05861/tg851601/KokoxiliChunk38SenDT150/coreg_secondarys/20190227 has been copied to /tmp/20190227 on each compute node.
Please remember to copy the requried files out of /tmp before the job finishes!
Done.
Going to distribute directory /scratch/05861/tg851601/KokoxiliChunk38SenDT150/coreg_secondarys/20190311 to /tmp
Running distribution functions now...
/scratch/05861/tg851601/KokoxiliChunk38SenDT150/coreg_secondarys/20190311 has been copied to /tmp/20190311 on each compute node.
Please remember to copy the requried files out of /tmp before the job finishes!
Done.
Going to distribute directory /scratch/05861/tg851601/KokoxiliChunk38SenDT150/coreg_secondarys/20190323 to /tmp
Running distribution functions now...
/scratch/05861/tg851601/KokoxiliChunk38SenDT150/coreg_secondarys/20190323 has been copied to /tmp/20190323 on each compute node.
Please remember to copy the requried files out of /tmp before the job finishes!
Done.
Going to distribute directory /scratch/05861/tg851601/KokoxiliChunk38SenDT150/coreg_secondarys/20190404 to /tmp
Running distribution functions now...
/scratch/05861/tg851601/KokoxiliChunk38SenDT150/coreg_secondarys/20190404 has been copied to /tmp/20190404 on each compute node.
Please remember to copy the requried files out of /tmp before the job finishes!
Done.
After copy-to-tmp: Filesystem Size Used Avail Use% Mounted on /dev/mapper/rootvg01-tmp 144G 137G 7.4G 95% /tmp
Launcher: Setup complete.

------------- SUMMARY ---------------
   Number of hosts:    1
   Working directory:  /dev/shm
   Processes per host: 9
   Total processes:    9
   Total jobs:         9
   Scheduling method:  block

-------------------------------------
Launcher: Starting parallel tasks...
Launcher: Task 7 running job 8 on c497-004.stampede2.tacc.utexas.edu (SentinelWrapper.py -c /scratch/05861/tg851601/KokoxiliChunk38SenDT150/configs/config_merge_20190323 > /scratch/05861/tg851601/KokoxiliChunk38SenDT150/run_files/run_07_merge_reference_secondary_slc_10_20190323_$LAUNCHER_JID.o 2>/scratch/05861/tg851601/KokoxiliChunk38SenDT150/run_files/run_07_merge_reference_secondary_slc_10_20190323_$LAUNCHER_JID.e)
Launcher: Task 6 running job 7 on c497-004.stampede2.tacc.utexas.edu (SentinelWrapper.py -c /scratch/05861/tg851601/KokoxiliChunk38SenDT150/configs/config_merge_20190311 > /scratch/05861/tg851601/KokoxiliChunk38SenDT150/run_files/run_07_merge_reference_secondary_slc_10_20190311_$LAUNCHER_JID.o 2>/scratch/05861/tg851601/KokoxiliChunk38SenDT150/run_files/run_07_merge_reference_secondary_slc_10_20190311_$LAUNCHER_JID.e)
Launcher: Task 4 running job 5 on c497-004.stampede2.tacc.utexas.edu (SentinelWrapper.py -c /scratch/05861/tg851601/KokoxiliChunk38SenDT150/configs/config_merge_20190215 > /scratch/05861/tg851601/KokoxiliChunk38SenDT150/run_files/run_07_merge_reference_secondary_slc_10_20190215_$LAUNCHER_JID.o 2>/scratch/05861/tg851601/KokoxiliChunk38SenDT150/run_files/run_07_merge_reference_secondary_slc_10_20190215_$LAUNCHER_JID.e)
Launcher: Task 3 running job 4 on c497-004.stampede2.tacc.utexas.edu (SentinelWrapper.py -c /scratch/05861/tg851601/KokoxiliChunk38SenDT150/configs/config_merge_20190203 > /scratch/05861/tg851601/KokoxiliChunk38SenDT150/run_files/run_07_merge_reference_secondary_slc_10_20190203_$LAUNCHER_JID.o 2>/scratch/05861/tg851601/KokoxiliChunk38SenDT150/run_files/run_07_merge_reference_secondary_slc_10_20190203_$LAUNCHER_JID.e)
Launcher: Job 7 completed in 4 seconds.
Launcher: Task 6 done. Exiting.
Launcher: Task 0 running job 1 on c497-004.stampede2.tacc.utexas.edu (SentinelWrapper.py -c /scratch/05861/tg851601/KokoxiliChunk38SenDT150/configs/config_merge_20181229 > /scratch/05861/tg851601/KokoxiliChunk38SenDT150/run_files/run_07_merge_reference_secondary_slc_10_20181229_$LAUNCHER_JID.o 2>/scratch/05861/tg851601/KokoxiliChunk38SenDT150/run_files/run_07_merge_reference_secondary_slc_10_20181229_$LAUNCHER_JID.e)
Launcher: Task 8 running job 9 on c497-004.stampede2.tacc.utexas.edu (SentinelWrapper.py -c /scratch/05861/tg851601/KokoxiliChunk38SenDT150/configs/config_merge_20190404 > /scratch/05861/tg851601/KokoxiliChunk38SenDT150/run_files/run_07_merge_reference_secondary_slc_10_20190404_$LAUNCHER_JID.o 2>/scratch/05861/tg851601/KokoxiliChunk38SenDT150/run_files/run_07_merge_reference_secondary_slc_10_20190404_$LAUNCHER_JID.e)
Launcher: Job 5 completed in 7 seconds.
Launcher: Job 4 completed in 5 seconds.
Launcher: Task 4 done. Exiting.
Launcher: Task 3 done. Exiting.
Launcher: Job 8 completed in 11 seconds.
Launcher: Task 7 done. Exiting.
Launcher: Job 1 completed in 4 seconds.
Launcher: Task 0 done. Exiting.
Launcher: Task 1 running job 2 on c497-004.stampede2.tacc.utexas.edu (SentinelWrapper.py -c /scratch/05861/tg851601/KokoxiliChunk38SenDT150/configs/config_merge_20190110 > /scratch/05861/tg851601/KokoxiliChunk38SenDT150/run_files/run_07_merge_reference_secondary_slc_10_20190110_$LAUNCHER_JID.o 2>/scratch/05861/tg851601/KokoxiliChunk38SenDT150/run_files/run_07_merge_reference_secondary_slc_10_20190110_$LAUNCHER_JID.e)
Launcher: Job 9 completed in 4 seconds.
Launcher: Task 8 done. Exiting.
Launcher: Task 2 running job 3 on c497-004.stampede2.tacc.utexas.edu (SentinelWrapper.py -c /scratch/05861/tg851601/KokoxiliChunk38SenDT150/configs/config_merge_20190122 > /scratch/05861/tg851601/KokoxiliChunk38SenDT150/run_files/run_07_merge_reference_secondary_slc_10_20190122_$LAUNCHER_JID.o 2>/scratch/05861/tg851601/KokoxiliChunk38SenDT150/run_files/run_07_merge_reference_secondary_slc_10_20190122_$LAUNCHER_JID.e)
Launcher: Task 5 running job 6 on c497-004.stampede2.tacc.utexas.edu (SentinelWrapper.py -c /scratch/05861/tg851601/KokoxiliChunk38SenDT150/configs/config_merge_20190227 > /scratch/05861/tg851601/KokoxiliChunk38SenDT150/run_files/run_07_merge_reference_secondary_slc_10_20190227_$LAUNCHER_JID.o 2>/scratch/05861/tg851601/KokoxiliChunk38SenDT150/run_files/run_07_merge_reference_secondary_slc_10_20190227_$LAUNCHER_JID.e)
Launcher: Job 2 completed in 4 seconds.
Launcher: Task 1 done. Exiting.
Launcher: Job 3 completed in 4 seconds.
Launcher: Task 2 done. Exiting.
Launcher: Job 6 completed in 4 seconds.
Launcher: Task 5 done. Exiting.
Launcher: Done. Job exited without errors

head -10 run_07_merge_reference_secondary_slc_10_7820891.o run_07_merge_reference_secondary_slc_10_7821175.o
==> run_07_merge_reference_secondary_slc_10_7820891.o <==
Filesystem                Size  Used Avail Use% Mounted on
/dev/mapper/rootvg01-tmp  144G   57M  144G   1% /tmp
sourcing ~/accounts/platforms_defaults.bash ...
sourcing /tmp/rsmas_insar/setup/environment.bash ...
/scratch/05861/tg851601/KokoxiliChunk38SenDT150
Going to distribute directory /scratch/05861/tg851601/KokoxiliChunk38SenDT150/stack to /tmp
Running distribution functions now...
/scratch/05861/tg851601/KokoxiliChunk38SenDT150/stack has been copied to /tmp/stack on each compute node.
Please remember to copy the requried files out of /tmp before the job finishes!
Done.

==> run_07_merge_reference_secondary_slc_10_7821175.o <==
Filesystem                Size  Used Avail Use% Mounted on
/dev/mapper/rootvg01-tmp  144G   48M  144G   1% /tmp
sourcing ~/accounts/platforms_defaults.bash ...
sourcing /tmp/rsmas_insar/setup/environment.bash ...
/scratch/05861/tg851601/KokoxiliChunk38SenDT150/run_files
Going to distribute directory /scratch/05861/tg851601/KokoxiliChunk38SenDT150/stack to /tmp
Running distribution functions now...

run_07_merge_reference_secondary_slc_10_7820891.o:After copy-to-tmp: Filesystem Size Used Avail Use% Mounted on /dev/mapper/rootvg01-tmp 144G 137G 7.4G 95% /tmp
run_07_merge_reference_secondary_slc_10_7821175.o:After copy-to-tmp: Filesystem Size Used Avail Use% Mounted on /dev/mapper/rootvg01-tmp 144G 137G 7.4G 95% /tmp
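
Both runs report /tmp at 95% after the copy step. Whether that is related to the crashes is not established here, but the number could be checked automatically. A minimal sketch follows, where `tmp_use_percent` and `THRESHOLD` are hypothetical names and the 90% level is an arbitrary choice, not a value from this thread.

```python
import re

# Matches the trailing "... 95% /tmp" part of a df-style line.
USE_RE = re.compile(r"(\d+)%\s+(/\S*)\s*$")


def tmp_use_percent(line):
    """Return (mount_point, percent_used) from a df-style line, or None."""
    match = USE_RE.search(line)
    if match is None:
        return None
    return match.group(2), int(match.group(1))


line = ("After copy-to-tmp: Filesystem Size Used Avail Use% Mounted on "
        "/dev/mapper/rootvg01-tmp 144G 137G 7.4G 95% /tmp")
mount, used = tmp_use_percent(line)
THRESHOLD = 90  # hypothetical warning level, not taken from the thread
if used >= THRESHOLD:
    print(f"warning: {mount} is {used}% full after the copy step")
```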

falkamelung commented 2 years ago

Oct 30: Launcher bus error (Frontera)

MakranChunk26SenDT151

cat run_09_merge_burst_igram_13_3714574.e
/tmp/rsmas_insar/3rdparty/launcher/launcher: line 97: 285643 Bus error               SentinelWrapper.py -c /scratch1/05861/tg851601/MakranChunk26SenDT151/configs_tmp/config_merge_igram_20180422_20180528 > /scratch1/05861/tg851601/MakranChunk26SenDT151/run_files_tmp/run_09_merge_burst_igram_13_20180422_20180528_$LAUNCHER_JID.o 2> /scratch1/05861/tg851601/MakranChunk26SenDT151/run_files_tmp/run_09_merge_burst_igram_13_20180422_20180528_$LAUNCHER_JID.e

MakranChunk27SenDT151

cat run_09_merge_burst_igram_5_3715046.e
/tmp/rsmas_insar/3rdparty/launcher/launcher: line 97: 62837 Bus error               SentinelWrapper.py -c /scratch1/05861/tg851601/MakranChunk27SenDT151/configs_tmp/config_merge_igram_20160607_20160818 > /scratch1/05861/tg851601/MakranChunk27SenDT151/run_files_tmp/run_09_merge_burst_igram_5_20160607_20160818_$LAUNCHER_JID.o 2> /scratch1/05861/tg851601/MakranChunk27SenDT151/run_files_tmp/run_09_merge_burst_igram_5_20160607_20160818_$LAUNCHER_JID.e

MakranChunk28SenDT151

cat run_08_generate_burst_igram_72_3714913.e
/tmp/rsmas_insar/3rdparty/launcher/launcher: line 97: 57121 Bus error               SentinelWrapper.py -c /scratch1/05861/tg851601/MakranChunk28SenDT151/configs_tmp/config_generate_igram_20210301_20210313 > /scratch1/05861/tg851601/MakranChunk28SenDT151/run_files_tmp/run_08_generate_burst_igram_72_20210301_20210313_$LAUNCHER_JID.o 2> /scratch1/05861/tg851601/MakranChunk28SenDT151/run_files_tmp/run_08_generate_burst_igram_72_20210301_20210313_$LAUNCHER_JID.e
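
The crash lines above all have the same shape (launcher path, line number, PID, crash type, command), so the affected commands could be collected from the `.e` files for rerunning. The sketch below assumes that shape holds in general; `CRASH_RE` and `find_crashes` are hypothetical names, and the sample is the line above with the stderr redirection left out for brevity.

```python
import re

# Shape of the message bash prints when a launcher task dies on a signal.
CRASH_RE = re.compile(
    r"launcher: line \d+:\s+(?P<pid>\d+)\s+"
    r"(?P<kind>Segmentation fault|Bus error)\s+"
    r"(?P<command>.+?)\s+>"
)


def find_crashes(text):
    """Return (pid, kind, command) for each crash line found in text."""
    return [(int(m["pid"]), m["kind"], m["command"])
            for m in CRASH_RE.finditer(text)]


sample = (
    "/tmp/rsmas_insar/3rdparty/launcher/launcher: line 97: 57121 Bus error"
    "               SentinelWrapper.py -c /scratch1/05861/tg851601/"
    "MakranChunk28SenDT151/configs_tmp/config_generate_igram_20210301_20210313"
    " > /scratch1/05861/tg851601/MakranChunk28SenDT151/run_files_tmp/"
    "run_08_generate_burst_igram_72_20210301_20210313_$LAUNCHER_JID.o"
)
for pid, kind, command in find_crashes(sample):
    print(pid, kind, command)
```

The captured command stops before the first output redirection, so it can be passed to a rerun step without the original `.o` and `.e` file names.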