geodesymiami / rsmas_insar

RSMAS InSAR code
https://rsmas-insar.readthedocs.io/
GNU General Public License v3.0

run_workflow.bash: errors not reported to terminal (only to `workflow.*.log`) #503

Open falkamelung opened 2 years ago

falkamelung commented 2 years ago

Hi @Ovec8hkin, when check_job_outputs.py detects an error, run_workflow.bash exits properly, but no error message appears on the terminal. The terminal shows only:

run_workflow.bash /scratch/05861/tg851601/unittestGalapagosSenDT128 --start 5
Started at: 2021-09-26 13:18:00
Jobfiles to run:
/scratch/05861/tg851601/unittestGalapagosSenDT128/run_files_tmp/run_05_fullBurst_resample_0.job
---------------------------------------------------------------------------------------------------------------------------------------------------------
|                      |       | Step      | Total     | Step      |         |                                                                        |  
|                      | Extra | active    | active    | processed | Active  |                                                                        |  
| File Name            | tasks | tasks     | tasks     | jobs      | jobs    | Message                                                                |  
---------------------------------------------------------------------------------------------------------------------------------------------------------
| run_05_ful...e_0.job | 6     | 0/400     | 0/500     | 1/1       | 0/1     | Submitted: 8509633                                                     |
---------------------------------------------------------------------------------------------------------------------------------------------------------
Jobs submitted: 8509633
unittestGalapagosSenDT128, run_05_fullBurst_resample, 1 jobs : 1 COMPLETED , 0 RUNNING , 0 PENDING , 0 WAITING   .
check_job_outputs.py  /scratch/05861/tg851601/unittestGalapagosSenDT128/run_files_tmp/run_05_fullBurst_resample_0.job
This is the Open Source version of ISCE.
Some of the workflows depend on a separate licensed package.
To obtain the licensed package, please make a request for ISCE
through the website: https://download.jpl.nasa.gov/ops/request/index.cfm.
Alternatively, if you are a member, or can become a member of WinSAR
you may be able to obtain access to a version of the licensed sofware at
https://winsar.unavco.org/software/isce
checking *.e, *.o from /scratch/05861/tg851601/unittestGalapagosSenDT128/run_files_tmp/run_05_fullBurst_resample_0.job
no error found

Jobfiles to run:
/scratch/05861/tg851601/unittestGalapagosSenDT128/run_files_tmp/run_06_extract_stack_valid_region_0.job
---------------------------------------------------------------------------------------------------------------------------------------------------------
|                      |       | Step      | Total     | Step      |         |                                                                        |  
|                      | Extra | active    | active    | processed | Active  |                                                                        |  
| File Name            | tasks | tasks     | tasks     | jobs      | jobs    | Message                                                                |  
---------------------------------------------------------------------------------------------------------------------------------------------------------
| run_06_ext...n_0.job | 1     | 0/400     | 6/500     | 1/1       | 0/1     | Submitted: 8509649                                                     |
---------------------------------------------------------------------------------------------------------------------------------------------------------
Jobs submitted: 8509649
unittestGalapagosSenDT128, run_06_extract_stack_valid_region, 1 jobs : 1 COMPLETED , 0 RUNNING , 0 PENDING , 0 WAITING   .
check_job_outputs.py  /scratch/05861/tg851601/unittestGalapagosSenDT128/run_files_tmp/run_06_extract_stack_valid_region_0.job

The `workflow.*.log` file, on the other hand, contains the full error report:

cat workflow.2.log
Started at: 2021-09-26 13:18:00
Jobfiles to run:
/scratch/05861/tg851601/unittestGalapagosSenDT128/run_files_tmp/run_05_fullBurst_resample_0.job
---------------------------------------------------------------------------------------------------------------------------------------------------------
|                      |       | Step      | Total     | Step      |         |                                                                        |  
|                      | Extra | active    | active    | processed | Active  |                                                                        |  
| File Name            | tasks | tasks     | tasks     | jobs      | jobs    | Message                                                                |  
---------------------------------------------------------------------------------------------------------------------------------------------------------
| run_05_ful...e_0.job | 6     | 0/400     | 0/500     | 1/1       | 0/1     | Submitted: 8509633                                                     |
---------------------------------------------------------------------------------------------------------------------------------------------------------
Jobs submitted: 8509633
unittestGalapagosSenDT128, run_05_fullBurst_resample, 1 jobs : 1 COMPLETED , 0 RUNNING , 0 PENDING , 0 WAITING   .
check_job_outputs.py  /scratch/05861/tg851601/unittestGalapagosSenDT128/run_files_tmp/run_05_fullBurst_resample_0.job
This is the Open Source version of ISCE.
Some of the workflows depend on a separate licensed package.
To obtain the licensed package, please make a request for ISCE
through the website: https://download.jpl.nasa.gov/ops/request/index.cfm.
Alternatively, if you are a member, or can become a member of WinSAR
you may be able to obtain access to a version of the licensed sofware at
https://winsar.unavco.org/software/isce
checking *.e, *.o from /scratch/05861/tg851601/unittestGalapagosSenDT128/run_files_tmp/run_05_fullBurst_resample_0.job
no error found

Jobfiles to run:
/scratch/05861/tg851601/unittestGalapagosSenDT128/run_files_tmp/run_06_extract_stack_valid_region_0.job
---------------------------------------------------------------------------------------------------------------------------------------------------------
|                      |       | Step      | Total     | Step      |         |                                                                        |  
|                      | Extra | active    | active    | processed | Active  |                                                                        |  
| File Name            | tasks | tasks     | tasks     | jobs      | jobs    | Message                                                                |  
---------------------------------------------------------------------------------------------------------------------------------------------------------
| run_06_ext...n_0.job | 1     | 0/400     | 6/500     | 1/1       | 0/1     | Submitted: 8509649                                                     |
---------------------------------------------------------------------------------------------------------------------------------------------------------
Jobs submitted: 8509649
unittestGalapagosSenDT128, run_06_extract_stack_valid_region, 1 jobs : 1 COMPLETED , 0 RUNNING , 0 PENDING , 0 WAITING   .
check_job_outputs.py  /scratch/05861/tg851601/unittestGalapagosSenDT128/run_files_tmp/run_06_extract_stack_valid_region_0.job
This is the Open Source version of ISCE.
Some of the workflows depend on a separate licensed package.
To obtain the licensed package, please make a request for ISCE
through the website: https://download.jpl.nasa.gov/ops/request/index.cfm.
Alternatively, if you are a member, or can become a member of WinSAR
you may be able to obtain access to a version of the licensed sofware at
https://winsar.unavco.org/software/isce
checking *.e, *.o from /scratch/05861/tg851601/unittestGalapagosSenDT128/run_files_tmp/run_06_extract_stack_valid_region_0.job
Error: "Error" found in /scratch/05861/tg851601/unittestGalapagosSenDT128/run_files_tmp/run_06_extract_stack_valid_region_0__1.e
Error: "FileNotFoundError" found in /scratch/05861/tg851601/unittestGalapagosSenDT128/run_files_tmp/run_06_extract_stack_valid_region_0__1.e
Error: "Traceback" found in /scratch/05861/tg851601/unittestGalapagosSenDT128/run_files_tmp/run_06_extract_stack_valid_region_0__1.e
For known issues see https://github.com/geodesymiami/rsmas_insar/tree/master/docs/known_issues.md
Traceback (most recent call last):
  File "/work2/05861/tg851601/stampede2/code/rsmas_insar/minsar/check_job_outputs.py", line 224, in <module>
    main()
  File "/work2/05861/tg851601/stampede2/code/rsmas_insar/minsar/check_job_outputs.py", line 194, in main
    raise RuntimeError('Error in run_file: ' + run_file_base)
RuntimeError: Error in run_file: /scratch/05861/tg851601/unittestGalapagosSenDT128/run_files_tmp/run_06_extract_stack_valid_region
Error in run_workflow.bash: check_job_outputs.py exited with code (1).
/work2/05861/tg851601/stampede2/code/rsmas_insar/minsar/run_workflow.bash: line 1: 288278 Terminated              tail -f $logfile_name
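The error lines near the end of the log follow a simple pattern: the checker looks for keywords such as "Error", "FileNotFoundError" and "Traceback" in the job's `*.e` files, reports each hit, and exits with a non-zero code. The Python sketch below shows that kind of check. It is not check_job_outputs.py's code: `ERROR_KEYWORDS`, `find_keywords`, `scan_error_files` and `main` are hypothetical names, the keyword list is taken only from this log, and the sketch returns 1 instead of raising `RuntimeError`. Each hit is written to stderr with `flush=True` so the report leaves the process immediately rather than sitting in a buffer.

```python
import glob
import sys

# Keywords as they appear in the log above; the list actually used by
# check_job_outputs.py may differ (assumption).
ERROR_KEYWORDS = ("Error", "FileNotFoundError", "Traceback")


def find_keywords(text):
    """Return the error keywords that occur in text, in ERROR_KEYWORDS order."""
    return [kw for kw in ERROR_KEYWORDS if kw in text]


def scan_error_files(run_file_base):
    """Hypothetical helper: return (keyword, path) hits for all <base>*.e files."""
    hits = []
    for path in sorted(glob.glob(run_file_base + "*.e")):
        with open(path, errors="replace") as fh:
            hits.extend((kw, path) for kw in find_keywords(fh.read()))
    return hits


def main(run_file_base):
    """Print one line per hit and return a shell-style exit code."""
    hits = scan_error_files(run_file_base)
    for kw, path in hits:
        # stderr + flush=True: the message leaves the process right away
        # instead of waiting in a buffer until exit.
        print(f'Error: "{kw}" found in {path}', file=sys.stderr, flush=True)
    if hits:
        return 1
    print("no error found", flush=True)
    return 0
```

Note that "Error" is a substring of "FileNotFoundError", so a single traceback produces several hits, which matches the three `Error:` lines in the log.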

Can you modify it so that errors are also displayed on the screen?
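One common way to get the same output on the screen and in the log is to "tee" it: read the child's output once and write every line to both destinations, instead of writing only to the log and following the log separately. A minimal Python sketch of that approach follows; `run_and_tee` is a hypothetical name, not part of rsmas_insar, and merging stderr into stdout is a design assumption so that error reports reach both places.

```python
import subprocess
import sys


def run_and_tee(cmd, log_path):
    """Run cmd and copy each output line to the terminal and to log_path.

    stderr is merged into stdout so error reports reach both destinations.
    Returns the child's exit code.
    """
    with open(log_path, "a") as log, subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as proc:
        for line in proc.stdout:  # loop ends when the child closes its output
            sys.stdout.write(line)
            sys.stdout.flush()
            log.write(line)
            log.flush()
    # Leaving the Popen block waits for the child, so returncode is set.
    return proc.returncode
```

For example, `run_and_tee(["check_job_outputs.py", job_file], "workflow.2.log")` would print the `Error: ...` lines and return 1 on failure. In bash the equivalent pattern is `cmd 2>&1 | tee -a "$logfile"`, reading the command's own exit status from `${PIPESTATUS[0]}`.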

To create an error, run the example data, then comment out `copy_to_tmp` in `run_05_*.job` and rerun starting with step 5 (`run_workflow.bash $PWD --start 5`):

install_to_tmp.bash /scratch/05861/tg851601/unittestGalapagosSenDT128/run_files_tmp/run_05_fullBurst_resample_0.job --prefix tops
#copy_to_tmp.bash /scratch/05861/tg851601/unittestGalapagosSenDT128/run_files_tmp/run_05_fullBurst_resample_0.job /scratch/05861/tg851601/unittestGalapagosSenDT128/run_files_tmp/run_05_fullBurst_resample_0 /scratch/05861/tg851601/unittestGalapagosSenDT128
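The reproduction step above (commenting out `copy_to_tmp.bash` in the job file) can be scripted. The Python sketch below is a hypothetical helper, not part of rsmas_insar; it assumes the command appears at the start of its own line, as in the job-file excerpt above, and leaves already-commented lines alone so it is safe to run twice.

```python
def comment_out_command(job_text, command="copy_to_tmp.bash"):
    """Return job_text with every line that starts with `command` commented out.

    Lines that are already commented (start with '#') are left unchanged.
    """
    out = []
    for line in job_text.splitlines(keepends=True):
        if line.lstrip().startswith(command):
            out.append("#" + line)
        else:
            out.append(line)
    return "".join(out)


def comment_out_in_file(job_path, command="copy_to_tmp.bash"):
    """Rewrite job_path in place with `command` commented out."""
    with open(job_path) as fh:
        text = fh.read()
    with open(job_path, "w") as fh:
        fh.write(comment_out_command(text, command))
```

With GNU sed, `sed -i 's/^copy_to_tmp\.bash/#&/' run_05_*.job` does the same in one line.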
Ovec8hkin commented 2 years ago

I've actually known about this for a while. It has something to do with how the script terminates and how that termination handles buffer flushing. It might not be easy to fix.
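The last line of the log, `Terminated  tail -f $logfile_name`, suggests that the terminal view comes from a background `tail -f` on the log file, which run_workflow.bash kills when it exits. If the kill lands before `tail` has printed the final lines, the error report never reaches the screen, which would fit the termination explanation above. One way around that race, sketched in Python under that assumption: follow the file until the writer is known to be finished, then read whatever is left instead of killing the reader. `follow_log` and `producer_done` are hypothetical names.

```python
import time


def follow_log(path, producer_done, poll=0.2):
    """Yield complete lines appended to path, like `tail -f`, then stop cleanly.

    producer_done is a zero-argument callable that returns True once the writer
    has exited. Instead of being killed, the follower reads whatever is left in
    the file after that point, so the final lines are not lost.
    """
    pending = ""
    with open(path) as fh:
        while True:
            chunk = fh.readline()
            if chunk:
                pending += chunk
                if pending.endswith("\n"):  # only emit whole lines while following
                    yield pending
                    pending = ""
                continue
            if producer_done():
                # Writer is gone: drain everything written since the last read.
                pending += fh.read()
                yield from pending.splitlines(keepends=True)
                return
            time.sleep(poll)
```

With a child started via `subprocess.Popen`, the callable could be `lambda: proc.poll() is not None`; the log file must already exist when `follow_log` opens it.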