geodesymiami / rsmas_insar

RSMAS InSAR code
https://rsmas-insar.readthedocs.io/
GNU General Public License v3.0

submit_jobs.bash: how to capture full screen log to file #481

Closed falkamelung closed 3 years ago

falkamelung commented 3 years ago

I would like to capture the full output of submit_jobs.bash to a file. The output should go both to the screen and to a file in the working directory. Appending 2>&1 | tee -a process.log to the command works:

minsarApp.bash /home1/05861/tg851601/code/rsmas_insar/samples/unittestGalapagosSenDT128.template --start jobfiles 2>&1 | tee -a process.log

but this is not ideal, as I frequently forget. Can we add something to submit_jobs.bash so that everything is logged to process.log?

I saw something about creating a new file descriptor:

https://www.conjur.org/blog/improving-logs-in-bash-scripts/
https://stackoverflow.com/questions/18460186/writing-outputs-to-log-file-and-console/22373735

I tried with exec but could not figure it out.
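
For reference, the exec-based idea can be sketched as follows. This is a minimal sketch and not the code that ended up in submit_jobs.bash; it assumes bash (process substitution is not available in plain sh), and LOG_FILE is a hypothetical variable name chosen for illustration.

```shell
#!/usr/bin/env bash
# Minimal sketch, assuming bash. LOG_FILE is a hypothetical name,
# not a variable taken from submit_jobs.bash.
LOG_FILE="${LOG_FILE:-process.log}"

# Send stdout into tee, which appends to the log and echoes to the terminal,
# then point stderr at the same place. This affects the rest of the script.
exec > >(tee -a "$LOG_FILE") 2>&1

echo "this line goes to the screen and to $LOG_FILE"
echo "a warning written to stderr is captured as well" >&2
```

Placed near the top of a script, the exec line removes the need to remember the 2>&1 | tee -a process.log suffix. One caveat: tee runs asynchronously, so the last lines can reach the log slightly after the script itself has exited.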

falkamelung commented 3 years ago

OLD MATERIAL TO DELETE: THE UPDATED COMMENT ABOVE SHOULD EXPLAIN EVERYTHING

Appending 2>&1 | tee -a process.log to the command works for now, e.g.

minsarApp.bash /home1/05861/tg851601/code/rsmas_insar/samples/unittestGalapagosSenDT128.template --start jobfiles 2>&1 | tee -a process.log

but this is not ideal, as I need a cheat sheet or alias.

It would be great to capture the relevant lines by default in a process.log in the working directory. Creating a new file descriptor and appending to that may work:

https://www.conjur.org/blog/improving-logs-in-bash-scripts/
https://stackoverflow.com/questions/18460186/writing-outputs-to-log-file-and-console/22373735

(I tried but could not figure it out).

Ovec8hkin commented 3 years ago

First, UNDER NO CIRCUMSTANCE EVER PASTE A COMPLETE STDOUT OUTPUT HERE. Absolutely none of that information was useful given the issue.

Second, I think I figured this out. I send all of the output of sbatch_conditional to stderr, which ensures that it gets written to the console without polluting the stdout output that we need to parse. Then I added four lines that write all of stderr/stdout to a file and open that file in a background process, which ensures that its contents are also written to the screen. This emulates writing to stdout/stderr and a file simultaneously. It can be run as normal: submit_jobs.bash *.template --start 1.
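
A rough sketch of that pattern, for readers who want to try it in their own scripts. This is not the actual change to submit_jobs.bash: fake_submit is a hypothetical stand-in for sbatch_conditional, the job ID is a placeholder, and the use of tail -f to follow the log is an assumption about what "open the file as a background process" could look like.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; the names below are illustrative only.
LOG_FILE="${LOG_FILE:-process.log}"

# Stand-in for sbatch_conditional: status text on stderr, job ID alone on stdout.
fake_submit() {
    echo "Submitting $1 ..." >&2
    echo "7836298"
}

touch "$LOG_FILE"
start_line=$(( $(wc -l < "$LOG_FILE") + 1 ))   # first line this run will add

exec 3>&1 4>&2                                 # remember the terminal
tail -n "+$start_line" -f "$LOG_FILE" >&3 &    # mirror new log lines to the screen
tail_pid=$!
exec >> "$LOG_FILE" 2>&1                       # everything below goes to the log

job_id=$(fake_submit run_01.job)               # captures only the job ID
echo "Submitted: $job_id"

# Restore the terminal, give the follower time to catch up, then stop it.
exec 1>&3 2>&4 3>&- 4>&-
sleep 1
kill "$tail_pid"
wait "$tail_pid" 2>/dev/null || true
```

Because the status text travels on stderr, the command substitution picks up nothing but the job ID, while the log file (and, through tail, the screen) still receives every message.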

How would you like the log file specified? Command line argument? Default value?

falkamelung commented 3 years ago

Just process.log as the default is fine. Preferably, if process.log already exists, write into process.1.log, process.2.log, etc. That will keep a proper log of all tries.
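
The naming scheme could be sketched like this. next_log_name is a hypothetical helper written for illustration, not a function from the repository; it only picks a name and does not create the file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first unused name among
# process.log, process.1.log, process.2.log, ...
next_log_name() {
    local base="${1:-process}"
    local n=1
    local candidate="$base.log"
    # Step through numbered names until one does not exist yet.
    while [ -e "$candidate" ]; do
        candidate="$base.$n.log"
        n=$((n + 1))
    done
    echo "$candidate"
}

LOG_FILE=$(next_log_name process)
echo "logging to $LOG_FILE"
```

Earlier logs are never overwritten, so each attempt keeps its own file in the working directory.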

BTW, you have removed the sbatch error message from submit_jobs.bash. I am not sure why you did this. For debugging we need to know why a submission failed.

Ovec8hkin commented 3 years ago

BTW, you have removed the sbatch error message from submit_jobs.bash. I am not sure why you did this. For debugging we need to know why a submission failed.

Not true, you're using an out-of-date copy of sbatch_conditional. The new sbatch_conditional logs three lines on submission failure to indicate the failed job, the wait time, and the failure reason. See below.

---------------------------------------------------------------------------------------------------------------------------------------------------------
| File Name            | Extra Tasks | Step Active Tasks | Total Active Tasks | Step Processed Jobs | Active Jobs | Message                             |  
---------------------------------------------------------------------------------------------------------------------------------------------------------
| run_01_unp...e_0.job | 1           | 1/1500            | 1/3000             | 1/11                | 1/5         | Submitted: 7836298                  |
| run_02_unp...c_0.job | 6           | 6/1500            | 7/3000             | 2/11                | 2/5         | Submitted: 7836300                  |
| run_03_ave...e_0.job | 6           | 6/1500            | 13/3000            | 3/11                | 3/5         | Submitted: 7836303                  |
| run_04_ful...r_0.job | 6           | 6/1500            | 19/3000            | 4/11                | 4/5         | Submitted: 7836306                  |
| run_05_ful...e_0.job | 6           | 6/1500            | 25/3000            | 5/11                | 5/5         | Submitted: 7836308                  |
| run_06_ext...n_0.job | 1           | 1/1500            | 26/3000            | 6/11                | 6/5         | Submission failed.                  |
|                      |             |                   |                    |                     |             | Max job count exceeded.             |
|                      |             |                   |                    |                     |             | Wait 5 minutes.                     |

falkamelung commented 3 years ago

Closing because capturing in process0.log works nicely. Thank you.