DUNE / dist-comp

Action items for DUNE distributed computing, and common scripts that are used.

Change format of wrapper job log to split stdout and stderr #204

Open StevenCTimm opened 3 days ago

StevenCTimm commented 3 days ago

As currently deployed, the "wrapper job log" of the JustIN job intermingles stdout and stderr in the same file, which makes it very confusing to read when debugging. Often the error message of a failed command appears in the file well above the point where stdout says the command was executed, and it is hard to correlate the two.

Ideally, for debugging purposes, the markers for important steps such as try 1, try 2, try 3 would also be copied to stderr, and stderr would then come back as its own file.
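A minimal sketch of the "copy the markers to stderr" idea, in Python. The helper name `log_milestone` and the `[wrapper]` prefix are hypothetical and chosen only for illustration; this is not taken from the actual wrapper job code.

```python
import sys


def log_milestone(message, out=None, err=None):
    """Write the same marker line to stdout and stderr, then flush both.

    Hypothetical helper: flushing right away keeps the marker close to
    whatever output follows it in each stream.
    """
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    line = f"[wrapper] {message}\n"
    for stream in (out, err):
        stream.write(line)
        stream.flush()
    return line


# Example markers, matching the "try 1, try 2, try 3" steps mentioned above.
for attempt in (1, 2, 3):
    log_milestone(f"try {attempt}")
```

With the same marker present in both streams, a separate stderr file could be lined up against stdout by searching for the marker text.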

Closely related is the fact that some callouts from the wrapper job, for instance executeMetaCatCommand, do not capture stderr at all. I had already requested in Slack that this be done. The absence of these two features made debugging the metacat failure take about a day longer than it otherwise would have.

Andrew-McNab-UK commented 3 days ago

I have had several goes at this over the lifetime of the project. Ideally, we would want stderr to appear in context with stdout, IMO, but that is hard to achieve in practice. However, parts of it are there already. For example, executeMetaCatCommand() does not capture stderr separately because it runs the metacat command with 2>&1, which merges stderr into stdout.
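To illustrate the difference between the two approaches, here is a rough Python sketch using the standard `subprocess` module. The function names `run_merged` and `run_split` and the stand-in `demo` command are assumptions for illustration only; this is not how executeMetaCatCommand() is actually written.

```python
import subprocess
import sys


def run_merged(cmd):
    """Like `cmd 2>&1`: stderr is folded into the same pipe as stdout."""
    proc = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return proc.returncode, proc.stdout


def run_split(cmd):
    """Capture stdout and stderr separately so each can go to its own file."""
    proc = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    return proc.returncode, proc.stdout, proc.stderr


# Stand-in for a failing external command (hypothetical, not metacat itself).
demo = [
    sys.executable,
    "-c",
    "import sys; print('out'); print('err', file=sys.stderr); sys.exit(3)",
]

rc, merged = run_merged(demo)
rc, out, err = run_split(demo)
```

In the merged case the relative order of the two messages depends on how the child process buffers each stream, which may be one reason an error can show up before the stdout line it belongs to. The split case loses the interleaving entirely, so it trades context for a clean stderr file.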