Describe the bug
Filing this issue because it's the second time I have spotted it (in different releases), so it is apparently not related to the python3 agent.
The problem is with the cmsRun subprocess: it returns a non-zero exit code, but when we parse the job report for the exit code and error message, the code found there is actually 0 (a successful cmsRun process!). Here is a snippet of the job logs [1].
How to reproduce it
I do not think it's reproducible, but this issue popped up with Py3 WMAgent 1.5.2.pre2 and the test workflow template DMWM/TC_PY3_ProdTTbar.json
Expected behavior
If the subprocess call returns a non-zero exit code, we should make sure this gets properly reported. Then we can also parse the job report and report the exit code found there as well.
We might also add some debugging code to catch this inconsistency, so that we can find the actual root cause.
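A minimal sketch of what such a consistency check could look like, in plain Python rather than actual WMCore code. The function `reconcile_exit_codes` and its arguments are hypothetical names used only for illustration, and the rule applied here (a failed subprocess is never reported as success, and a non-zero job report code is preferred when there is one) is an assumption about the desired behavior, not what the executor does today.

```python
import subprocess
import sys


def reconcile_exit_codes(process_code, report_code):
    """Combine the subprocess exit code with the one parsed from the job report.

    Hypothetical helper: returns (final_code, consistent), where `consistent`
    is False when one source says success and the other says failure.
    """
    consistent = (process_code == 0) == (report_code == 0)
    if report_code != 0:
        # The job report carries a specific failure code, so prefer it.
        final_code = report_code
    else:
        # Report says success (0): fall back to the subprocess code so that
        # a failed process is never hidden behind a clean-looking report.
        final_code = process_code
    return final_code, consistent


if __name__ == "__main__":
    # Stand-in for the failing subprocess: a child process that exits with 3.
    proc = subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"])
    # Assumed value: pretend the parsed job report claims success.
    report_code = 0
    final_code, consistent = reconcile_exit_codes(proc.returncode, report_code)
    if not consistent:
        print("Exit code mismatch: subprocess=%s, job report=%s"
              % (proc.returncode, report_code))
    print("Reporting exit code %s" % final_code)
```

Logging the mismatch separately from the final code keeps the debugging information available without changing how consistent jobs are reported.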
Impact of the bug
WMAgent
This is likely the place to have these changes done: https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WMSpec/Steps/Executors/CMSSW.py#L280
Additional context and error message [1]