NDCMS / lobster

A userspace workflow management tool for harnessing non-dedicated resources for high-throughput workloads.
MIT License

Need to display error codes from external commands in the logfiles when they fail #631

Open khurtado opened 6 years ago

khurtado commented 6 years ago

When a subprocess command fails, its error code (and preferably its output too) should be shown in the log files for debugging purposes.

E.g., when xrdfs is run via task.py and fails, the log shows:

>>> executing 'env XRD_LOGLEVEL=Debug timeout 300 xrdfs deepthought.crc.nd.edu stat /store/user/awightma/gridpack_test/ctG_slc6_amd64_gcc630_CMSSW_9_3_0_tarball.tar.xz' @ Wed Mar 21 16:42:57 2018
>> using /disk/vc3-root/tmp/worker-205988-23093/t.565/cctools-temp-t.565.jgFLef/tmpg4q0lA to store command output
>>> xrootd access to input file unavailable @ Wed Mar 21 16:42:57 2018
>>>>>> no stage out method succeeded for: ctG_slc6_amd64_gcc630_CMSSW_9_3_0_tarball.tar.xz @ Wed Mar 21 16:42:57 2018
>> trace: Traceback (most recent call last):
>> trace:   File "task.py", line 268, in wrapper
>> trace:     result = fct(data, *args, **kwargs)
>> trace:   File "task.py", line 638, in copy_inputs
>> trace:     raise RuntimeError("no stage-in method succeeded")
>> trace: RuntimeError: no stage-in method succeeded
>>>>> call to 'copy_inputs' failed, exiting with exit code 179 @ Wed Mar 21 16:42:57 2018

Here, 179 is the exit code defined for a copy_inputs failure, but neither the xrdfs error code nor its debug-mode output is ever shown, so there is no way to tell why the command failed. This was not the case in the past: we used that output a lot for debugging, for example when tracking down xrootd cache errors.
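A minimal sketch of the requested behaviour, assuming the external command is launched through Python's `subprocess` module. The helper name `run_and_report`, its `log` parameter, and the `>> output:` prefix are hypothetical and are not taken from lobster's task.py; the point is only that the exit code and the captured output are written to the log whenever the command fails.

```python
import subprocess
import sys


def run_and_report(cmd, log=print):
    """Run cmd (a list of arguments); on failure, log exit code and output.

    Hypothetical helper for illustration, not part of lobster's task.py.
    """
    log(">>> executing '{}'".format(" ".join(cmd)))
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so debug output is kept
        universal_newlines=True,   # decode output to str
    )
    if proc.returncode != 0:
        # Report the external command's own exit code, then its output,
        # so the reason for the failure ends up in the log file.
        log(">>> command failed with exit code {}".format(proc.returncode))
        for line in proc.stdout.splitlines():
            log(">> output: {}".format(line))
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    # Stand-in for a failing external command such as xrdfs.
    run_and_report(
        [sys.executable, "-c", "import sys; print('no such file'); sys.exit(54)"]
    )
```

With something along these lines, the wrapper's own exit code (179 in the log above) could still be returned to the caller, while the log would also carry the exit code and debug output of the command that actually failed.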