CMSCompOps / WorkflowWebTools

https://workflowwebtools.readthedocs.io

Logs by exit code #36

Open paorozo opened 7 years ago

paorozo commented 7 years ago

It is super useful to read the logs by exit code, for example this way: https://vocms0113.cern.ch:80/explainerror?errorcode=8001

As an operator, I find exit code 8001 particularly interesting. Most of these failures are due to file-reading issues, but sometimes the cause is different and needs special attention.

We can catch an error like this:

Fatal Exception (Exit code: 8001) 
An exception of category 'FileReadError' occurred while
[0] Processing run: 1 lumi: 78114 event: 3632261
[1] Running path 'digitisation_step'
[2] Calling event method for module MixingModule/'mix'
[3] Calling method for unscheduled module MixingModule/'mix'
[4] Rethrowing an exception that happened on a different thread.
[5] Reading branch PCaloHits_g4SimHits_EcalHitsEB_SIM.
[6] LocalCacheFile::cache()
Exception Message:
Unable to cache 134217728 byte file segment at 2013265920: got only 4 bytes back
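
To separate the usual file-reading failures from the unusual ones, the key fields of a log like the one above could be pulled out automatically. This is only a minimal sketch: the function name `summarize_log` and the regular expressions are illustrative assumptions based on this single log excerpt, not part of WorkflowWebTools.

```python
import re

# Patterns inferred from the one log excerpt in this issue (assumption:
# other logs with exit code 8001 follow the same layout).
EXIT_CODE_RE = re.compile(r"Exit code:\s*(\d+)")
CATEGORY_RE = re.compile(r"An exception of category '([^']+)'")
MESSAGE_RE = re.compile(r"Exception Message:\s*(.+)")


def summarize_log(text):
    """Return the exit code, exception category and first message line.

    Hypothetical helper; any field that is not found is returned as None.
    """
    code = EXIT_CODE_RE.search(text)
    category = CATEGORY_RE.search(text)
    message = MESSAGE_RE.search(text)
    return {
        "exit_code": int(code.group(1)) if code else None,
        "category": category.group(1) if category else None,
        "message": message.group(1).strip() if message else None,
    }


# Shortened copy of the log quoted in this issue.
SAMPLE_LOG = """Fatal Exception (Exit code: 8001)
An exception of category 'FileReadError' occurred while
[0] Processing run: 1 lumi: 78114 event: 3632261
[6] LocalCacheFile::cache()
Exception Message:
Unable to cache 134217728 byte file segment at 2013265920: got only 4 bytes back
"""

summary = summarize_log(SAMPLE_LOG)
print(summary["exit_code"], summary["category"])
```

Grouping logs by the `category` and `message` fields would make the rare non-file-reading cases easier to spot.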

We will likely need to report the problem on HyperNews, but first I need to know which workflow(s) this log belongs to. @dabercro, how can we get that?

dabercro commented 7 years ago

I was able to find that log here: https://vocms0113.cern.ch:80/explainerror?errorcode=8001&workflowstep=/pdmvserv_task_TRK-PhaseIFall16GS-00016__v1_T_170310_122236_4192

I did that by first going to https://vocms0113.cern.ch:80/listworkflows?errorcode=8001&sitename=T1_IT_CNAF and then checking the workflows individually. I will try to think of some way to make that easier, though.
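
The two-step lookup above can be sketched as a pair of URL builders. This is a rough sketch only: the helper names are hypothetical, and the only endpoints and query parameters assumed are the ones visible in the URLs in this thread (`listworkflows` with `errorcode` and `sitename`, `explainerror` with `errorcode` and `workflowstep`). It builds the links and makes no request to the server.

```python
from urllib.parse import urlencode

# Host taken from the URLs quoted in this issue.
BASE_URL = "https://vocms0113.cern.ch:80"


def list_workflows_url(errorcode, sitename):
    """Step 1: link to the workflows that show this exit code at a site."""
    query = urlencode({"errorcode": errorcode, "sitename": sitename})
    return f"{BASE_URL}/listworkflows?{query}"


def explain_error_url(errorcode, workflowstep=None):
    """Step 2: link to the logs for an exit code, optionally for one step."""
    params = {"errorcode": errorcode}
    if workflowstep is not None:
        params["workflowstep"] = workflowstep
    # Keep "/" unescaped so the link matches the form used in this thread.
    return f"{BASE_URL}/explainerror?{urlencode(params, safe='/')}"


step = "/pdmvserv_task_TRK-PhaseIFall16GS-00016__v1_T_170310_122236_4192"
print(list_workflows_url(8001, "T1_IT_CNAF"))
print(explain_error_url(8001, step))
```

Each workflow step found through the first link still has to be checked by hand with the second one until the matching log turns up.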