DUNE / dist-comp

Action items for DUNE distributed computing, and common scripts that are used.

Workflows that can't find a FCL file continue submitting junk jobs indefinitely #161

Closed StevenCTimm closed 5 months ago

StevenCTimm commented 6 months ago

Workflows 2001 and 2007 both failed immediately because they did not have a FCL file, so the input file was never allocated, but a log file was still generated and dumped into Rucio. This keeps going indefinitely because these failures are not counted as failed jobs.

Need a way to catch these problems and stop them. I couldn't pause them because the admin stuff is broken again.

Andrew-McNab-UK commented 5 months ago

This is addressed in justIN 01.01, which automatically pauses a workflow if too many jobscript_error, notused, or none_processed job outcomes are detected.
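
As a rough illustration of the idea only, and not justIN's actual code, a pause check along these lines could look like the sketch below. The three outcome names come from the comment above; the function name `should_pause`, the `min_jobs` and `max_bad_fraction` parameters, and their default values are all hypothetical.

```python
from collections import Counter

# Outcome names as listed in the comment above.
# Everything else here (function name, thresholds) is a hypothetical sketch.
BAD_OUTCOMES = {"jobscript_error", "notused", "none_processed"}


def should_pause(outcomes, min_jobs=10, max_bad_fraction=0.5):
    """Return True if the workflow looks like it is only producing junk jobs.

    outcomes: list of outcome strings, one per finished job.
    min_jobs: do not decide until at least this many jobs have finished.
    max_bad_fraction: pause if the bad share is strictly above this value.
    """
    if len(outcomes) < min_jobs:
        return False
    counts = Counter(outcomes)
    bad = sum(counts[name] for name in BAD_OUTCOMES)
    return bad / len(outcomes) > max_bad_fraction
```

With a check like this, a workflow whose jobs all end as none_processed (as with the missing FCL file here) would be paused once `min_jobs` jobs had finished, rather than continuing to submit indefinitely.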