dmwm / CRABServer


declare suspicious file replicas to Rucio in RetryJob #8773

Open belforte opened 2 weeks ago

belforte commented 2 weeks ago

see https://its.cern.ch/jira/browse/CMSTRANSF-1024

RetryJob should be modified at https://github.com/dmwm/CRABServer/blob/97f447747265684589ac1f5be773eed80de02239/src/python/TaskWorker/Actions/RetryJob.py#L407 so that suspicious file replicas are reported to Rucio.
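A minimal sketch of what the reporting step could look like, assuming the caller already has a Rucio client object and the PFNs of the files that failed to be read. The helper name `report_suspicious` and the way PFNs are collected are hypothetical and not taken from RetryJob.py; the only Rucio call assumed is the replica client's `declare_suspicious_file_replicas(pfns, reason)`.

```python
def report_suspicious(client, pfns, reason):
    """Report PFNs to Rucio as suspicious replicas (hypothetical helper).

    `client` is assumed to expose declare_suspicious_file_replicas(pfns, reason),
    as the Rucio replica client does. Returns the list of PFNs actually reported.
    """
    # Drop empty entries and duplicates so each replica is reported once.
    unique_pfns = sorted({pfn for pfn in pfns if pfn})
    if not unique_pfns:
        return []
    client.declare_suspicious_file_replicas(unique_pfns, reason)
    return unique_pfns
```

Passing the client in as an argument keeps the helper easy to exercise with a stand-in object, without a live Rucio server.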

belforte commented 1 week ago

We also need to prepare a knowledge base of strings to look for, and strings to ignore, in the job log, for example:

A list is accessing an object (0x154e2060ea40) already deleted (list name = TList)

This message does not indicate a corrupted file!

see https://github.com/cms-sw/cmssw/issues/46634
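One way such a knowledge base could be structured is as two lists of regular expressions, one of patterns to ignore and one of patterns that suggest corruption. The sketch below is only illustrative: the ignore pattern is derived from the TList message quoted above, while the "suspicious" pattern is a made-up placeholder and the names `IGNORE_PATTERNS`, `SUSPICIOUS_PATTERNS`, and `looks_corrupted` are hypothetical.

```python
import re

# Messages known NOT to indicate a corrupted file (from the example above).
IGNORE_PATTERNS = [
    re.compile(r"A list is accessing an object \(0x[0-9a-fA-F]+\) already deleted"),
]

# Placeholder only: real corruption signatures still have to be collected.
SUSPICIOUS_PATTERNS = [
    re.compile(r"EXAMPLE-CORRUPTION-MARKER"),
]


def looks_corrupted(log_text):
    """Return True if any log line matches a suspicious pattern
    and does not match any ignore pattern."""
    for line in log_text.splitlines():
        if any(p.search(line) for p in IGNORE_PATTERNS):
            continue  # known false positive, skip this line
        if any(p.search(line) for p in SUSPICIOUS_PATTERNS):
            return True
    return False
```

Checking the ignore list first means a known false positive on a line never triggers a report, even if a suspicious pattern also matches that line.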

belforte commented 3 days ago

As a start, skip HammerCloud tasks and limit reporting to a maximum of 30 reports per task.
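The two restrictions could be sketched as a small gate in front of the reporting call. This is an assumption-laden illustration: the class `ReportLimiter`, the `is_hammercloud` flag, and the in-memory counter are all hypothetical, and how a HammerCloud task is recognized or how the count is persisted across jobs is left open.

```python
from collections import defaultdict

MAX_REPORTS_PER_TASK = 30  # limit proposed in this issue


class ReportLimiter:
    """Decide whether one more suspicious-replica report is allowed."""

    def __init__(self, max_per_task=MAX_REPORTS_PER_TASK):
        self.max_per_task = max_per_task
        self._counts = defaultdict(int)  # task name -> reports so far

    def allow(self, task_name, is_hammercloud):
        # HammerCloud tasks are never reported.
        if is_hammercloud:
            return False
        # Stop once the per-task cap has been reached.
        if self._counts[task_name] >= self.max_per_task:
            return False
        self._counts[task_name] += 1
        return True
```

The counter here lives in memory only; if each job retry runs in its own process, the count would have to be stored somewhere shared for the cap to hold.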