Open gladky opened 6 years ago
Currently we have a single global threshold, defined in the expert properties file:
expert.logic.lenghtyfixingsofterror.threshold.period = 30000
One solution is to introduce a TRACKER-specific threshold:
expert.logic.lenghtyfixingsofterror.threshold.period.tracker = 60000
Another is to raise the threshold for everyone, if you think that's appropriate. Please let me know what you think.
I'm slightly inclined towards having subsystem-specific thresholds (as needed) even though it complicates things a bit.
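To make the subsystem-specific option concrete, here is a minimal sketch of how a lookup with a global fallback could work. It is an illustration only, not the expert's actual code: the helper names `parse_properties` and `threshold_ms` are hypothetical, and it assumes the subsystem key is the global key with a lowercase subsystem suffix, as in the `.tracker` example above.

```python
# Hypothetical sketch: subsystem-specific threshold with fallback to the global one.
# Only the two property keys come from this issue; everything else is assumed.

GLOBAL_KEY = "expert.logic.lenghtyfixingsofterror.threshold.period"

EXAMPLE_PROPERTIES = """
expert.logic.lenghtyfixingsofterror.threshold.period = 30000
expert.logic.lenghtyfixingsofterror.threshold.period.tracker = 60000
"""


def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def threshold_ms(props, subsystem=None):
    """Return the threshold in milliseconds for a subsystem.

    Uses '<global key>.<subsystem in lowercase>' if it is defined,
    otherwise falls back to the global threshold.
    """
    if subsystem is not None:
        specific = props.get(f"{GLOBAL_KEY}.{subsystem.lower()}")
        if specific is not None:
            return int(specific)
    return int(props[GLOBAL_KEY])


props = parse_properties(EXAMPLE_PROPERTIES)
print(threshold_ms(props, "TRACKER"))  # subsystem override
print(threshold_ms(props, "ECAL"))     # no override, falls back to global
```

With this scheme, subsystems without their own entry keep the current 30000 ms behaviour, so adding an override for one subsystem does not affect the others.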
@erikbutz could you please confirm this request and the proposed threshold? I will then include this in the next release.
A threshold of 60 or 70 seconds would indeed be preferable. We have slow-control readings that access the control token rings during running, and if the thread doing this blocks access, the start of the soft error recovery has to wait for it to finish.
In principle the magnitude of the problem is low (we have had almost 3000 soft error recoveries since 2016 and only some 20 took more than 30 seconds), but we are taking a look at the recent series of longer recoveries.
From Elog (LOUIS JEAN MOUREAUX)
Remi: