cmsdaq / DAQExpert

New expert system processing data model produced by DAQAggregator

Tuning LM thresholds: "Continuously Soft Error" #149

Open gladky opened 6 years ago

gladky commented 6 years ago

Here is a case where Level zero was in FixingSoftError 14 times within a period of ~10 minutes:

http://daq-expert.cms/DAQExpert/?start=2017-11-04T23:26:20.079Z&end=2017-11-04T23:46:08.149Z

This is the current configuration for the LM ContinouslySoftError:

expert.logic.continoussofterror.threshold.count = 3
# period of 10 mins
expert.logic.continoussofterror.threshold.period = 600000
# period to keep the result on - 15 sec
expert.logic.continoussofterror.threshold.keep = 15000

The Expert detected this correctly and yielded multiple conditions, as configured.

The question is whether we should increase the keep parameter from 15000 ms. In this case the interval between consecutive FixingSoftError occurrences was ~45 seconds, which broke the analysis result into multiple conditions: after each FixingSoftError we kept the condition active for 15 seconds, then 30 seconds later it happened again, and so on (14 times). If the keep parameter were more than 45 seconds, this would have resulted in a single condition. Do you think we should increase it? Note that, apart from reducing the number of yielded conditions, a longer keep also means that even after the problem is fixed the Expert will keep claiming there is a problem for the duration defined by the keep parameter.
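
To make the trade-off concrete, here is a rough sketch of the effect of keep on the number of conditions. It is a simplified model and not the actual LM code: I assume that a condition starts when at least threshold.count events fall within the trailing threshold.period, that it stays active for keep ms after each such event, and that overlapping active windows merge into one condition. The function name merged_conditions and the evenly spaced 45-second event times are made up for illustration.

```python
def merged_conditions(event_times_ms, keep_ms, threshold_count=3, period_ms=600_000):
    """Return [start, end] windows of active conditions under the simplified model.

    Assumption (not taken from the DAQExpert code): an event triggers the
    condition when at least `threshold_count` events, including itself, lie
    within the trailing `period_ms`; the condition is then kept for `keep_ms`.
    """
    windows = []
    for i, t in enumerate(event_times_ms):
        # Count events inside the trailing period, including the current one.
        recent = [e for e in event_times_ms[: i + 1] if t - e < period_ms]
        if len(recent) < threshold_count:
            continue
        start, end = t, t + keep_ms
        if windows and start <= windows[-1][1]:
            # The previous condition is still being kept: extend it.
            windows[-1][1] = max(windows[-1][1], end)
        else:
            windows.append([start, end])
    return windows


# Hypothetical input: 14 FixingSoftError events, one every 45 seconds.
events = [i * 45_000 for i in range(14)]

for keep in (15_000, 60_000):
    windows = merged_conditions(events, keep)
    lingering = windows[-1][1] - events[-1]
    print(f"keep={keep} ms: {len(windows)} condition(s), "
          f"active {lingering} ms after the last event")
```

In this model a keep longer than the ~45 second spacing collapses the run into a single condition, at the cost of the condition staying active for the full keep duration after the last FixingSoftError.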