cmsdaq / DAQExpert

New expert system processing data model produced by DAQAggregator

Check trigger rate before L1 #81

Open gladky opened 7 years ago

gladky commented 7 years ago

@hsakulin suggested monitoring the trigger rate before L1. The expected value is below 150 kHz, but @hsakulin reported a case where it was 40 MHz:

http://daq-expert.cms/DAQExpert/?start=2017-06-26T06:26:24.551Z&end=2017-06-26T09:05:25.655Z

This resulted in deadtime. We have the values in the snapshot:


"tcdsGlobalInfo": {
   ...
    "fillNumber": 5873,
    "deadTimes": {
      ...
      "beamactive_total": 99.2945
    },
    ...
    "sup_trg_rate_beamactive_total": 27246700,
    "sup_trg_rate_total": 39399600,
    "trg_rate_beamactive_total": 70753.5,
    "trg_rate_total": 100849,
    ....
  "hltKeyDescription": "/cdaq/physics/Run2017/2e34/v1.1.7/HLT/V2",
  "evm": "RU_ru-c2e14-11-01"

The LM will compare the sum of sup_trg_rate_total and trg_rate_total against a threshold of 150 kHz.
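A minimal sketch of that comparison, using the two rate values from the snapshot excerpt above. It is written in Python for illustration only and is not the DAQExpert implementation; the function names and the assumption that both rates are in Hz are hypothetical.

```python
# Illustrative sketch only; function names are hypothetical and the
# rates are assumed to be in Hz, as the snapshot values suggest.

THRESHOLD_HZ = 150_000.0  # 150 kHz, the expected upper bound from the issue


def pre_deadtime_trigger_rate(tcds_global_info):
    """Sum the suppressed and accepted trigger rates from the snapshot."""
    return (tcds_global_info["sup_trg_rate_total"]
            + tcds_global_info["trg_rate_total"])


def rate_too_high(tcds_global_info, threshold_hz=THRESHOLD_HZ):
    """Return True when the summed rate exceeds the threshold."""
    return pre_deadtime_trigger_rate(tcds_global_info) > threshold_hz


# Values copied from the snapshot excerpt in this issue.
snapshot_fragment = {
    "sup_trg_rate_total": 39399600,
    "trg_rate_total": 100849,
}

total_hz = pre_deadtime_trigger_rate(snapshot_fragment)
print(f"{total_hz / 1e6:.2f} MHz, too high: {rate_too_high(snapshot_fragment)}")
```

For the reported case the sum is about 39.5 MHz, consistent with the 40 MHz quoted above and far beyond the 150 kHz threshold.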

andreh12 commented 7 years ago

We should also check for very low physics trigger rates at the input, as suggested in #92.

andreh12 commented 7 years ago

In today's run coordination meeting, run coordination expressed strong interest in having this implemented.

@fwyzard also mentioned that we could distinguish two cases:

fwyzard commented 7 years ago

Indeed, using the wrong column could give some high rate, maybe also 200 kHz (e.g. using the 1e34 column at ~1.8e34...). On the other hand, noisy towers can give rates in the range of 150-300 kHz.

So I'm not sure what would be the best cut between "wrong prescale column" and "detector effects".

.A

mommsen commented 7 years ago

I think we should really get this check online as soon as possible. Last night we had an instance of a hot ECAL trigger tower which caused a very high trigger rate. CT-PPS FEDs were in busy:

http://daq-expert.cms/daq2view-react/index.html?setup=cdaq&time=2017-08-18-05:13:00

Unfortunately, the DAQ shifter red-recycled CT-PPS twice before the shift crew took the right action and red-recycled ECAL. We lost about 15 minutes of stable-beam time due to this ):

fwyzard commented 7 years ago

By the way, I would propose to put the threshold between the two cases (wrong L1 prescale vs detector misbehaving) at 200 kHz to start with, and adjust it later as needed.

.A
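A rough sketch of how the 150 kHz and 200 kHz cuts could be combined. The thread does not spell out which side of the 200 kHz cut corresponds to which case, so the mapping below (150 to 200 kHz suggests a wrong prescale column, above 200 kHz suggests a detector effect) is an assumption, as are the function name and the returned labels. The ranges quoted earlier in the thread overlap, so the result should be read as a hint for the shifter, not a diagnosis.

```python
# Illustrative sketch only. The mapping of rate ranges to causes is an
# ASSUMPTION; the thread only proposes 200 kHz as the initial cut.

NOMINAL_MAX_HZ = 150_000.0  # expected upper bound on the pre-deadtime rate
SPLIT_HZ = 200_000.0        # proposed initial cut between the two cases


def classify_rate(rate_hz, nominal_max_hz=NOMINAL_MAX_HZ, split_hz=SPLIT_HZ):
    """Return a hypothetical hint label for a pre-deadtime trigger rate in Hz."""
    if rate_hz <= nominal_max_hz:
        return "ok"
    if rate_hz <= split_hz:
        # Assumed: moderately high rates point to the prescale column.
        return "possible wrong L1 prescale column"
    # Assumed: rates above the cut point to a misbehaving detector.
    return "possible detector misbehaving"


for rate in (100_849.0, 180_000.0, 39_500_449.0):
    print(rate, "->", classify_rate(rate))
```

Keeping both cuts as parameters makes it easy to adjust the 200 kHz value later, as suggested above.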

emeschi commented 7 years ago

One way to sort out the two cases would be to instruct the DAQ shifter to either check the individual level 1 rates or ask for them to be checked by the trigger shifter. This could be in the message issued by the expert. I think it is not inconceivable to provide the DAQExpert with some input from the L1 to be able to distinguish the two. I think, but I am not sure, that one quick but not very accurate way is to check the HLT Physics output rate. In the case of a wrong column it will be distinguishably too high, whereas in the case of a hot tower it should remain within reasonable values...

fwyzard commented 7 years ago

What you describe is already the job of the trigger shifter...

Yes, the DAQExpert could check the pre/post deadtime of a few individual L1 triggers: L1_SingleMu##, L1_SingleEG##, L1_SingleJet##, L1_HTT##, L1_ETM##, L1_ETMHF## to suggest whether one specific subdetector is causing trouble.

It could also compare the current luminosity with the prescale column...

.A

mommsen commented 7 years ago

There was another identical case of blaming the CT-PPS FEDs while ECAL was causing a high trigger rate: http://daq-expert.cms/daq2view-react/index.html?setup=cdaq&time=2017-08-20-21:22:57