cmsdaq / DAQExpert

New expert system processing data model produced by DAQAggregator

Special instructions: CTPPS FED in warning #184

Closed gladky closed 6 years ago

gladky commented 6 years ago

From shifter bulletin:

  • if CTPPS_TOT FED 582 or 583 go into 100% warning state and block the run, try to recover it by issuing a Halt to the partition from the drop down menu and then green-recycle it. Please, make a note in e-log explaining how you recovered the problem. (Fabio Ravera 11/07/2017)

We do have an LM that covers this case but gives a different recovery suggestion: the expert LM "Fed Stuck". It checks whether the FED is in Warning or Busy, whether the partition is in Warning or Busy, and whether the DAQ is in runblocked. Its recovery steps are:

  1. with <RedRecycle::{{SUBSYSTEM}}> and <GreenRecycle::{{SUBSYSTEM}}> (try up to 2 times)
  2. Problem fixed: Make an e-log entry. Call the DOC of the subsystem {{SUBSYSTEM}} to inform
  3. Problem not fixed: Call the DOC for the subsystem {{SUBSYSTEM}}

gladky commented 6 years ago

We could introduce specific instructions for the CTPPS_TOT partition, applied only when the FED is in warning (not busy) and only for FEDs 582 and 583.
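As a rough illustration, the generic "Fed Stuck" check and the proposed narrowing could be written as two predicates. This is only a sketch of one reading of the proposal: the `Snapshot` class, its field names, and the state strings are hypothetical and are not taken from the DAQAggregator data model or the DAQExpert code.

```python
from dataclasses import dataclass

BLOCKING_STATES = {"Warning", "Busy"}
CTPPS_TOT_FEDS = {582, 583}  # FED ids named in the shifter bulletin


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical flattened view of one FED (not the real data model)."""
    fed_id: int
    fed_state: str
    partition: str
    partition_state: str
    run_blocked: bool


def fed_stuck(s: Snapshot) -> bool:
    """Generic condition: FED and partition in Warning/Busy, DAQ run-blocked."""
    return (s.fed_state in BLOCKING_STATES
            and s.partition_state in BLOCKING_STATES
            and s.run_blocked)


def ctpps_tot_special_case(s: Snapshot) -> bool:
    """Proposed narrowing: CTPPS_TOT partition, Warning only, FED 582 or 583."""
    return (fed_stuck(s)
            and s.partition == "CTPPS_TOT"
            and s.fed_state == "Warning"
            and s.fed_id in CTPPS_TOT_FEDS)
```

With this split, a Busy FED 582 would still match the generic condition but not the special case, so the shifter would see the generic instructions.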

jjhollar commented 6 years ago

Hi Maciej,

I think for 2018 we can replace this special instruction with the generic "FED Stuck" LM you mentioned:

  1. with <RedRecycle::{{SUBSYSTEM}}> and <GreenRecycle::{{SUBSYSTEM}}> (try up to 2 times)
  2. Problem fixed: Make an e-log entry. Call the DOC of the subsystem {{SUBSYSTEM}} to inform
  3. Problem not fixed: Call the DOC for the subsystem {{SUBSYSTEM}}

provided that "stuck" means the DAQ is really completely blocked. In the past we've had many cases where shifters got confused by unrelated trigger problems that cause these 2 FEDs to have a very large (but still <100%) busy/warning fraction. If the pre-deadtime trigger rate is >>100 kHz and these 2 are still running, even with 99% busy/warning, then recycling them or calling the CTPPS DOC is unlikely to help.
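The distinction drawn above can be sketched as a small decision function. The function name, argument names, and return labels are hypothetical, and treating ">>100 kHz" as a simple `> 100` cutoff is an assumption made only to keep the example short.

```python
HIGH_RATE_KHZ = 100.0  # assumed cutoff; the comment only says ">>100 kHz"


def advise(busy_warning_fraction: float,
           run_blocked: bool,
           pre_deadtime_rate_khz: float) -> str:
    """Return a hypothetical advice label for the two CTPPS_TOT FEDs."""
    # Really stuck: 100% busy/warning and the run is blocked.
    if run_blocked and busy_warning_fraction >= 1.0:
        return "recycle"
    # Still running with a very high input rate: likely a trigger problem,
    # so recycling or calling the CTPPS DOC is unlikely to help.
    if pre_deadtime_rate_khz > HIGH_RATE_KHZ:
        return "check-trigger"
    return "none"
```

For example, `advise(0.99, False, 150.0)` yields `"check-trigger"` rather than `"recycle"`.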

Thanks a lot for cleaning up the instructions,
Jonathan

andreh12 commented 6 years ago

@jjhollar we have two modules (HighTcdsInputRate and VeryHighTcdsInputRate) which are configured to fire above 100 and 200 kHz TCDS input (pre-deadtime etc. trigger) rate.
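A minimal sketch of the two thresholds, assuming each module is evaluated independently against the TCDS input rate; whether the real HighTcdsInputRate module stays active once VeryHighTcdsInputRate fires is not stated here, and the function below is hypothetical.

```python
# Thresholds in kHz as quoted in the comment above.
THRESHOLDS_KHZ = [
    ("HighTcdsInputRate", 100.0),
    ("VeryHighTcdsInputRate", 200.0),
]


def tcds_rate_modules(rate_khz: float) -> list:
    """Return the names of the modules whose threshold is exceeded."""
    return [name for name, limit in THRESHOLDS_KHZ if rate_khz > limit]
```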

Checking the DAQExpert history, we indeed found e.g. the most recent case of such a condition here: http://daq-expert.cms/DAQExpert/?start=2018-04-26T01:41:14.000Z&end=2018-04-26T01:45:14.000Z (click on Extended) where the TCDS input rate was high.

@gladky checked which error condition was displayed to the shifter and, according to the logs, it was not the one about the high TCDS input rate. However, he is working on fixing this, and we should soon have an easier-to-maintain system of 'which problem is causing which other problem'.
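One possible shape for such a 'which problem is causing which other problem' mapping is sketched below. It is purely illustrative: the `CAUSES` table, the condition labels used as keys, and the function are hypothetical and do not describe the fix being worked on.

```python
# Hypothetical map: condition -> conditions that can explain it.
CAUSES = {
    "FedStuck": {"HighTcdsInputRate", "VeryHighTcdsInputRate"},
}


def conditions_to_show(active: set) -> set:
    """Hide any active condition that has an active possible cause."""
    return {c for c in active if not (CAUSES.get(c, set()) & active)}
```

Under this assumed table, when both a stuck-FED condition and a high input rate condition are active, only the input rate condition would be shown to the shifter.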

gladky commented 6 years ago

Leaving the generic instructions as confirmed by @jjhollar. Moved the instructions to "covered by expert" section in the bulletin.