cmsdaq / DAQExpert

New expert system processing data model produced by DAQAggregator

Backpressure cases #141

Open hsakulin opened 6 years ago

hsakulin commented 6 years ago

In order of usefulness (within a group, modules are exclusive or can trigger at the same time):

To be done by the LM: Check if DAQ Backpressure from Filter Farm also fired and mention if it did. (This may appear after the condition has started)

To be done by the LM: Check if DAQ Backpressure from Filter Farm also fired and mention if it did. (This may appear after the condition has started)

To be done by the LM: Check if DAQ Backpressure from Filter Farm also fired and mention if it did. (This may appear after the condition has started)


These 3 cases should be mutually exclusive (see #142)

=> DAQ Backpressure coming from FEROL or FEDBuilder. Call the DAQ On-Call and mention this message.

=> DAQ Backpressure from the Event Builder. Call the DAQ DOC mentioning the backpressure comes from event building.

BETTER (too many requests on EVM here, I took the next snapshot): http://daq-expert.cms/daq2view-react/index.html?setup=cdaq&time=2017-10-01-20:12:30

BETTER: http://daq-expert.cms/daq2view-react/index.html?setup=cdaq&time=2017-10-01-20:12:32

The sub-cases below are not needed since each sub-case will already trigger a LM with higher priority. [

Check if the HIGH HLT output rate or CMSSW crashing modules fired

=> DAQ Backpressure from the Filter Farm

a) Because of high output rate (#17) => Are we running with the correct pre-scale column? => Talk to the trigger shifter and shift leader. You may need to call the HLT DOC.

b) Because of CMSSW processes crashing (#134) => Call the HLT DOC, mentioning the messages you see under HLT Alerts in F3 Mon. Call the DAQ DOC. He might need to clean up the Filter Farm.

c) HLT CPU usage high (#44) => Are we running with the correct pre-scale column? => Talk to the trigger shifter and shift leader. You may need to call the HLT DOC.

d) (none of a, b, c) => Unidentified problem => Call the DAQ DOC.

]
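For illustration only, the bracketed sub-cases a) to d) could be sketched as a small dispatch in Python. This is not DAQExpert code: the class `FilterFarmFlags`, its field names, and the function `filter_farm_advice` are hypothetical, and the sketch assumes the three related conditions (#17, #134, #44) are available as plain booleans and may be true at the same time.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FilterFarmFlags:
    """Hypothetical inputs: whether each related condition has fired."""
    high_output_rate: bool = False  # sub-case a), issue #17
    cmssw_crashes: bool = False     # sub-case b), issue #134
    hlt_cpu_high: bool = False      # sub-case c), issue #44


def filter_farm_advice(flags: FilterFarmFlags) -> List[str]:
    """Return the advice for every sub-case that applies, or d) if none does."""
    advice = []
    if flags.high_output_rate:
        advice.append("a) High output rate: check the pre-scale column; "
                      "talk to the trigger shifter and shift leader.")
    if flags.cmssw_crashes:
        advice.append("b) CMSSW processes crashing: call the HLT DOC "
                      "and the DAQ DOC.")
    if flags.hlt_cpu_high:
        advice.append("c) HLT CPU usage high: check the pre-scale column; "
                      "talk to the trigger shifter and shift leader.")
    if not advice:
        # Sub-case d): none of a), b), c) fired.
        advice.append("d) Unidentified problem: call the DAQ DOC.")
    return advice
```

Returning a list rather than a single string is a design choice of this sketch: it lets several sub-cases be reported together, while d) appears only when the list would otherwise be empty.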

gladky commented 6 years ago

The LM BackpressureFromFerol with the described conditions also fires for the test case of BackpressureFromEventBuilding, with the following message:

DAQ backpressure coming from FEROL or FEDBuilder. FED Builder with backpressure 2.6 to FED 359. Corresponding RU ru-c2e13-15-01.cms has more than 0 requests and less than 256 fragments.

@mommsen Is adding an extra check to require EVM requests to be > 100 for BackpressureFromFerol correct?
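As a rough sketch of the RU condition quoted in the message above (more than 0 requests and fewer than 256 fragments), the check and the message text could look like this in Python. The names `RuSnapshot` and `ferol_backpressure_message` are hypothetical and do not come from DAQExpert. The request and fragment counts in the usage note are made-up values, since the issue does not state them, and the check on the FED backpressure value itself is left out because its threshold is not given here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RuSnapshot:
    """Hypothetical view of one RU taken from a DAQ snapshot."""
    hostname: str
    requests: int
    fragments: int


def ferol_backpressure_message(fed_id: int, backpressure: float,
                               ru: RuSnapshot,
                               max_fragments: int = 256) -> Optional[str]:
    """Build the message if the RU has requests but too few fragments."""
    # Condition taken from the message text: requests > 0 and fragments < 256.
    if ru.requests > 0 and ru.fragments < max_fragments:
        return (
            "DAQ backpressure coming from FEROL or FEDBuilder. "
            f"FED Builder with backpressure {backpressure} to FED {fed_id}. "
            f"Corresponding RU {ru.hostname} has more than 0 requests "
            f"and less than {max_fragments} fragments."
        )
    return None
```

With illustrative counts, `ferol_backpressure_message(359, 2.6, RuSnapshot("ru-c2e13-15-01.cms", requests=5, fragments=10))` reproduces the message text quoted above, while an RU with 0 requests yields `None`.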

mommsen commented 6 years ago

Hi @gladky,

I'm not sure I understand your question. The fact that the EVM has few events is indeed a critical criterion for 'Backpressure from Event Building' and 'Backpressure from HLT'.

Remi

gladky commented 6 years ago

@mommsen, sorry for the brevity. Here is what I mean:

3 checks for BackpressureFromFerol have been specified in the description of this issue:

5 checks for BackpressureFromEventBuilding have been specified in the description of this issue:

My point here is that BackpressureFromFerol has no checks related to EVM requests specified at all. Should we add the following check:

I propose this check based on the first difference that I see in the DAQView snapshots between these two test cases (the first test case is for BackpressureFromFerol, the second for BackpressureFromEventBuilding).

mommsen commented 6 years ago

Hi @gladky,

No, this condition should not be applied for BackpressureFromFerol. It might be that the slow RU monopolizes most of the requests in the system, which would cause the number of requests on the EVM to be low.

Remi

gladky commented 6 years ago

The deadtime analysis is almost ready. There are 3 pull requests ready to be reviewed: #154, #156 and #157. @andreh12 is about to submit one more, and then everything from this issue will be covered. Here is a graph to review how the deadtime-analysis-related LMs will work together (which are independent, which reuse or depend on others). I've already reviewed it with @andreh12; @hsakulin, please let me know if you have suggestions. Note that the requirement "Temporary Special for uTCA: FED with >2% DAQ Backpressure (with 3 sub-cases)" will be satisfied by firing the BackpressureFromFerol/evm/hlt LMs also when there is PartitionDeadtime, to cover the uTCA FEDs.

[Image: deadtimeanalysis]