Closed gladky closed 6 years ago
We have 2 LMs related to these instructions:
ECAL specific instructions for corrupted data
- Try to stop/start the run (Red recycle DAQ only)
- If this doesn't help: Stop the run. Red & green recycle both the DAQ and the subsystem {{PROBLEM-SUBSYSTEM}}. Start new Run. (Try up to 2 times)
- Problem fixed: Make an e-log entry. Call the DOC of {{PROBLEM-SUBSYSTEM}} (subsystem that sent corrupted data) to inform about the problem
- Problem not fixed: Call the DOC of {{PROBLEM-SUBSYSTEM}} (subsystem that sent corrupted data)
ECAL specific instructions for out of sequence data received
- Try to stop/start the run
- If this doesn't help: Stop the run. Red & green recycle both the DAQ and the subsystem {{PROBLEM-SUBSYSTEM}}. Start new Run. (Try up to 2 times)
- Problem fixed: Make an e-log entry. Call the DOC of {{PROBLEM-SUBSYSTEM}} (subsystem that sent out-of-sync data) to inform about the problem
- Problem not fixed: Call the DOC of {{PROBLEM-SUBSYSTEM}} (subsystem that sent out-of-sync data)
- Try to stop/start the run (Red recycle DAQ only)
- If this doesn't help: Stop the run. Red & green recycle both the DAQ and the subsystem {{PROBLEM-SUBSYSTEM}}. Start new Run.
- Problem fixed: Make an e-log entry. If this happens during physics data taking, call the DOC of {{PROBLEM-SUBSYSTEM}} (subsystem that sent corrupted data/out of sequence data) to inform about the problem
- Problem not fixed: Call the DOC of {{PROBLEM-SUBSYSTEM}} (subsystem that sent corrupted data/out of sequence data)
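For illustration, the {{PROBLEM-SUBSYSTEM}} placeholder in the steps above could be filled in when the message is built, roughly as below. This is a minimal Python sketch under assumptions, not DAQExpert's actual code: `fill_placeholders` and the `context` dictionary are hypothetical names introduced only for this example.

```python
import re

# Matches placeholders of the form {{NAME}}, e.g. {{PROBLEM-SUBSYSTEM}}.
PLACEHOLDER = re.compile(r"\{\{([A-Z-]+)\}\}")

def fill_placeholders(step: str, context: dict) -> str:
    """Replace each {{NAME}} with context[NAME]; unknown names are left untouched."""
    return PLACEHOLDER.sub(lambda m: context.get(m.group(1), m.group(0)), step)

# Hypothetical usage: the subsystem name would come from the detected problem.
step = "Problem not fixed: Call the DOC of {{PROBLEM-SUBSYSTEM}}"
message = fill_placeholders(step, {"PROBLEM-SUBSYSTEM": "ECAL"})
```

Leaving unknown placeholders in place makes a missing context value visible in the rendered text instead of silently producing an empty name.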
note that
just to add my two cents on the phone numbers: in the future we could add an (external) configuration file with the map of subsystem to DOC phone numbers to show the phone numbers directly in the message (if people agree, we can open an issue for that but with low priority).
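A rough sketch of what such an external map could look like, assuming a JSON file and Python for illustration only. The file format, the function names `load_doc_phones` and `doc_contact`, and the phone numbers are all hypothetical placeholders, not existing DAQExpert configuration.

```python
import json

# Hypothetical content of an external config file (subsystem -> DOC phone number).
# The numbers are placeholders, not real extensions.
EXAMPLE_CONFIG = '{"ECAL": "00000", "TRACKER": "00001"}'

def load_doc_phones(text: str) -> dict:
    """Parse the subsystem -> DOC phone map from JSON text."""
    return {name.upper(): str(phone) for name, phone in json.loads(text).items()}

def doc_contact(subsystem: str, phones: dict) -> str:
    """Return the 'call the DOC' text, with the number when one is configured."""
    phone = phones.get(subsystem.upper())
    if phone is None:
        return f"Call the DOC of {subsystem}"
    return f"Call the DOC of {subsystem} (phone: {phone})"

phones = load_doc_phones(EXAMPLE_CONFIG)
```

With this shape, a subsystem missing from the file falls back to the current wording without a number, so the map can be filled in gradually.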
I found another set of special instructions related to ECAL (labeled as 2 in the first comment). It seems to conflict with the first one.
Unless "try a run recovery w/o recycling first" == "recycle DAQ only"
Notes from @hsakulin
We should avoid executing the unnecessary recovery step of red-recycling DAQ where possible.
Contacted ECAL, for reference:
Hello Giacomo
ECAL is currently the only subsystem that requires a red-recycle of the DAQ subsystem in the special recovery instructions in case of syncloss problems.
Note that the DAQ subsystem generally does not require a red-recycle from RunBlocked, which is the state in syncloss problems. However, it does require a red-recycle from the Error state, which is the case for corrupted data received problems.
In the shifter bulletin board I found:
If ECAL sends corrupted data to the DAQ (DAQExpert will warn about Corrupted data received) or causes a syncloss, try to recover by stopping the run; red-recycling DAQ only; starting a new run.
Is there a reason why you recommend doing the red recycle of the DAQ subsystem for syncloss problems?
Additionally, could you please review the special instructions from the bulletin board for ECAL? We've extracted them to a GitHub issue:
Reply
I think it is related to an issue we saw while we were testing the new SLinks.
As reported also in the email thread, we have new instructions in case of syncloss:
In this case we will have the following instructions.
ECAL corrupted data received
- Try to stop/start the run (Red recycle DAQ only)
- If this doesn't help: Stop the run. Red & green recycle both the DAQ and the subsystem ECAL. Start new Run.
- Problem fixed: Make an e-log entry. If this happens during physics data taking, call the DOC of ECAL (subsystem that sent corrupted data) to inform about the problem
- Problem not fixed: Call the DOC of ECAL (subsystem that sent corrupted data)
ECAL syncloss
- Stop/start the run
- If this doesn't help: Stop the run. Red recycle the subsystem ECAL. Start new Run.
- In the meantime, call the ECAL DOC
- Problem not fixed: Call the DOC of ECAL
Note that:
@giacomoCucciati after you confirm the final form I will introduce these changes to the expert system and move these instructions to a new section in the bulletin board, "Covered by DAQExpert"
The instructions are ok. Yes, point 3) can be improved, and I would also add this information:
Final, confirmed version of ECAL special instructions:
ECAL corrupted data received
- Try to stop/start the run (Red recycle DAQ only)
- If this doesn't help: Stop the run. Red & green recycle both the DAQ and the subsystem ECAL. Start new Run.
- Problem fixed: Make an e-log entry. If this happens during physics data taking, call the DOC of ECAL (subsystem that sent corrupted data) to inform about the problem
- Problem not fixed: Call the DOC of ECAL (subsystem that sent corrupted data)
ECAL syncloss
- Stop/start the run
- If this doesn't help: Stop the run. Red recycle the subsystem ECAL. Start new Run.
- Call ECAL DOC during the Red Recycle (only if beam is not in RAMP mode)
- Problem not fixed: Call the DOC of ECAL
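As a rough illustration of the beam-mode condition in the confirmed syncloss steps above, the list could be kept as plain data with a flag on the conditional step. This is a hedged Python sketch, not the code that went into the expert system: `ECAL_SYNCLOSS`, `syncloss_steps` and the `skip_in_ramp` flag are hypothetical names, and "STABLE BEAMS" in the usage lines is just an example of a non-RAMP beam mode.

```python
RAMP = "RAMP"

# Each entry is (step text, skip_in_ramp). Only the "call during Red Recycle"
# step depends on the beam mode, as stated in the confirmed instructions.
ECAL_SYNCLOSS = [
    ("Stop/start the run", False),
    ("If this doesn't help: Stop the run. Red recycle the subsystem ECAL. Start new Run.", False),
    ("Call ECAL DOC during the Red Recycle", True),
    ("Problem not fixed: Call the DOC of ECAL", False),
]

def syncloss_steps(beam_mode: str) -> list:
    """Return the ECAL syncloss steps that apply for the given beam mode."""
    return [
        text
        for text, skip_in_ramp in ECAL_SYNCLOSS
        if not (skip_in_ramp and beam_mode == RAMP)
    ]

# Hypothetical usage with two beam modes.
steps_in_ramp = syncloss_steps("RAMP")
steps_otherwise = syncloss_steps("STABLE BEAMS")
```

Keeping the condition next to the step text means the shifter only sees the steps that apply in the current beam mode.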
Included in 2.13.0, moved to section "covered by expert"
From shifter bulletin:
1
2