Closed by mommsen 6 years ago
Thank you for reporting. I will investigate it. http://daq-expert.cms/DAQExpert/?start=2018-05-23T00:24:53.281Z&end=2018-05-23T00:25:53.281Z
Here is what happened:
• There were 2 periods where L0 was in fixing soft error state, 1st 14 sec, 2nd 21 sec; between them there was ~35 msec of Running state.
• Expert saw this as 1 period where L0 was in fixing soft error state (35 sec).
• The threshold for firing Lengthy fixing-soft-error was 30 sec. This means that at 2:25:18 the condition was satisfied.
• The problem was that no system was in fixing soft error state at this moment (see the attached screenshot of the run info timeline). More specifically, ECAL was back in running but L0 still indicated fixing soft error. This situation lasted for 6 seconds and includes the transition of TCDS from Paused to Running (via TTCHardResetting and Resuming). This is the reason why DAQExpert could not fill in the problematic SUBSYSTEM information in the message.
Why was L0 in fixing soft error even though there was no subsystem in this state?
If this is expected we need to update the logic of LM to include this assumption.
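To make the timing above concrete, here is a minimal sketch of how two short periods can fuse into one that crosses the 30 sec threshold. It assumes the fusion comes from ignoring gaps shorter than some tolerance; the tolerance value, the function names and the relative timestamps are illustrative assumptions, not the actual DAQExpert implementation.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds, relative to the first period


def merge_close_intervals(intervals: List[Interval], max_gap: float) -> List[Interval]:
    """Merge intervals separated by a gap of at most max_gap seconds."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] <= max_gap:
            # Gap is short enough: extend the previous interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def threshold_crossings(intervals: List[Interval], threshold: float) -> List[float]:
    """Return the time at which each interval exceeds the threshold duration."""
    return [start + threshold for start, end in intervals if end - start > threshold]


# Durations from the report: 14 sec, ~35 msec of Running, then 21 sec.
l0_fixing = [(0.0, 14.0), (14.035, 35.035)]
THRESHOLD_S = 30.0  # threshold for "Lengthy fixing-soft-error"

separate = threshold_crossings(l0_fixing, THRESHOLD_S)
merged = merge_close_intervals(l0_fixing, max_gap=1.0)  # hypothetical tolerance
fused = threshold_crossings(merged, THRESHOLD_S)

print(separate)  # neither period alone exceeds 30 sec
print(fused)     # the fused period crosses the threshold 30 sec after its start
```

Taken separately, neither period would fire the condition; only the fused 35 sec period does.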
Soft error recovery always proceeds as follows: Pause TCDS, send fixSoftError to Subsystem(s), resume TCDS.
Always having a subsystem in fixingSoftError state is a wrong assumption.
On 23 May 2018, at 10:58, Maciej Gladki notifications@github.com wrote:
Here is what happened:
• There were 2 periods where L0 was in fixing soft error state, 1st 14 sec, 2nd 21 sec; between them there was ~35 msec of Running state.
• Expert saw this as 1 period where L0 was in fixing soft error state (35 sec).
• The threshold for firing Lengthy fixing-soft-error was 30 sec. This means that at 2:25:18 the condition was satisfied.
• The problem was that no system was in fixing soft error state at this moment (see the attached screenshot of the run info timeline). More specifically, ECAL was back in running but L0 still indicated fixing soft error. This situation lasted for 6 seconds and includes the transition of TCDS from Paused to Running (via TTCHardResetting and Resuming). This is the reason why DAQExpert could not fill in the problematic SUBSYSTEM information in the message.
Why was L0 in fixing soft error even though there was no subsystem in this state?
If this is expected we need to update the logic of LM to include this assumption.
The LM is now collecting the subsystems that were in FixingSoftError during the period where L0 was in FixingSoftError.
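A rough sketch of the difference between the two approaches, for illustration only: checking which subsystems are in FixingSoftError at the instant the condition fires, versus collecting every subsystem that was in FixingSoftError at any point during the L0 FixingSoftError period. The `StateInterval` type, the function names and the timestamps are hypothetical and do not reflect the actual DAQExpert code.

```python
from dataclasses import dataclass
from typing import List, Set

FIXING = "FixingSoftError"


@dataclass(frozen=True)
class StateInterval:
    system: str
    state: str
    start: float  # seconds, relative to the start of the L0 period
    end: float


def subsystems_fixing_at(t: float, history: List[StateInterval]) -> Set[str]:
    """Subsystems that are in FixingSoftError at the instant t."""
    return {iv.system for iv in history
            if iv.state == FIXING and iv.start <= t < iv.end}


def subsystems_fixing_during(period: StateInterval,
                             history: List[StateInterval]) -> Set[str]:
    """Subsystems whose FixingSoftError interval overlaps the given period."""
    return {iv.system for iv in history
            if iv.state == FIXING
            and iv.start < period.end and iv.end > period.start}


# Illustrative timeline: ECAL finishes fixing before the 30 sec threshold is
# reached, while L0 stays in FixingSoftError until TCDS has been resumed.
l0 = StateInterval("L0", FIXING, 0.0, 35.0)
history = [
    StateInterval("TCDS", "Paused", 0.5, 33.0),
    StateInterval("ECAL", FIXING, 1.0, 29.0),
    StateInterval("ECAL", "Running", 29.0, 40.0),
    StateInterval("TCDS", "Running", 34.5, 40.0),
]

at_firing = subsystems_fixing_at(30.0, history)
during_period = subsystems_fixing_during(l0, history)

print(at_firing)      # empty: no subsystem is fixing when the condition fires
print(during_period)  # ECAL is found because it was fixing earlier in the period
```

With the instantaneous check the SUBSYSTEM field stays empty, which matches what the shifter observed; collecting over the whole period still identifies ECAL.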
Fixed with 1bcef7a as 2.10.7
The shifter reported tonight in the elog that the DAQExpert did not report the sub-system which is in lengthy fixing-soft-error: