ISISComputingGroup / IBEX

Top level repository for IBEX stories
5 stars 2 forks source link

EMU: MercuryiTC communication failures #6286

Closed JamesKingWork closed 3 years ago

JamesKingWork commented 3 years ago

As an EMU scientist, I would like to be able to talk reliably to a MercuryiTC. The scientists initially raised concerns about intermittent invalid alarms and it appears this is due to communication failures to the device.

Acceptance criteria:

MicrosoftTeams-image (1)

Notes:

FreddieAkeroyd commented 3 years ago

The device supports RS232 flow control - could we use that? Or is our baud rate too high for specifics of the connection?

JamesKingWork commented 3 years ago

For help with prioritising from the IS: "We have to say that this needs to be fixed before the Cycle because we recently found that it’s much more than just annoying i.e. it caused a failure of switching on the booster heater. The reason for this is that the booster heater script refers to the Temp_Sample block for determining whether it should switch the heater on, and if the target temperature is higher than Temp_Sample, it will start using the booster heater by csetting the same block. If the block is temporarily in ‘invalid’, then it certainly fails to read and set the temperature (and just carries on). The booster heater is a crucial kit for carrying out the user programme in the next Cycle on EMU and MUSR. We have already seen this happened twice during this short test period (about a week). So, this problem has a potential to cause lost times. I can describe more details if this needs a quicker response"

kjwoodsISIS commented 3 years ago

OK. Well, perhaps we should bring this into the sprint (possibly at the expense of something else). We can discuss this at tomorrow's daily stand-up.

Some questions:

  1. Can we get access to a Mercury ITC for as long as it takes to resolve the problem?
    • one of the problems we have is that we get limited access to these devices when first creating the IOCs. Often this time is not long enough for some types of communications problem to emerge.
  2. Is it clear how to reproduce the problem, or is it pretty much guaranteed that the problem will happen in the normal course of operations?
  3. Do we need a scientist (or technician) to be there while we investigate the problem?
JamesKingWork commented 3 years ago
  1. I have emailed the IS to ask for when we can get access to the device.
  2. The problem seems to be ever-present on EMU from my understanding
  3. Likely we will need a scientist/technician there to help investigate the problem (could be something set on the device like the baud rate)
JamesKingWork commented 3 years ago

The decision has been made to do this next sprint

rerpha commented 3 years ago

EMU and likely MuSR may not be able to run this cycle until we fix this so muons are quite concerned.

FreddieAkeroyd commented 3 years ago

Do we know yet if a Mercury iTC on MuSR shows the same issues?

JamesKingWork commented 3 years ago

In the meeting today they said they had not noticed anything, but they are concerned that if it hits EMU it might trouble MuSR too. I am sure we will hear if they do find the same issue

FreddieAkeroyd commented 3 years ago

What is the situation regarding setup and testing on EMU, can they put something online for us?

JamesKingWork commented 3 years ago

IS: "The cryostat was on the instrument and showing the “invalid” error continuously until a few days ago. However, the cryostat is now gone and it’s a bit tricky how we can reproduce the state. The Mercury itc is still connected to the instrument, but the errors are much less frequent. My feeling is that these errors appear when there is a change in the blocks. With all sockets disconnected, the values do not change much. We are looking for a way that we can connect the controller to something – can be an empty cryostat or just some probes, so that it reads some meaningful number. This will happen probably sometime in the next week. Will keep you posted."

rerpha commented 3 years ago

We have just tested the RIKEN mercury plugged into MuSR as that also had problems. In the mercury logs there are various issues:

We managed to get this to happen with these steps in the genie python console:

r.settemp(1)
r.settemp(2)
r.settemp(1)
Block cycle is in alarm: INVALID

The logs:


[2021-03-29 15:16:12] 2021/03/29 15:16:12.558569 L0 IN:MUSR:MERCURY_01:2:P: Input "OOP:TSET:0.0000K"

[2021-03-29 15:16:12] 2021/03/29 15:16:12.558584 L0 IN:MUSR:MERCURY_01:2:P: mismatch after 0 bytes ""

[2021-03-29 15:16:12] 2021/03/29 15:16:12.558592 L0 IN:MUSR:MERCURY_01:2:P: got "OOP:TSET:0..." where "STAT:DEV:DB7.T1:TEMP:LOOP:P:" was expected

[2021-03-29 15:16:12] 2021/03/29 15:16:12.571354 L0 IN:MUSR:MERCURY_01:2:I: Input "...TAT:DEV:DB7.T1:TEMP:SIG:RES:10000000.000..."

[2021-03-29 15:16:12] 2021/03/29 15:16:12.571368 L0 IN:MUSR:MERCURY_01:2:I: mismatch after 21 bytes "...7.T1:TEMP:"

[2021-03-29 15:16:12] 2021/03/29 15:16:12.571376 L0 IN:MUSR:MERCURY_01:2:I: got "SIG:RES:10..." where "LOOP:I:" was expected

[2021-03-29 15:16:12] 2021/03/29 15:16:12.589997 L0 IN:MUSR:MERCURY_01:2:D: Input "...EV:DB7.T1:TEMP:LOOP:P:1.0"

[2021-03-29 15:16:12] 2021/03/29 15:16:12.590010 L0 IN:MUSR:MERCURY_01:2:D: mismatch after 26 bytes "...TEMP:LOOP:"

[2021-03-29 15:16:12] 2021/03/29 15:16:12.590018 L0 IN:MUSR:MERCURY_01:2:D: got "P:1.0" where "D:" was expected

[2021-03-29 15:16:12] 2021/03/29 15:16:12.592803 L0 IN:MUSR:MERCURY_01:2:HEATER: Input ":LOOP:I:1.0"

[2021-03-29 15:16:12] 2021/03/29 15:16:12.592815 L0 IN:MUSR:MERCURY_01:2:HEATER: mismatch after 0 bytes ""

[2021-03-29 15:16:12] 2021/03/29 15:16:12.592823 L0 IN:MUSR:MERCURY_01:2:HEATER: got ":LOOP:I:1...." where "STAT:DEV:DB7.T1:TEMP:LOOP:HSET:" was expected

[2021-03-29 15:16:12] 2021/03/29 15:16:12.609043 L0 IN:MUSR:MERCURY_01:2:HEATER:VOLT_LIMIT: Input "STAT:DEV:DB7.T1:TEMP:LOOP:D:1.0"

[2021-03-29 15:16:12] 2021/03/29 15:16:12.609057 L0 IN:MUSR:MERCURY_01:2:HEATER:VOLT_LIMIT: mismatch after 16 bytes "...EV:DB7.T1:"

[2021-03-29 15:16:12] 2021/03/29 15:16:12.609066 L0 IN:MUSR:MERCURY_01:2:HEATER:VOLT_LIMIT: got "TEMP:LOOP:..." where "HTR:VLIM:" was expected

[2021-03-29 15:16:12] 2021/03/29 15:16:12.630258 L0 IN:MUSR:MERCURY_01:2:HEATER:POWER: Input "STAT:DEV:DB7.T1:TEMP:LOOP:HSET:0"

[2021-03-29 15:16:12] 2021/03/29 15:16:12.630275 L0 IN:MUSR:MERCURY_01:2:HEATER:POWER: mismatch after 16 bytes "...EV:DB7.T1:"

[2021-03-29 15:16:12] 2021/03/29 15:16:12.630284 L0 IN:MUSR:MERCURY_01:2:HEATER:POWER: got "TEMP:LOOP:..." where "HTR:SIG:POWR:" was expected

[2021-03-29 15:16:12] 2021/03/29 15:16:12.634496 L0 IN:MUSR:MERCURY_01:2:HEATER:VOLT: Input "TAT:DEV:None:N/A"

[2021-03-29 15:16:12] 2021/03/29 15:16:12.634509 L0 IN:MUSR:MERCURY_01:2:HEATER:VOLT: mismatch after 0 bytes ""

[2021-03-29 15:16:12] 2021/03/29 15:16:12.634517 L0 IN:MUSR:MERCURY_01:2:HEATER:VOLT: got "TAT:DEV:No..." where "STAT:DEV:" was expected

[2021-03-29 15:16:12] 2021/03/29 15:16:12.725137 L0 IN:MUSR:MERCURY_01:3:TEMP:SP:RBV: Input "STAT:DEV:None:N/A"

[2021-03-29 15:16:12] 2021/03/29 15:16:12.725155 L0 IN:MUSR:MERCURY_01:3:TEMP:SP:RBV: mismatch after 9 bytes "STAT:DEV:"

[2021-03-29 15:16:12] 2021/03/29 15:16:12.725164 L0 IN:MUSR:MERCURY_01:3:TEMP:SP:RBV: got "None:N/A" where "DB6.T1:TEMP:LOOP:TSET:" was expected

[2021-03-29 15:16:12] 2021/03/29 15:16:12.727118 L0 IN:MUSR:MERCURY_01:3:RESISTANCE: Input ".000K"

[2021-03-29 15:16:12] 2021/03/29 15:16:12.727130 L0 IN:MUSR:MERCURY_01:3:RESISTANCE: mismatch after 0 bytes ""

[2021-03-29 15:16:12] 2021/03/29 15:16:12.727138 L0 IN:MUSR:MERCURY_01:3:RESISTANCE: got ".000K" where "STAT:DEV:DB6.T1:TEMP:SIG:RES:" was expected

[2021-03-29 15:16:12] 2021/03/29 15:16:12.792958 L0 IN:MUSR:MERCURY_01:3:P: Input "...TAT:DEV:DB6.T1:TEMP:SIG:RES:10000000.000..."

[2021-03-29 15:16:12] 2021/03/29 15:16:12.792976 L0 IN:MUSR:MERCURY_01:3:P: mismatch after 21 bytes "...6.T1:TEMP:"

[2021-03-29 15:16:12] 2021/03/29 15:16:12.792985 L0 IN:MUSR:MERCURY_01:3:P: got "SIG:RES:10..." where "LOOP:P:" was expected

[2021-03-29 15:16:12] 2021/03/29 15:16:12.798810 L0 IN:MUSR:MERCURY_01:3:I: Input ":DB6.T1:TEMP:LOOP:P:2.000000..."

[2021-03-29 15:16:12] 2021/03/29 15:16:12.798825 L0 IN:MUSR:MERCURY_01:3:I: mismatch after 0 bytes ""

[2021-03-29 15:16:12] 2021/03/29 15:16:12.798833 L0 IN:MUSR:MERCURY_01:3:I: got ":DB6.T1:TE..." where "STAT:DEV:DB6.T1:TEMP:LOOP:I:" was expected

[2021-03-29 15:16:12] 2021/03/29 15:16:12.840682 L0 IN:MUSR:MERCURY_01:3:D: Input "...EV:DB6.T1:TEMP:LOOP:I:2.000000"

[2021-03-29 15:16:12] 2021/03/29 15:16:12.840697 L0 IN:MUSR:MERCURY_01:3:D: mismatch after 26 bytes "...TEMP:LOOP:"

[2021-03-29 15:16:12] 2021/03/29 15:16:12.840706 L0 IN:MUSR:MERCURY_01:3:D: got "I:2.000000" where "D:" was expected

[2021-03-29 15:16:12] 2021/03/29 15:16:12.844930 L0 IN:MUSR:MERCURY_01:3:HEATER: Input ":TEMP:LOOP:D:0.000000..."

[2021-03-29 15:16:12] 2021/03/29 15:16:12.844942 L0 IN:MUSR:MERCURY_01:3:HEATER: mismatch after 0 bytes ""

[2021-03-29 15:16:12] 2021/03/29 15:16:12.844949 L0 IN:MUSR:MERCURY_01:3:HEATER: got ":TEMP:LOOP..." where "STAT:DEV:DB6.T1:TEMP:LOOP:HSET:" was expected

[2021-03-29 15:16:12] 2021/03/29 15:16:12.880449 L0 IN:MUSR:MERCURY_01:3:HEATER:VOLT_LIMIT: Input "STAT:DEV:DB6.T1:TEMP:LOOP:HSET:0"

[2021-03-29 15:16:12] 2021/03/29 15:16:12.880463 L0 IN:MUSR:MERCURY_01:3:HEATER:VOLT_LIMIT: mismatch after 16 bytes "...EV:DB6.T1:"

[2021-03-29 15:16:12] 2021/03/29 15:16:12.880472 L0 IN:MUSR:MERCURY_01:3:HEATER:VOLT_LIMIT: got "TEMP:LOOP:..." where "HTR:VLIM:" was expected

[2021-03-29 15:16:12] 2021/03/29 15:16:12.883136 L0 IN:MUSR:MERCURY_01:3:HEATER:POWER: Input "TR:VLIM:20"

[2021-03-29 15:16:12] 2021/03/29 15:16:12.883150 L0 IN:MUSR:MERCURY_01:3:HEATER:POWER: mismatch after 0 bytes ""

[2021-03-29 15:16:12] 2021/03/29 15:16:12.883158 L0 IN:MUSR:MERCURY_01:3:HEATER:POWER: got "TR:VLIM:20" where "STAT:DEV:" was expected

[2021-03-29 15:16:12] 2021/03/29 15:16:12.920375 L0 IN:MUSR:MERCURY_01:3:HEATER:CURR: Input "SH_H:HTR:SIG:VOLT:0.0236V..."

[2021-03-29 15:16:12] 2021/03/29 15:16:12.920389 L0 IN:MUSR:MERCURY_01:3:HEATER:CURR: mismatch after 1 byte "S"

[2021-03-29 15:16:12] 2021/03/29 15:16:12.920397 L0 IN:MUSR:MERCURY_01:3:HEATER:CURR: got "H_H:HTR:SI..." where "TAT:DEV:" was expected

[2021-03-29 15:16:15] 2021/03/29 15:16:15.114763 L0 IN:MUSR:MERCURY_01:1:I:SP: Input "DB2.L1:LVL:SIG:NIT:LEV:472.0693%..."

[2021-03-29 15:16:15] 2021/03/29 15:16:15.114780 L0 IN:MUSR:MERCURY_01:1:I:SP: mismatch after 0 bytes ""

[2021-03-29 15:16:15] 2021/03/29 15:16:15.114789 L0 IN:MUSR:MERCURY_01:1:I:SP: got "DB2.L1:LVL..." where "STAT:SET:DEV:MB1.T1:TEMP:LOOP:I:" was expected

[2021-03-29 15:16:15] 2021/03/29 15:16:15.387482 L0 IN:MUSR:MERCURY_01:1:TEMP: Input "STAT:SET:DEV:MB1.T1:TEMP:LOOP:I:0.300000..."

[2021-03-29 15:16:15] 2021/03/29 15:16:15.387499 L0 IN:MUSR:MERCURY_01:1:TEMP: mismatch after 5 bytes "STAT:"

[2021-03-29 15:16:15] 2021/03/29 15:16:15.387513 L0 IN:MUSR:MERCURY_01:1:TEMP: got "SET:DEV:MB..." where "DEV:MB1.T1:TEMP:SIG:TEMP:" was expected

[2021-03-29 15:16:15] 2021/03/29 15:16:15.394159 L0 IN:MUSR:MERCURY_01:2:TEMP: Input "DEV:MB1.T1:TEMP:SIG:TEMP:1.1993K..."

[2021-03-29 15:16:15] 2021/03/29 15:16:15.394173 L0 IN:MUSR:MERCURY_01:2:TEMP: mismatch after 0 bytes ""

[2021-03-29 15:16:15] 2021/03/29 15:16:15.394181 L0 IN:MUSR:MERCURY_01:2:TEMP: got "DEV:MB1.T1..." where "STAT:DEV:DB7.T1:TEMP:SIG:TEMP:" was expected

[2021-03-29 15:16:15] 2021/03/29 15:16:15.698798 L0 IN:MUSR:MERCURY_01:3:TEMP: Input "STAT:DEV:DB7.T1:TEMP:SIG:TEMP:0.0000K..."

[2021-03-29 15:16:15] 2021/03/29 15:16:15.698816 L0 IN:MUSR:MERCURY_01:3:TEMP: mismatch after 11 bytes "...TAT:DEV:DB"

[2021-03-29 15:16:15] 2021/03/29 15:16:15.698825 L0 IN:MUSR:MERCURY_01:3:TEMP: got "7.T1:TEMP:..." where "6.T1:TEMP:SIG:TEMP:" was expected

[2021-03-29 15:16:15] 2021/03/29 15:16:15.703751 L0 IN:MUSR:MERCURY_01:4:TEMP: Input "1:TEMP:SIG:TEMP:1.2046K..."

[2021-03-29 15:16:15] 2021/03/29 15:16:15.703765 L0 IN:MUSR:MERCURY_01:4:TEMP: mismatch after 0 bytes ""

[2021-03-29 15:16:15] 2021/03/29 15:16:15.703773 L0 IN:MUSR:MERCURY_01:4:TEMP: got "1:TEMP:SIG..." where "STAT:DEV:DB8.T1:TEMP:SIG:TEMP:" was expected

[2021-03-29 15:16:15] 2021/03/29 15:16:15.949553 L0 IN:MUSR:MERCURY_01:PRESSURE:1:PRESSURE: Input "STAT:DEV:DB8.T1:TEMP:SIG:TEMP:10000000.0..."

[2021-03-29 15:16:15] 2021/03/29 15:16:15.949569 L0 IN:MUSR:MERCURY_01:PRESSURE:1:PRESSURE: mismatch after 11 bytes "...TAT:DEV:DB"

[2021-03-29 15:16:15] 2021/03/29 15:16:15.949577 L0 IN:MUSR:MERCURY_01:PRESSURE:1:PRESSURE: got "8.T1:TEMP:..." where "2.P1:PRES:SIG:PRES:" was expected

[2021-03-29 15:16:15] 2021/03/29 15:16:15.952550 L0 IN:MUSR:MERCURY_01:1:D:SP: Input "V:DB2.P1:N/A"

[2021-03-29 15:16:15] 2021/03/29 15:16:15.952563 L0 IN:MUSR:MERCURY_01:1:D:SP: mismatch after 0 bytes ""

[2021-03-29 15:16:15] 2021/03/29 15:16:15.952571 L0 IN:MUSR:MERCURY_01:1:D:SP: got "V:DB2.P1:N..." where "STAT:SET:DEV:MB1.T1:TEMP:LOOP:D:" was expected

[2021-03-29 15:16:16] 2021/03/29 15:16:16.008831 L0 IN:MUSR:MERCURY_01:1:HEATER:VOLT_LIMIT:SP: Input "STAT:SET:DEV:MB1.T1:TEMP:LOOP:D:0.000000..."

[2021-03-29 15:16:16] 2021/03/29 15:16:16.008848 L0 IN:MUSR:MERCURY_01:1:HEATER:VOLT_LIMIT:SP: mismatch after 20 bytes "...EV:MB1.T1:"

[2021-03-29 15:16:16] 2021/03/29 15:16:16.008857 L0 IN:MUSR:MERCURY_01:1:HEATER:VOLT_LIMIT:SP: got "TEMP:LOOP:..." where "HTR:VLIM:" was expected

[2021-03-29 15:16:16] 2021/03/29 15:16:16.017092 L0 IN:MUSR:MERCURY_01:1:TEMP:SP: Input "SET:DEV:MB0.VTI_H:HTR:VLIM:40.000000:VAL..."

[2021-03-29 15:16:16] 2021/03/29 15:16:16.017106 L0 IN:MUSR:MERCURY_01:1:TEMP:SP: mismatch after 1 byte "S"

[2021-03-29 15:16:16] 2021/03/29 15:16:16.017114 L0 IN:MUSR:MERCURY_01:1:TEMP:SP: got "ET:DEV:MB0..." where "TAT:SET:DEV:MB1.T1:TEMP:LOOP:TSET:" was expected

[2021-03-29 15:16:16] 2021/03/29 15:16:16.059544 L0 IN:MUSR:MERCURY_01:1:NAME: Input "STAT:SET:DEV:MB1.T1:TEMP:LOOP:TSET:1.000..."

[2021-03-29 15:16:16] 2021/03/29 15:16:16.059562 L0 IN:MUSR:MERCURY_01:1:NAME: mismatch after 5 bytes "STAT:"

[2021-03-29 15:16:16] 2021/03/29 15:16:16.059571 L0 IN:MUSR:MERCURY_01:1:NAME: got "SET:DEV:MB..." where "DEV:MB1.T1:TEMP:NICK:" was expected

[2021-03-29 15:16:16] 2021/03/29 15:16:16.066404 L0 IN:MUSR:MERCURY_01:1:HTRCHAN: Input "TAT:DEV:MB1.T1:TEMP:NICK:MB1.VTI_T..."

[2021-03-29 15:16:16] 2021/03/29 15:16:16.066419 L0 IN:MUSR:MERCURY_01:1:HTRCHAN: mismatch after 0 bytes ""

[2021-03-29 15:16:16] 2021/03/29 15:16:16.066427 L0 IN:MUSR:MERCURY_01:1:HTRCHAN: got "TAT:DEV:MB..." where "STAT:DEV:MB1.T1:TEMP:LOOP:HTR:" was expected

[2021-03-29 15:16:16] 2021/03/29 15:16:16.102028 L0 IN:MUSR:MERCURY_01:1:AUXCHAN: Input "...EV:MB1.T1:TEMP:LOOP:HTR:MB0.VTI_H"

[2021-03-29 15:16:16] 2021/03/29 15:16:16.102044 L0 IN:MUSR:MERCURY_01:1:AUXCHAN: mismatch after 26 bytes "...TEMP:LOOP:"

[2021-03-29 15:16:16] 2021/03/29 15:16:16.102052 L0 IN:MUSR:MERCURY_01:1:AUXCHAN: got "HTR:MB0.VT..." where "AUX:" was expected

[2021-03-29 15:16:16] 2021/03/29 15:16:16.105785 L0 IN:MUSR:MERCURY_01:1:CALFILE: Input ".T1:TEMP:LOOP:AUX:DB4.VTI_NV..."

[2021-03-29 15:16:16] 2021/03/29 15:16:16.105797 L0 IN:MUSR:MERCURY_01:1:CALFILE: mismatch after 0 bytes ""

[2021-03-29 15:16:16] 2021/03/29 15:16:16.105805 L0 IN:MUSR:MERCURY_01:1:CALFILE: got ".T1:TEMP:L..." where "STAT:DEV:MB1.T1:TEMP:CAL:FILE:" was expected

[2021-03-29 15:16:16] 2021/03/29 15:16:16.126872 L0 IN:MUSR:MERCURY_01:2:NAME: Input "STAT:DEV:MB1.T1:TEMP:CAL:FILE:X127679_HX..."

[2021-03-29 15:16:16] 2021/03/29 15:16:16.126887 L0 IN:MUSR:MERCURY_01:2:NAME: mismatch after 9 bytes "STAT:DEV:"

[2021-03-29 15:16:16] 2021/03/29 15:16:16.126896 L0 IN:MUSR:MERCURY_01:2:NAME: got "MB1.T1:TEM..." where "DB7.T1:TEMP:NICK:" was expected

[2021-03-29 15:16:16] 2021/03/29 15:16:16.132968 L0 IN:MUSR:MERCURY_01:2:HTRCHAN: Input ":DEV:DB7.T1:TEMP:NICK:DB7._T..."

[2021-03-29 15:16:16] 2021/03/29 15:16:16.132983 L0 IN:MUSR:MERCURY_01:2:HTRCHAN: mismatch after 0 bytes ""

[2021-03-29 15:16:16] 2021/03/29 15:16:16.132991 L0 IN:MUSR:MERCURY_01:2:HTRCHAN: got ":DEV:DB7.T..." where "STAT:DEV:DB7.T1:TEMP:LOOP:HTR:" was expected

[2021-03-29 15:16:16] 2021/03/29 15:16:16.158303 L0 IN:MUSR:MERCURY_01:2:AUXCHAN: Input "...EV:DB7.T1:TEMP:LOOP:HTR:None"

[2021-03-29 15:16:16] 2021/03/29 15:16:16.158318 L0 IN:MUSR:MERCURY_01:2:AUXCHAN: mismatch after 26 bytes "...TEMP:LOOP:"

[2021-03-29 15:16:16] 2021/03/29 15:16:16.158326 L0 IN:MUSR:MERCURY_01:2:AUXCHAN: got "HTR:None" where "AUX:" was expected

[2021-03-29 15:16:16] 2021/03/29 15:16:16.168323 L0 IN:MUSR:MERCURY_01:2:CALFILE: Input "...TAT:DEV:DB7.T1:TEMP:LOOP:AUX:None"

[2021-03-29 15:16:16] 2021/03/29 15:16:16.168338 L0 IN:MUSR:MERCURY_01:2:CALFILE: mismatch after 21 bytes "...7.T1:TEMP:"

[2021-03-29 15:16:16] 2021/03/29 15:16:16.168347 L0 IN:MUSR:MERCURY_01:2:CALFILE: got "LOOP:AUX:N..." where "CAL:FILE:" was expected

[2021-03-29 15:16:16] 2021/03/29 15:16:16.197463 L0 IN:MUSR:MERCURY_01:3:NAME: Input "STAT:DEV:DB7.T1:TEMP:CAL:FILE:X135268_SR..."

[2021-03-29 15:16:16] 2021/03/29 15:16:16.197478 L0 IN:MUSR:MERCURY_01:3:NAME: mismatch after 11 bytes "...TAT:DEV:DB"

[2021-03-29 15:16:16] 2021/03/29 15:16:16.197486 L0 IN:MUSR:MERCURY_01:3:NAME: got "7.T1:TEMP:..." where "6.T1:TEMP:NICK:" was expected

[2021-03-29 15:16:16] 2021/03/29 15:16:16.203666 L0 IN:MUSR:MERCURY_01:3:HTRCHAN: Input ":DEV:DB6.T1:TEMP:NICK:DB6.SH_T..."

[2021-03-29 15:16:16] 2021/03/29 15:16:16.203681 L0 IN:MUSR:MERCURY_01:3:HTRCHAN: mismatch after 0 bytes ""

[2021-03-29 15:16:16] 2021/03/29 15:16:16.203689 L0 IN:MUSR:MERCURY_01:3:HTRCHAN: got ":DEV:DB6.T..." where "STAT:DEV:DB6.T1:TEMP:LOOP:HTR:" was expected

[2021-03-29 15:16:16] 2021/03/29 15:16:16.229250 L0 IN:MUSR:MERCURY_01:3:AUXCHAN: Input "...EV:DB6.T1:TEMP:LOOP:HTR:DB1.SH_H"

[2021-03-29 15:16:16] 2021/03/29 15:16:16.229265 L0 IN:MUSR:MERCURY_01:3:AUXCHAN: mismatch after 26 bytes "...TEMP:LOOP:"

[2021-03-29 15:16:16] 2021/03/29 15:16:16.229274 L0 IN:MUSR:MERCURY_01:3:AUXCHAN: got "HTR:DB1.SH..." where "AUX:" was expected

[2021-03-29 15:16:16] 2021/03/29 15:16:16.257757 L0 IN:MUSR:MERCURY_01:3:CALFILE: Input "...TAT:DEV:DB6.T1:TEMP:LOOP:AUX:None"

[2021-03-29 15:16:16] 2021/03/29 15:16:16.257774 L0 IN:MUSR:MERCURY_01:3:CALFILE: mismatch after 21 bytes "...6.T1:TEMP:"

[2021-03-29 15:16:16] 2021/03/29 15:16:16.257783 L0 IN:MUSR:MERCURY_01:3:CALFILE: got "LOOP:AUX:N..." where "CAL:FILE:" was expected

[2021-03-29 15:16:16] 2021/03/29 15:16:16.263952 L0 IN:MUSR:MERCURY_01:4:NAME: Input "EMP:CAL:FILE:X141722_SH_K3.dat..."

[2021-03-29 15:16:16] 2021/03/29 15:16:16.263967 L0 IN:MUSR:MERCURY_01:4:NAME: mismatch after 0 bytes ""

[2021-03-29 15:16:16] 2021/03/29 15:16:16.263987 L0 IN:MUSR:MERCURY_01:4:NAME: got "EMP:CAL:FI..." where "STAT:DEV:DB8.T1:TEMP:NICK:" was expected

[2021-03-29 15:16:16] 2021/03/29 15:16:16.299550 L0 IN:MUSR:MERCURY_01:4:HTRCHAN: Input "...TAT:DEV:DB8.T1:TEMP:NICK:DB8._T"

[2021-03-29 15:16:16] 2021/03/29 15:16:16.299566 L0 IN:MUSR:MERCURY_01:4:HTRCHAN: mismatch after 21 bytes "...8.T1:TEMP:"

[2021-03-29 15:16:16] 2021/03/29 15:16:16.299575 L0 IN:MUSR:MERCURY_01:4:HTRCHAN: got "NICK:DB8._..." where "LOOP:HTR:" was expected

[2021-03-29 15:16:16] 2021/03/29 15:16:16.303342 L0 IN:MUSR:MERCURY_01:4:AUXCHAN: Input "P:LOOP:HTR:None"

[2021-03-29 15:16:16] 2021/03/29 15:16:16.303355 L0 IN:MUSR:MERCURY_01:4:AUXCHAN: mismatch after 0 bytes ""

[2021-03-29 15:16:16] 2021/03/29 15:16:16.303362 L0 IN:MUSR:MERCURY_01:4:AUXCHAN: got "P:LOOP:HTR..." where "STAT:DEV:DB8.T1:TEMP:LOOP:AUX:" was expected

[2021-03-29 15:16:17] 2021/03/29 15:16:17.023528 L0 IN:MUSR:MERCURY_01:4:CALFILE: Input "...TAT:DEV:DB8.T1:TEMP:LOOP:AUX:None"

[2021-03-29 15:16:17] 2021/03/29 15:16:17.023547 L0 IN:MUSR:MERCURY_01:4:CALFILE: mismatch after 21 bytes "...8.T1:TEMP:"

[2021-03-29 15:16:17] 2021/03/29 15:16:17.023557 L0 IN:MUSR:MERCURY_01:4:CALFILE: got "LOOP:AUX:N..." where "CAL:FILE:" was expected

[2021-03-29 15:16:17] 2021/03/29 15:16:17.027447 L0 IN:MUSR:MERCURY_01:LEVEL:1:NAME: Input "EMP:CAL:FILE:None"

[2021-03-29 15:16:17] 2021/03/29 15:16:17.027460 L0 IN:MUSR:MERCURY_01:LEVEL:1:NAME: mismatch after 0 bytes ""

[2021-03-29 15:16:17] 2021/03/29 15:16:17.027468 L0 IN:MUSR:MERCURY_01:LEVEL:1:NAME: got "EMP:CAL:FI..." where "STAT:DEV:DB2.L1:LVL:NICK:" was expected

[2021-03-29 15:16:17] 2021/03/29 15:16:17.131032 L0 IN:MUSR:MERCURY_01:PRESSURE:1:NAME: Input "STAT:DEV:DB2.L1:LVL:NICK:DB2.LM"

[2021-03-29 15:16:17] 2021/03/29 15:16:17.131048 L0 IN:MUSR:MERCURY_01:PRESSURE:1:NAME: mismatch after 13 bytes "...T:DEV:DB2."

[2021-03-29 15:16:17] 2021/03/29 15:16:17.131057 L0 IN:MUSR:MERCURY_01:PRESSURE:1:NAME: got "L1:LVL:NIC..." where "P1:PRES:NICK:" was expected

[2021-03-29 15:16:17] 2021/03/29 15:16:17.556995 L0 IN:MUSR:MERCURY_01:1:PID:AUTO: Match passed again
rerpha commented 3 years ago

We were also seeing invalids on a block referencing the pressure card of a mercury occasionally (on the riken one)

JamesKingWork commented 3 years ago

EMU are currently attempting to recreate invalid alarms with a flow cryostat connected to a different mercury. The RIKEN mercury in the log above is yet another mercury.

FreddieAkeroyd commented 3 years ago

There are two sources of invalid appearing on blocks, ones does not log to the data file and the other does.

It looks like the first is due to a lock timeout, which is only recorded in steam debug mode

[2021-04-01 21:40:01] 2021/04/01 21:39:59.833530 timerQueue c:\instrument\apps\epics\support\streamdevice\master\src\streamcore.cc:840: IN:EMU:MERCURY_01:1:AUXCHAN: Cannot lock device within 5000 ms, device seems to be busy

The lock timer starts as soon as the record wishes to be processed and access the device, if it times out the operation is abandoned and the PV is put into timeout alarm. This is the cause of a majority of the invalid blocks and could lead to both a failure to read and a failure to write. The problem is caused by a lot of records each being on a periodic SCAN rather than using forward links - with a lot of cards in the mercury it may take just a small delay on a few records and then a large number of records all time out. As RS232 only supports single access, it would be better is the records were linked in a chain and only request access when there is a realistic chance of it being granted quickly. I have temporarily increased the stream device lock timeout from 5 to 30 seconds and that looks to have fixed this particular issue for the moment.

The second issue, that does leave messages in the log file, may just be a response taking a bit too long. I've increased reply timeout from 1 to 2 seconds, but too early to say if that has resolved the issue. However these messages occurred far less frequently that the ones caused by the lock timeout. It seems replies get out of sync in this case, I wondered if we might be able to use a @mismatch handler but the docs suggest it will reparse the original input rather than let us try and read more. I think the best option here might be to have an @mismatch handler that just does a wait, this then gives more chance for the next command to be able to clear the stray message from the input buffer rather than interpret it - it doesn't fox the problem, but would limit it to just one command failing rather than a long sequence of them.

I think stream device should log lock timeouts to file so will submit a PR for that.

FreddieAkeroyd commented 3 years ago

Tests on EMU look to be going well, also now deployed to MUSR

JamesKingWork commented 3 years ago

Confirmed on MUSR that with the increase in locktimeout we no longer see invalid alarms appearing. The soak test on MUSR is still ongoing so will confirm soon whether we see the less frequent logged mismatch errors.