ISISComputingGroup / IBEX

Top level repository for IBEX stories

IOC: Separator produces too many logs when module incorrectly named #3726

Open John-Holt-Tessella opened 5 years ago

John-Holt-Tessella commented 5 years ago

The Muon separator IOC can generate 5 GB of logs a day in an error condition. The log was:

[2018-11-05 16:37:13] 2018/11/05 16:37:09.786 ### DAQmx ERROR (CreateAI): Device identifier is invalid.
[2018-11-05 16:37:13] Device Specified: cDAQ9185-MUONFEMod3
[2018-11-05 16:37:13] Suggested Device(s): cDAQ9185-MUONFE
[2018-11-05 16:37:13]
[2018-11-05 16:37:13] Task Name: R0
[2018-11-05 16:37:13]
[2018-11-05 16:37:13] Status Code: -200220

This is repeated more than 30 times a second.

-I am not sure whether this is because it was disconnected or whether it is because there was an error in the module configuration.- The error was caused by a module configuration problem in which the module name was misspelt. This is likely at the DAQmx layer.

Tom-Willemsen commented 5 years ago

Would it be possible to catch this type of issue via Nagios before we fill up the disk and trigger the main disk space warning?

Since the logs rotate daily, a check on whether any file in the IOC logs directory exceeds a certain size (e.g. 100 MB) would be sufficient...
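
A minimal sketch of such a size check, written in the style of a Nagios plugin (exit code 0 for OK, 2 for CRITICAL, 3 for UNKNOWN). The function names, the 100 MB default and the directory layout are assumptions made for illustration; nothing here is taken from the existing IBEX monitoring configuration.

```python
import os

# Standard Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def oversized_logs(log_dir, limit_bytes):
    """Return (path, size) pairs for files under log_dir larger than limit_bytes."""
    found = []
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file was rotated or removed while scanning
            if size > limit_bytes:
                found.append((path, size))
    # Largest file first, so the message can name the worst offender.
    return sorted(found, key=lambda item: item[1], reverse=True)


def check(log_dir, limit_bytes=100 * 1024 * 1024):
    """Return an (exit_code, message) pair in the style of a Nagios plugin."""
    if not os.path.isdir(log_dir):
        return UNKNOWN, "UNKNOWN - %s is not a directory" % log_dir
    big = oversized_logs(log_dir, limit_bytes)
    if big:
        worst_path, worst_size = big[0]
        return CRITICAL, "CRITICAL - %d log file(s) over limit, largest %s (%d bytes)" % (
            len(big), worst_path, worst_size)
    return OK, "OK - no log file exceeds %d bytes" % limit_bytes
```

A wrapper script would print the message and call `sys.exit` with the returned code, which is how Nagios reads the result. Because the logs rotate daily, one oversized file is enough to flag an IOC that is logging abnormally that day.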

kjwoodsISIS commented 5 years ago

I can see no reason why the Separator IOC needs to log a connection failure 30 times a second. We should get it fixed so that we don't have to configure Nagios to work around the problem.
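
One way to stop an identical error flooding the log is to throttle repeats: emit a message once, count duplicates for a fixed interval, then emit a single summary. The sketch below shows that idea in Python purely for illustration. The class name, the 60-second interval and the summary wording are assumptions, and it is not the IOC's or the DAQmx driver's actual logging code.

```python
import time


class ThrottledLogger:
    """Emit each distinct message at most once per interval, counting suppressed repeats."""

    def __init__(self, emit=print, interval=60.0, clock=time.monotonic):
        self._emit = emit          # callable that writes one log line
        self._interval = interval  # seconds between emissions of the same message
        self._clock = clock        # injectable clock, which makes testing easy
        self._state = {}           # message -> [time last emitted, repeats suppressed]

    def log(self, message):
        now = self._clock()
        entry = self._state.get(message)
        if entry is None:
            # First time this message is seen: emit it immediately.
            self._state[message] = [now, 0]
            self._emit(message)
            return
        if now - entry[0] < self._interval:
            # Still inside the quiet period: count it but do not write it.
            entry[1] += 1
            return
        # Quiet period over: emit again, reporting how many repeats were dropped.
        suppressed = entry[1]
        self._state[message] = [now, 0]
        if suppressed:
            self._emit("%s (repeated %d more times)" % (message, suppressed))
        else:
            self._emit(message)
```

With this in place, an error raised 30 times a second would produce one line when it first occurs and then one summary line per interval, so the fault stays visible in the log without consuming gigabytes of disk.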

ghost commented 5 years ago

Here is a transcript from disconnecting the DAQ in the office (pulling out the Ethernet cable) while the separator IOC is running:

2018/11/07 10:21:08.497 ### DAQmx ERROR (ReadAnalogF64): Some or all of the samples requested have not yet been acquired.

To wait for the samples to become available use a longer read timeout or read later in your program. To make the samples available sooner, increase the sample rate. If your task uses a start trigger,  make sure that your start trigger is configured correctly. It is also possible that you configured the task for external timing, and no clock was supplied. If this is the case, supply an external clock.
Property: DAQmx_Read_RelativeTo
Correspon
2018/11/07 10:21:21.251 ### DAQmx ERROR (StopTask): Some or all of the samples requested have not yet been acquired.

To wait for the samples to become available use a longer read timeout or read later in your program. To make the samples available sooner, increase the sample rate. If your task uses a start trigger,  make sure that your start trigger is configured correctly. It is also possible that you configured the task for external timing, and no clock was supplied. If this is the case, supply an external clock.
Property: DAQmx_Read_RelativeTo
Correspon

No more error messages were created.

ghost commented 5 years ago

Here is a transcript from starting an IOC while the DAQ in the office is disconnected:


epics>
epics>
epics> 2018/11/07 10:24:52.637 ### DAQmx ERROR (StartTask): Retrieving properties from the network device failed. Make sure the device is connected.
Device Specified: cDAQ9185-R3G39
Property: DAQmx_Dev_TCPIP_EthernetIP
Corresponding Value: 130.246.50.212
Property: DAQmx_Dev_TCPIP_Hostname
Corresponding Value: cDAQ9185-R3G39

Device: cDAQ9185-R3G39

Task Name: R0

Status Code: -201401