epics-modules / dxp

EPICS support for digital x-ray spectroscopy electronics from X-ray Instrumentation Associates (XIA)
2 stars 3 forks source link

mca1 stucks in Acquiring in a 4 element XMAP system #4

Open xiaoqiangwang opened 3 years ago

xiaoqiangwang commented 3 years ago

I have a 4 element detector XMAP system running on Windows 10, with the following software

The PresetMode = "No preset" and the following Python script starts and stops the acquisition. It happens reproducibly mca1.ACQG stays "Acquiring" after running the script for ~50 minutes. All mca2...4 are "Done".

And command asynReport 2 DXP1 shows that parameter NDDxpAcquiring and _MCAACQURING are 0 for all addresses.

Parameter 75 type=asynInt32, name=NDDxpAcquiring, value=0, status=0 Parameter 338 type=asynInt32, name=MCA_ACQUIRING, value=0, status=0

To me it seems that mca1 failed to read the final status and stuck. It can be recovered by processing $(P)StatusAllOnce manually.

The testing script is as following.

import logging
import time

from epicsPV import epicsPV

prefix = 'X07MB-XMAP:'
eraseStart = epicsPV(prefix + 'EraseStart')
stopAll = epicsPV(prefix + 'StopAll')
mcaTime = epicsPV(prefix + 'mca2.STIM')
mcaTime.setMonitor()
mca1Acquire = epicsPV(prefix + 'mca1.ACQG')
mca1Acquire.setMonitor()
mca2Acquire = epicsPV(prefix + 'mca2.ACQG')
mca2Acquire.setMonitor()
mca3Acquire = epicsPV(prefix + 'mca3.ACQG')
mca3Acquire.setMonitor()
mca4Acquire = epicsPV(prefix + 'mca4.ACQG')
mca4Acquire.setMonitor()

logging.basicConfig(format='%(asctime)s %(message)s', level=logging.INFO)

while True:
    eraseStart.putw(1)
    time.sleep(2)
    logging.info('StopAll')
    stopAll.putWait(1)
    while mca1Acquire.getValue() or mca2Acquire.getValue() or mca3Acquire.getValue() or mca4Acquire.getValue():
        time.sleep(0.075)
    timestamp = mcaTime.getValue()
    logging.info(timestamp)
xiaoqiangwang commented 3 years ago

I dumped the asyn Trace flow and here is what happened before it is stuck.

2020/11/05 15:49:37.778 devMcaAsyn::send_msg: X07MB-XMAP:mca1 command=4, pact=0, rdns=0, rdng=0
2020/11/05 15:49:37.779 DXP1 addr 0 queueRequest priority 0 not lockHolder
2020/11/05 15:49:37.779 asynManager::portThread port=DXP1 callback
2020/11/05 15:49:37.779 devMcaAsyn::asynCallback: X07MB-XMAP:mca1 command=4, ivalue=1, dvalue=0.000000
2020/11/05 15:49:37.780 NDDxp:writeInt32: [DXP1]: function=325 value=0 addr=0 channel=0
2020/11/05 15:49:37.780 NDDxp::getAcquisitionStatistics addr=0 channel=0
2020/11/05 15:49:37.780 NDDxp:getModuleStatistics: enter addr=0
2020/11/05 15:49:37.781 NDDxp:getModuleStatistics: exit
2020/11/05 15:49:37.781 NDDxp:getAcquisitionStatistics: exit
2020/11/05 15:49:37.781 NDDxp:writeInt32: exit
2020/11/05 15:49:37.781 devMcaAsyn::asynCallback: X07MB-XMAP:mca1 command=4, acquiring=1
2020/11/05 15:49:37.781 devMcaAsyn::send_msg: X07MB-XMAP:mca1 command=4, pact=1, rdns=1, rdng=0
2020/11/05 15:49:37.781 devMcaAsyn::send_msg, record=X07MB-XMAP:mca1, elapsed real=2.071439, elapsed live=4.066135, dwell time=1.000000, acqg=1
2020/11/05 15:49:37.782 devMcaAsyn::send_msg: X07MB-XMAP:mca1 command=0, pact=1, rdns=0, rdng=0
2020/11/05 15:49:37.782 DXP1 addr 0 queueRequest priority 0 not lockHolder
2020/11/05 15:49:37.798 NDDxp::acquisitionTask Detected acquisition stop! Now reading statistics
2020/11/05 15:49:37.799 NDDxp::getAcquisitionStatistics addr=-1 channel=-1
2020/11/05 15:49:37.799 NDDxp::getAcquisitionStatistics addr=0 channel=0
2020/11/05 15:49:37.799 NDDxp:getModuleStatistics: enter addr=0
2020/11/05 15:49:37.800 NDDxp:getModuleStatistics: exit
2020/11/05 15:49:37.800 NDDxp:getAcquisitionStatistics: exit
2020/11/05 15:49:37.800 NDDxp::getAcquisitionStatistics addr=1 channel=1
2020/11/05 15:49:37.800 NDDxp:getAcquisitionStatistics: exit
2020/11/05 15:49:37.800 NDDxp::getAcquisitionStatistics addr=2 channel=2
2020/11/05 15:49:37.800 NDDxp:getAcquisitionStatistics: exit
2020/11/05 15:49:37.800 NDDxp::getAcquisitionStatistics addr=3 channel=3
2020/11/05 15:49:37.800 NDDxp:getAcquisitionStatistics: exit
2020/11/05 15:49:37.800 NDDxp:getAcquisitionStatistics: exit
2020/11/05 15:49:37.800 NDDxp:getMcaData: enter addr=-1
2020/11/05 15:49:37.800 NDDxp:getMcaData: enter addr=0
2020/11/05 15:49:37.801 NDDxp:getMcaData: exit
2020/11/05 15:49:37.801 NDDxp:getMcaData: enter addr=1
2020/11/05 15:49:37.802 NDDxp:getMcaData: exit
2020/11/05 15:49:37.802 NDDxp:getMcaData: enter addr=2
2020/11/05 15:49:37.803 NDDxp:getMcaData: exit
2020/11/05 15:49:37.804 NDDxp:getMcaData: enter addr=3
2020/11/05 15:49:37.805 NDDxp:getMcaData: exit
2020/11/05 15:49:37.805 NDDxp:getMcaData: exit
2020/11/05 15:49:37.805 NDDxp:getSCAData: enter addr=-1
2020/11/05 15:49:37.805 NDDxp:getSCAData: enter addr=0
2020/11/05 15:49:37.806 NDDxp:getSCAData: exit
2020/11/05 15:49:37.806 NDDxp:getSCAData: enter addr=1
2020/11/05 15:49:37.807 NDDxp:getSCAData: exit
2020/11/05 15:49:37.807 NDDxp:getSCAData: enter addr=2
2020/11/05 15:49:37.808 NDDxp:getSCAData: exit
2020/11/05 15:49:37.808 NDDxp:getSCAData: enter addr=3
2020/11/05 15:49:37.809 NDDxp:getSCAData: exit
2020/11/05 15:49:37.809 NDDxp:getSCAData: exit
2020/11/05 15:49:37.812 NDDxp:acquisitionTask Waiting for acquisition to start!
2020/11/05 15:49:37.819 asynManager::portThread port=DXP1 callback
2020/11/05 15:49:37.819 devMcaAsyn::asynCallback: X07MB-XMAP:mca1 command=0, ivalue=0, dvalue=0.000000
2020/11/05 15:49:37.819 NDDxp::readInt32Array addr=0 channel=0 function=321
2020/11/05 15:49:37.819 NDDxp:readInt32Array: exit
2020/11/05 15:49:37.819 devMcaAsyn::read_value, record=X07MB-XMAP:mca1, nord=2048

Compare it with other successful acquisitions, the only difference I see is that the acquisitionTask stopoed before command=0(mcaData) callback could happen.

I wonder, as a workaround, wether one can force reading the status the same way as reading the data when the acquisition finishes. https://github.com/epics-modules/dxp/blob/b552b1ce47502511694f78c2e35d5e651a69809a/dxpApp/src/dxpMED.st#L660-L665

MarkRivers commented 3 years ago

I received the following message from Niamh Dougan at Diamond today:

One of our beamlines is having trouble running large scans using an XMAP setup (single channel connected to a detector). The IOC is using asyn 4-41, dxp 6-0 to do the acquisition and mca 7-8 to read the data from the channel. They use the MCA Spectra CollectMode to run multiple acquisitions, but during the scans the mca record stops updating. The point at which this happens is different each time, varying from minutes to hours into the scan. The rest of the dxp control works; the asyn PVs still get processed and updated. The acquisitionTask() method appears to continue working as expected when tracing the asyn messages, including during the call to getMcaData(). The .STIM field of the mca record gets stuck at the last time the record updated, even if the user writes a 1 to the .READ field. Doing so appears to process the record as there is feedback via camonitor on the .VAL, .READ and .RDNG fields. The .ACQG PV is also stuck on Acquire. There doesn't seem to be a way to revive the mca record - a scan can be restarted and still not have this record working as expected. It does eventually come back on its own after a while.

It sounds like this might be the same issue?

I don't have an xMAP system running any more, since we now use the Xspress3, so it is hard for me to test.

My first question is whether this is a new problem? Have PSI and DLS been running this software without problems until recently?

xiaoqiangwang commented 3 years ago

In our case it is the result of an upgrade from Windows 7 to 10, with many changes

The problem isn't entirely new, but happens less often on the Windows 7 PC.