epics-modules / measComp

EPICS support for some USB and Ethernet I/O modules from Measurement Computing

USB-CTR fails in MCS mode with dwell less than 1 ms on UL for Linux #16

Closed MarkRivers closed 2 years ago

MarkRivers commented 2 years ago

USB-CTR fails in MCS mode with dwell time less than 1 ms on UL for Linux. The count does not end and the data is incorrect.

MarkRivers commented 2 years ago

This is a brief restatement of the problem:

When RATE > 1000 the driver switches to SO_BLOCKIO, i.e. larger USB packets that each contain multiple readings. In this mode it works fine the first time the program is run. The second and subsequent runs fail because the first USB packet from the device is never received: the first packet that does arrive is the one that should have been second. The total number of packets received is therefore one fewer than it should be, so the data from the first USB packet is missing and the scan never completes.

I tried several things to fix this:

MarkRivers commented 2 years ago

I added the following code after line 315 in DaqIUsbBase.cpp. It prints the first 9 values in each USB buffer received, after they are converted to double.

 if (numOfSampleCopied < 9)
    printf("DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=%llu, data=%f\n", 
           mScanInfo.currentDataBufferIdx, dataBuf[mScanInfo.currentDataBufferIdx]);

This is what I see the first time I run the program, when it works correctly. Note that values 0-7 are all zero in the first buffer. In the second buffer (index starting at 128) there are non-zero values (14, 1499, etc.)

TahoeU18:/corvette/home/epics/devel/measComp/measCompApp/src> ../../bin/linux-x86_64-ub18/test_USB_CTR_Linux
Found 3 DAQ device(s)
Running on Linux, RATE=1001.000000
Connecting to device type=USB-CTR08 serial number=01E538A2
(This is the first USB buffer received)
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=0, data=0.000000
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=1, data=0.000000
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=2, data=0.000000
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=3, data=0.000000
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=4, data=0.000000
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=5, data=0.000000
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=6, data=0.000000
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=7, data=0.000000
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=8, data=1.000000
(This is the second USB buffer)
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=128, data=0.000000
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=129, data=0.000000
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=130, data=14.000000
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=131, data=0.000000
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=132, data=0.000000
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=133, data=0.000000
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=134, data=1.000000
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=135, data=0.000000
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=136, data=1499.000000

This is what I see the second time I run the program. Note that we never receive the first buffer with all zeros. Instead the first buffer that is received is actually the second buffer above.

TahoeU18:/corvette/home/epics/devel/measComp/measCompApp/src> ../../bin/linux-x86_64-ub18/test_USB_CTR_Linux
Found 3 DAQ device(s)
Running on Linux, RATE=1001.000000
Connecting to device type=USB-CTR08 serial number=01E538A2
(This is the first USB buffer received)
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=0, data=0.000000
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=1, data=0.000000
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=2, data=14.000000
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=3, data=0.000000
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=4, data=0.000000
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=5, data=0.000000
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=6, data=0.000000
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=7, data=0.000000
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=8, data=1498.000000
(This is the second USB buffer)
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=128, data=28.000000
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=129, data=0.000000
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=130, data=0.000000
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=131, data=0.000000
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=132, data=0.000000
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=133, data=0.000000
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=134, data=2897.000000
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=135, data=0.000000
DaqIUsbBase::processScanData32_dbl currentDataBufferIdx=136, data=0.000000

The problem is clear: the second time the program is run, the first USB buffer is never received. As a result, the count and index never reach their final values. The counts also land in the wrong locations in the scan buffer, because a USB buffer does not contain a whole number of scan points.
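To make the misalignment concrete, here is a minimal sketch of the index arithmetic. It assumes 4-byte samples in 512-byte packets (128 samples per packet, which matches the second buffer starting at index 128 in the logs) and a hypothetical 9 channels per scan point; the channel count is not stated in this thread. The function names are illustrative and are not part of UL for Linux.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: full packets are 512 bytes and each sample is a 32-bit value.
constexpr std::size_t kPacketBytes = 512;
constexpr std::size_t kSamplesPerPacket = kPacketBytes / sizeof(std::uint32_t);  // 128

// Channel that a sample at scan-buffer index idx is attributed to.
constexpr std::size_t apparentChannel(std::size_t idx, std::size_t numChans) {
    return idx % numChans;
}

// Channel the sample really came from if lostPackets full packets
// were dropped before it reached the scan buffer.
constexpr std::size_t trueChannel(std::size_t idx, std::size_t numChans,
                                  std::size_t lostPackets) {
    return (idx + lostPackets * kSamplesPerPacket) % numChans;
}

// Total samples that arrive for a scan when lostPackets full packets are dropped.
constexpr std::size_t samplesReceived(std::size_t numPoints, std::size_t numChans,
                                      std::size_t lostPackets) {
    return numPoints * numChans - lostPackets * kSamplesPerPacket;
}
```

Under these assumptions, the value 14 that appears at index 2 in the failing run is attributed to channel 2 (`apparentChannel(2, 9)`), but it really belongs to channel 4 (`trueChannel(2, 9, 1)`), the same channel as index 130 in the working run. With 1000 scan points, `samplesReceived(1000, 9, 1)` is 8872 rather than 9000, so a completion test that waits for the full count would never be satisfied.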

I do see that the last packet sent from the device is smaller (160 bytes) than all the other packets (512 bytes), and it does contain the expected counts after 1000 scan points. So the device has correctly terminated the scan and sent the last packet.
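The short final packet can be checked with simple arithmetic. The sketch below assumes 4-byte samples and, hypothetically, 9 channels per scan point; the thread does not state the channel count, and 9 is used here only because it reproduces the observed 160-byte final packet for 1000 scan points. `PacketLayout` and `packetLayout` are illustrative names, not part of the driver.

```cpp
#include <cstddef>
#include <cstdint>

// How a scan's bytes divide into full packets plus one trailing short packet.
struct PacketLayout {
    std::size_t fullPackets;      // number of packets of packetBytes each
    std::size_t lastPacketBytes;  // size of the trailing short packet, 0 if none
};

// Assumption: every sample is a 32-bit value and packets are filled back to back.
constexpr PacketLayout packetLayout(std::size_t numPoints, std::size_t numChans,
                                    std::size_t packetBytes = 512) {
    return PacketLayout{
        numPoints * numChans * sizeof(std::uint32_t) / packetBytes,
        numPoints * numChans * sizeof(std::uint32_t) % packetBytes};
}
```

For `packetLayout(1000, 9)` this gives 70 full 512-byte packets followed by a 160-byte packet (36000 bytes in total), which is consistent with the device having sent a complete scan.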

I have added usleep() calls in places where I thought they might be needed, but that does not fix the problem. I think I am going to need help from the Measurement Computing engineering team to figure out the problem.

However, they currently only have a Raspberry Pi for testing on Linux, and the problem does not occur there.

MarkRivers commented 2 years ago

This problem was only present on Ubuntu 18. It was not present on CentOS 7, CentOS 8, or Mint 20.3. Updating to Ubuntu 20 eliminated the problem.