fermi-ad / acsys-python

Python module to access the Fermilab Control System
MIT License

DPM data logger closes connection after retrieving only a portion of the entries requested #11

Closed: kjhazelwood closed this issue 3 years ago

kjhazelwood commented 3 years ago

When retrieving data for multiple devices from a single logger node, DPM returns data for some of the devices and then, without explanation, throws a 1-34 ACNET DISCONNECTED error and fails. @beauremus and @charlieking65 modified DPM02 to add a 20 ms delay between responses, after which I was able to retrieve all the desired data, albeit more slowly.

device count: 21
device name format: I:L123{0..2}, I:L1232{B,C}, I:L130{1..4}, I:L130{5,6}{A,B}, I:L13{07..14}
logger node: Minj2
logger event: P,1000,TRUE
start date: 2021-01-01 00:00:00
end date: 2021-02-01 00:00:00
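The device list above uses bash-style brace expansion. As a sanity check, a minimal Python sketch can expand the patterns, confirm they yield 21 devices, and give a rough idea of how much data the request covers. `expand_braces` is a hypothetical helper written for this example (it handles only flat `{a,b}` lists and numeric `{m..n}` ranges) and is not part of acsys-python. The volume estimate assumes the logger stored roughly one sample per second per device, which is an assumption based on the `P,1000` event, not a measured figure.

```python
import re
from datetime import datetime


def expand_braces(pattern: str) -> list[str]:
    """Expand bash-style {a,b} lists and {m..n} ranges, leftmost brace first."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    body = match.group(1)
    rng = re.fullmatch(r"(\d+)\.\.(\d+)", body)
    if rng:
        lo, hi = rng.groups()
        # Like bash: a leading zero on either bound pads every term to the same width.
        width = max(len(lo), len(hi)) if lo.startswith("0") or hi.startswith("0") else 0
        options = [str(i).zfill(width) for i in range(int(lo), int(hi) + 1)]
    else:
        options = body.split(",")
    prefix, suffix = pattern[: match.start()], pattern[match.end():]
    # Recurse so later braces in the same pattern are expanded too.
    return [name for opt in options for name in expand_braces(prefix + opt + suffix)]


PATTERNS = ["I:L123{0..2}", "I:L1232{B,C}", "I:L130{1..4}", "I:L130{5,6}{A,B}", "I:L13{07..14}"]
devices = [name for p in PATTERNS for name in expand_braces(p)]
print(len(devices))  # 21

# Rough volume, ASSUMING about one logged sample per second per device.
span = datetime(2021, 2, 1) - datetime(2021, 1, 1)
samples_per_device = int(span.total_seconds())
print(samples_per_device, samples_per_device * len(devices))  # 2678400 56246400
```

Under that assumption the request covers tens of millions of samples, so it is plausible that even a small per-reply delay adds noticeable time.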

Edit: used glob syntax 😉; check me on this, @kjhazelwood. - beau

beauremus commented 3 years ago

I was wrong about the delay: it's 10 ms. We went through several iterations to find the value that was most reliable without being too slow.
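A fixed pause between replies, like the one added on DPM02, can be sketched with asyncio. This is a hypothetical illustration, not DPM's implementation: `send_paced`, `fake_send`, and the 50-reply workload are invented for the example, and only the 10 ms default comes from the comment above.

```python
import asyncio
import time
from typing import Awaitable, Callable, Iterable

# Hypothetical signature for whatever transmits one reply downstream.
SendFn = Callable[[bytes], Awaitable[None]]


async def send_paced(replies: Iterable[bytes], send: SendFn, delay_s: float = 0.010) -> int:
    """Send each reply in order, pausing delay_s between consecutive replies."""
    sent = 0
    for reply in replies:
        if sent:
            # Give downstream queues time to drain before the next reply.
            await asyncio.sleep(delay_s)
        await send(reply)
        sent += 1
    return sent


async def main() -> tuple[int, list[bytes], float]:
    received: list[bytes] = []

    async def fake_send(reply: bytes) -> None:
        # Stand-in transport: just record the reply.
        received.append(reply)

    start = time.monotonic()
    sent = await send_paced((b"reply-%d" % i for i in range(50)), fake_send)
    return sent, received, time.monotonic() - start


sent, received, elapsed = asyncio.run(main())
print(sent, len(received), f"{elapsed:.2f}s")  # 49 pauses of 10 ms each
```

The cost grows linearly with the number of replies: N replies take at least (N - 1) × 10 ms longer than unpaced ones, which matches the slower retrieval reported above.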

This fix has been deployed only on DPM02, but because of #13 we need to make sure we aren't introducing a new problem.

I want to note that @kjhazelwood and I were able to make this request successfully multiple times yesterday.

Clearly, this manual delay is not desirable. The long-term solution is a TCP connection between DPM and the client. Currently, we use ACNET/UDP to communicate between DPM and the TCP proxy, and we believe this is where packets are being dropped. The delay gives the queues enough time to flush without dropping packets.
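The suspected failure mode (bursts of replies overflowing a queue on the UDP hop, with a delay giving it time to flush) can be illustrated with a toy discrete-time model. Everything here is hypothetical: `simulate`, the queue capacity, the drain interval, and the packet counts are made up, and this is not a model of the real ACNET or proxy queues.

```python
from collections import deque


def simulate(n_packets: int, gap_ms: int, capacity: int, drain_ms: int) -> int:
    """Return how many of n_packets a bounded FIFO queue drops.

    The producer emits one packet every gap_ms milliseconds (gap_ms=0 sends
    them all at once); the consumer removes one packet every drain_ms
    milliseconds. A packet that arrives while the queue is full is dropped,
    much as an overflowing UDP receive buffer would discard it.
    """
    queue: deque = deque()
    dropped = sent = 0
    t = 0
    while sent < n_packets or queue:
        # Consumer: take one packet on each drain tick.
        if t % drain_ms == 0 and queue:
            queue.popleft()
        # Producer: emit every packet whose send time has come.
        while sent < n_packets and sent * gap_ms <= t:
            if len(queue) < capacity:
                queue.append(sent)
            else:
                dropped += 1
            sent += 1
        t += 1
    return dropped


# Hypothetical numbers: 100 replies, a 16-slot queue drained every 5 ms.
burst = simulate(100, gap_ms=0, capacity=16, drain_ms=5)
paced = simulate(100, gap_ms=10, capacity=16, drain_ms=5)
print(burst, paced)  # 84 0
```

In this toy model, pacing avoids drops only while replies arrive no faster than the queue drains, so the right delay depends on load. That fragility is one reason a transport with built-in flow control is the better fix.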

beauremus commented 3 years ago

We have alleviated this issue by re-architecting the infrastructure on site. The final solution is a TCP connection all the way to DPM; see #20.
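One reason a TCP connection removes the need for a hand-tuned delay is flow control: when the receiver falls behind, the sender is made to wait instead of having data dropped. The sketch below shows this with Python's asyncio streams over loopback, where `StreamWriter.drain()` pauses the writer while the transport's buffer is full. It is a generic illustration and assumes nothing about how DPM or the proxy actually implement their TCP path; `handle_client`, the message count, and the payload size are hypothetical.

```python
import asyncio

N_MESSAGES = 4096       # hypothetical number of replies
PAYLOAD = b"x" * 1024   # hypothetical 1 KiB reply


async def handle_client(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    # Sender (standing in for DPM): no fixed sleep between replies.
    for _ in range(N_MESSAGES):
        writer.write(PAYLOAD)
        # drain() pauses this coroutine while the transport's write buffer is
        # above its high-water mark, so a slow reader throttles the sender.
        await writer.drain()
    writer.close()
    await writer.wait_closed()


async def main() -> int:
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    received = 0
    while True:
        chunk = await reader.read(65536)
        if not chunk:  # EOF: the sender closed the connection
            break
        received += len(chunk)
        await asyncio.sleep(0.001)  # simulate a slow consumer
    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return received


total = asyncio.run(main())
print(total == N_MESSAGES * len(PAYLOAD))  # True: every byte arrived
```

Compared with a fixed sleep, the pacing here adapts automatically to how fast the receiver actually reads.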