Open coretl opened 2 months ago
There seems to be a state where the flag data_capture_enabled
is false
but the data_buffer
is still active. This buffer has three states: STATE_IDLE
, STATE_ACTIVE
, and STATE_CLEARING
, the last being used when a data capture from hardware is complete but data is still be transferred to a connected data user. The function read_buffer_status
returns false
when the buffer is in STATE_IDLE
and true
otherwise.
Hmm. I think this is probably a bug in the return value from read_buffer_status
, as the next line after the failing test is testing the number of still connected clients. This function is only used twice, the second call is just a few lines further down used to implement *PCAP.STATUS?
: https://github.com/PandABlocks/PandABlocks-server/blob/ee4c35271426aa949b2b7001edd367687f7524f6/server/data_server.c#L269-L275
I think the simplest fix is just
diff --git a/server/data_server.c b/server/data_server.c
index 37bba82..54cb2bc 100644
--- a/server/data_server.c
+++ b/server/data_server.c
@@ -251,7 +251,7 @@ error__t arm_capture(void)
"Data capture already in progress") ?:
/* If data capture is not enabled then we can safely expect the
* buffer status to be idle. */
- TEST_OK(!read_buffer_status(data_buffer, &readers, &active)) ?:
+ DO(read_buffer_status(data_buffer, &readers, &active)) ?:
TEST_OK_(active == 0, "Data clients still taking data") ?:
/* Get PCAP ARM timestamp and store as a global */
TEST_IO(clock_gettime(CLOCK_REALTIME, &pcap_arm_ts)) ?:
A more thorough fix might involve exposing all three states from this function, decoding them properly in get_capture_status
, and expecting not STATE_ACTIVE
in the read buffer call above, but I don't think this is worth doing.
In brief, I think your test should be failing with the message ERR: Data clients still taking data
instead.
Somehow we hit this
TEST_OK
while doing*PCAP.ARM
: https://github.com/PandABlocks/PandABlocks-server/blob/1ffcc0a1803a38a22fb948ba32b7b75debe558a7/server/data_server.c#L220-L222Causing
ERR: Unexpected error at .../data_server.c:220
. This was the ID panda that @glennchid upgraded, so I guess this was the 3.0 release, rather than @EmilioPeJu's latest changes, although I can't see anything in them that touches this code.There are 2 parts of the code that connect to a PandA:
Inserting a 1 second sleep between the two fixes the problem. Inserting a 0.1s sleep does not.
I have yet to reproduce this outside the application they were using, but before I do, do you know what might trigger this?