epics-base / pvxs

PVA protocol client/server library and utilities.
https://mdavidsaver.github.io/pvxs/

Intermittent failure of `testsock` #23

Status: Open. ralphlange opened this issue 3 years ago.

ralphlange commented 3 years ago

Description

In local builds on a VM, some tests fail consistently:

testsock.t .....
not ok  5 - Recv'd -1(11) [0, 0, 0, 0]
not ok  6 - src (<>) == send_addr (127.0.0.1:35007)
not ok  8 - Recv'd -1 [0, 0, 0, 0]
not ok  9 - src (<>) == sender_addr (127.0.0.1:52034)
Dubious, test returned 1 (wstat 256, 0x100)
Failed 4/33 subtests

Information:

ralphlange commented 3 years ago

It's not consistent; I have just seen all tests pass (just running them again). Flap, flap.

mdavidsaver commented 3 years ago

This is one of two (and only two!) spurious test failures I see with PVXS. Both seem related to apparent Winsock-specific synchronization oddities. This test (test_udp()) sets up two UDP sockets on one thread and uses one to send a packet to the other. It appears that, even though bind() has succeeded, the socket buffer of the second socket sometimes isn't ready by the time sendto() is called on the first.
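The sequence described above can be sketched with POSIX sockets. This is a minimal illustration, not the actual `test_udp()` code: the helper names `bind_loopback_udp()` and `udp_send_then_try_recv()` are hypothetical, both sockets are assumed to be bound to 127.0.0.1 with OS-chosen ports, and the receive is assumed to be non-blocking (`MSG_DONTWAIT`, available on Linux and the BSDs). On Windows, Winsock would additionally need `WSAStartup()` and `closesocket()`.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cerrno>
#include <cstring>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Outcome of one send/receive attempt (hypothetical helper type).
struct UdpAttempt {
    ssize_t sent;      // return value of sendto()
    ssize_t received;  // return value of recvfrom()
    int recv_errno;    // errno after a failed recvfrom(), 0 on success
};

// Bind a UDP socket to 127.0.0.1 on an OS-chosen port; fills in addr.
static int bind_loopback_udp(sockaddr_in& addr) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // let the OS pick a free port
    socklen_t len = sizeof(addr);
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0 ||
        getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// One thread: bind two sockets, send from A to B, then immediately try a
// non-blocking receive on B. Nothing guarantees the datagram is already
// queued on B at this point, so received == -1 with EAGAIN is possible.
UdpAttempt udp_send_then_try_recv() {
    UdpAttempt r{-1, -1, 0};
    sockaddr_in a_addr, b_addr;
    int a = bind_loopback_udp(a_addr);
    int b = bind_loopback_udp(b_addr);
    if (a >= 0 && b >= 0) {
        const char msg[4] = {1, 2, 3, 4};
        r.sent = sendto(a, msg, sizeof(msg), 0,
                        reinterpret_cast<sockaddr*>(&b_addr), sizeof(b_addr));
        char buf[16];
        errno = 0;
        r.received = recvfrom(b, buf, sizeof(buf), MSG_DONTWAIT, nullptr, nullptr);
        r.recv_errno = (r.received < 0) ? errno : 0;
    }
    if (a >= 0) close(a);
    if (b >= 0) close(b);
    return r;
}
```

Because nothing forces the datagram to be queued on the receiving socket before the non-blocking `recvfrom()` runs, the attempt may legitimately report -1 with `EAGAIN`. On Linux, `EAGAIN` is errno 11, which would be consistent with the `-1(11)` in the log above.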

The other failure I sometimes see originates in libevent's compatibility version of socketpair(), which, now that I think about it, does something similar with two TCP sockets on one thread.
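A simplified sketch of the general technique, emulating socketpair() with a loopback TCP connection made entirely from one thread, follows. It is not libevent's implementation: the function name `tcp_socketpair()` is hypothetical, it assumes POSIX sockets and IPv4 loopback, and it omits checks a production version would want (for example, confirming that the accepted connection really came from the connecting socket). The single-threaded listen/connect/accept sequence only works because the kernel completes the TCP handshake into the listen backlog before accept() is called, the kind of implicit readiness the comment above suspects.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstring>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Emulate socketpair() with a loopback TCP connection on a single thread:
// listen, connect, accept. Returns 0 and fills fds[0..1], or -1 on failure.
int tcp_socketpair(int fds[2]) {
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    int connector = socket(AF_INET, SOCK_STREAM, 0);
    int acceptor = -1;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // let the OS choose a free port
    socklen_t len = sizeof(addr);
    bool ok = listener >= 0 && connector >= 0
        && bind(listener, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0
        && listen(listener, 1) == 0
        && getsockname(listener, reinterpret_cast<sockaddr*>(&addr), &len) == 0
        // A blocking connect() can finish on this same thread because the
        // kernel completes the handshake into the listen backlog.
        && connect(connector, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0;
    if (ok) {
        acceptor = accept(listener, nullptr, nullptr);
        ok = acceptor >= 0;
    }
    if (listener >= 0) close(listener);  // not needed after accept()
    if (!ok) {
        if (connector >= 0) close(connector);
        if (acceptor >= 0) close(acceptor);
        return -1;
    }
    fds[0] = connector;
    fds[1] = acceptor;
    return 0;
}
```

For example, after `tcp_socketpair(fds)` succeeds, bytes written to `fds[0]` can be read from `fds[1]`, much as with a native `socketpair()`.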

ralphlange commented 1 year ago

Still there with PVXS 1.0.0 and EPICS Base 7.0.7 on RHEL 8.5.

mdavidsaver commented 1 year ago

I think the core (apparently incorrect) assumption I make in testsock is that the RX buffering behind a UDP socket is 100% ready after an apparently successful bind() and possibly an IP_ADD_MEMBERSHIP. That is, I assume that a sequence such as bind(), sendto(), recvfrom() can proceed without blocking.
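One way to avoid that assumption is to wait, with a bounded timeout, for the receiving socket to become readable before calling `recvfrom()`. The sketch below is illustrative only and is not necessarily what the fix attempted later in this thread does: the helper names `bind_loopback_udp()`, `recv_with_timeout()` and `udp_roundtrip_checked()` are hypothetical, the 1000 ms timeout in the example is arbitrary, and POSIX `poll()` is assumed. It also compares the sender address, in the spirit of the `src == send_addr` check in the failing output.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstring>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Bind a UDP socket to 127.0.0.1 on an OS-chosen port; fills in addr.
static int bind_loopback_udp(sockaddr_in& addr) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    socklen_t len = sizeof(addr);
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0 ||
        getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Wait up to timeout_ms for fd to become readable, then receive.
// Returns the byte count, or -1 on timeout or error; src gets the sender.
static ssize_t recv_with_timeout(int fd, void* buf, size_t cap,
                                 sockaddr_in& src, int timeout_ms) {
    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLIN;
    if (poll(&pfd, 1, timeout_ms) <= 0 || !(pfd.revents & POLLIN))
        return -1;  // timed out, interrupted, or error condition
    socklen_t len = sizeof(src);
    return recvfrom(fd, buf, cap, 0, reinterpret_cast<sockaddr*>(&src), &len);
}

// Send a 4-byte payload from socket A to socket B on one thread, then check
// the payload and that the source address equals A's bound address.
bool udp_roundtrip_checked(int timeout_ms) {
    sockaddr_in a_addr{}, b_addr{}, src{};
    int a = bind_loopback_udp(a_addr);
    int b = bind_loopback_udp(b_addr);
    bool ok = false;
    if (a >= 0 && b >= 0) {
        const char msg[4] = {1, 2, 3, 4};
        char buf[16];
        ok = sendto(a, msg, sizeof(msg), 0,
                    reinterpret_cast<sockaddr*>(&b_addr), sizeof(b_addr))
                 == static_cast<ssize_t>(sizeof(msg))
             && recv_with_timeout(b, buf, sizeof(buf), src, timeout_ms)
                 == static_cast<ssize_t>(sizeof(msg))
             && std::memcmp(buf, msg, sizeof(msg)) == 0
             && src.sin_addr.s_addr == a_addr.sin_addr.s_addr
             && src.sin_port == a_addr.sin_port;
    }
    if (a >= 0) close(a);
    if (b >= 0) close(b);
    return ok;
}
```

For example, `udp_roundtrip_checked(1000)` should return `true` on a working loopback interface. A datagram that never arrives then shows up as a bounded timeout rather than an immediate, timing-dependent `EAGAIN`.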

mdavidsaver commented 1 year ago

Attempting a fix with 5897fe273e439186c396ed2a85ef2dddb3c4d89e. I can't reliably trigger the failure, so I don't know if this will be sufficient.

mdavidsaver commented 1 year ago

Well, now there is a different error. It seems to be less frequent than the previous ones.

  testsock.tap ..... 
  not ok 39 -  ret<0 RX3 expected error ret=14 err=11