Closed mdavidsaver closed 3 weeks ago
The OSX build CI failures will be resolved by https://github.com/epics-base/setuptools_dso/pull/35
It looks like this is only for stream sockets and TCP, so if no server gets port 5075 that won't prevent UDP search packets from being received and distributed to other servers via the localhost loopback. Is that correct?
Have you given any thought to how much work might be needed for a server to accept its sockets from inetd (stdin/stdout) or systemd via sd_listen_fds()? That might be a useful option for embedded servers, although socket activation seems less likely to make sense for IOCs at least.
It looks like this is only for stream sockets and TCP ...
Correct. As I understand it, this laziness of bind() is specific to TCP sockets where SO_REUSEADDR is set (and so specific to *nix). eg.
from socket import socket, AF_INET, SOCK_STREAM, SOL_SOCKET, SO_REUSEADDR
# Without SO_REUSEADDR: the second bind() fails
S1=socket(AF_INET, SOCK_STREAM)
S2=socket(AF_INET, SOCK_STREAM)
S1.bind(('127.0.0.1', 5000))
S2.bind(('127.0.0.1', 5000)) # fails! (EADDRINUSE)
# With SO_REUSEADDR: both bind() succeed, only the first listen() does
S1=socket(AF_INET, SOCK_STREAM)
S2=socket(AF_INET, SOCK_STREAM)
S1.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
S2.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
S1.bind(('127.0.0.1', 5000))
S2.bind(('127.0.0.1', 5000)) # succeeds!
S1.listen(4)
S2.listen(4) # fails! (EADDRINUSE)
Have you given any thought to ... sd_listen_fds()?
No, not really. It seems like a lot of work for not much benefit, with a high probability of mis-configured .socket files (plural!) causing chaos.
What I have thought about is calling sd_notify() with IOC lifecycle changes. Primarily using initHookAfterIocRunning to emit READY=1, so that dependent units don't race CA/PVA server startup.
... of course usage of sd_* is moot so long as procServ is involved.
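For reference, the READY=1 notification mentioned above is just a datagram sent to the unix socket named by $NOTIFY_SOCKET. A minimal sketch of that wire protocol, in Python only for illustration: an IOC would more likely call sd_notify() from libsystemd inside an initHook callback, and the function name and default argument here are my own, not anything in this PR.

```python
import os
import socket

def sd_notify(state="READY=1"):
    """Send a state string to the service manager; return True if sent."""
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False  # not started by a manager expecting notifications
    if path.startswith("@"):
        path = "\0" + path[1:]  # leading '@' means abstract namespace
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), path)
    return True
```

Called once at the point corresponding to initHookAfterIocRunning, this would let a Type=notify unit hold back dependent units until the servers are up. It returns False and does nothing when NOTIFY_SOCKET is unset.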
fyi. my attempt at provoking this race was not successful. I guess a shell loop is too slow with so many fork()s.
cat > tick.db << 'EOF'
record(calc, "$(P=)cnt") {
field(INPA, "$(P=)cnt")
field(CALC, "A+1")
field(SCAN, "1 second")
}
EOF
for n in `seq 1 100`; do sh -c "softIocPVX -m P=$n: -d tick.db -S </dev/null &" ; done
Followed by
for n in `seq 1 100`; do echo $n:cnt; done | xargs pvxget
Will complete without timeout if all PVA servers started.
cleanup
killall softIocPVX
fyi. my attempt at provoking this race was not successful ...
It can, sometimes, eventually. Looping through iocBomb.sh gets a timeout on one or two PVs within a couple of minutes on my laptop without this PR. With this PR applied, I eventually got bored.
while sh iocBomb.sh; do date; done
I am satisfied with this result.
Attempts to address #81.
On Linux (at least), SO_REUSEADDR, which allows a new listener to bind while an existing socket is in FIN-WAIT, apparently allows any number of sockets to bind(), but only one listen() to succeed.
Further, on Linux there is a known documented race condition which can result in all listen() failing. It isn't clear how to handle this case without a potentially infinite loop, so ignore it. If this happens, then eg. no PVA server will get port 5075.
So when probing for another listener, it is necessary to enter the listening state. When this fails, the socket is no longer usable for another bind(), so it is necessary to allocate another for the next attempt.
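The probing logic described above can be sketched roughly as follows. This is a Python illustration, not the code in this PR: the function name, the fallback to an OS-assigned port (port 0), and the backlog value are assumptions made for the example.

```python
import errno
import socket

def listen_with_fallback(host="127.0.0.1", preferred_port=5075, backlog=4):
    """Return a listening TCP socket, preferring preferred_port."""
    for port in (preferred_port, 0):  # 0 asks the OS for any free port
        # A fresh socket per attempt: one that failed listen() is not reused.
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            sock.bind((host, port))
            # bind() succeeding proves nothing here; only listen() does.
            sock.listen(backlog)
            return sock
        except OSError as exc:
            sock.close()
            if exc.errno != errno.EADDRINUSE:
                raise
    raise OSError(errno.EADDRINUSE, "no port available")
```

The port actually obtained is available from getsockname()[1] on the returned socket. The race where every listen() fails is not handled specially here either; both contenders would simply fall through to an OS-assigned port.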