janicez opened this issue 5 years ago
Reproduced on 3.5.7.
13:06:53 peri141 -- | /home/ellenor/.local/charybdis-3.5/etc/ircd.conf :Rehashing
13:06:53 peri141 -- | hades.arpa: *** Notice -- Hilde!~ellenor@perihelion.nj.us.umbrellix.net{ellenor2000} is rehashing server config file
13:06:53 peri141 -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14005: Address already in use
13:06:53 peri141 -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14005: Address already in use
13:06:53 peri141 -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14004: Address already in use
13:06:53 peri141 -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14004: Address already in use
13:06:53 peri141 -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14003: Address already in use
13:06:53 peri141 -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14003: Address already in use
13:06:53 peri141 -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14002: Address already in use
13:06:53 peri141 -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14002: Address already in use
13:06:53 peri141 -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14001: Address already in use
13:06:53 peri141 -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14001: Address already in use
13:06:53 peri141 -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14000: Address already in use
13:06:53 peri141 -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14000: Address already in use
13:06:53 peri141 -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14105: Address already in use
13:06:53 peri141 -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14105: Address already in use
13:06:53 peri141 -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14104: Address already in use
13:06:53 peri141 -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14104: Address already in use
13:06:53 peri141 -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14103: Address already in use
13:06:53 peri141 -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14103: Address already in use
13:06:53 peri141 -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14102: Address already in use
13:06:53 peri141 -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14102: Address already in use
13:06:53 peri141 -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14101: Address already in use
13:06:53 peri141 -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14101: Address already in use
13:06:53 peri141 -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14100: Address already in use
13:06:53 peri141 -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14100: Address already in use
lsof output for the ports the ircd listens on (note that they are all held by bandb, PID 4085, not by the ircd):
bandb 4085 ellenor 257u IPv4 0xfffffe01f72e9840 0t0 TCP *:14005 (LISTEN)
bandb 4085 ellenor 258u IPv6 0xfffffe01f76d4800 0t0 TCP *:14005 (LISTEN)
bandb 4085 ellenor 259u IPv4 0xfffffe0f2b171040 0t0 TCP *:14004 (LISTEN)
bandb 4085 ellenor 260u IPv6 0xfffffe01fab40880 0t0 TCP *:14004 (LISTEN)
bandb 4085 ellenor 261u IPv4 0xfffffe01f8793800 0t0 TCP *:14003 (LISTEN)
bandb 4085 ellenor 262u IPv6 0xfffffe01f801a040 0t0 TCP *:14003 (LISTEN)
bandb 4085 ellenor 263u IPv4 0xfffffe01ed796800 0t0 TCP *:14002 (LISTEN)
bandb 4085 ellenor 264u IPv6 0xfffffe01ff15b840 0t0 TCP *:14002 (LISTEN)
bandb 4085 ellenor 265u IPv4 0xfffffe01f7e5f000 0t0 TCP *:14001 (LISTEN)
bandb 4085 ellenor 266u IPv6 0xfffffe0f2fafe880 0t0 TCP *:14001 (LISTEN)
bandb 4085 ellenor 267u IPv4 0xfffffe01f72f1880 0t0 TCP *:14000 (LISTEN)
bandb 4085 ellenor 268u IPv6 0xfffffe01e8598100 0t0 TCP *:14000 (LISTEN)
bandb 4085 ellenor 269u IPv4 0xfffffe01f5089800 0t0 TCP *:14105 (LISTEN)
bandb 4085 ellenor 270u IPv6 0xfffffe01f5089080 0t0 TCP *:14105 (LISTEN)
bandb 4085 ellenor 271u IPv4 0xfffffe01f50ab840 0t0 TCP *:14104 (LISTEN)
bandb 4085 ellenor 272u IPv6 0xfffffe01f50ab0c0 0t0 TCP *:14104 (LISTEN)
bandb 4085 ellenor 273u IPv4 0xfffffe0f31051880 0t0 TCP *:14103 (LISTEN)
bandb 4085 ellenor 274u IPv6 0xfffffe01fee657c0 0t0 TCP *:14103 (LISTEN)
bandb 4085 ellenor 275u IPv4 0xfffffe01fee65040 0t0 TCP *:14102 (LISTEN)
bandb 4085 ellenor 276u IPv6 0xfffffe01f87a3840 0t0 TCP *:14102 (LISTEN)
bandb 4085 ellenor 277u IPv4 0xfffffe01f6b4e880 0t0 TCP *:14101 (LISTEN)
bandb 4085 ellenor 278u IPv6 0xfffffe0f2b1717c0 0t0 TCP *:14101 (LISTEN)
bandb 4085 ellenor 279u IPv4 0xfffffe0f31051100 0t0 TCP *:14100 (LISTEN)
bandb 4085 ellenor 280u IPv6 0xfffffe01f5f83780 0t0 TCP *:14100 (LISTEN)
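The lsof output shows every listening socket owned by bandb (PID 4085) rather than by the ircd. A plausible reading, not confirmed in this thread, is that bandb inherited copies of the listener descriptors when it was spawned. Each port would then stay in the LISTEN state after the ircd drops its own copy, and the rebind during the rehash would fail with EADDRINUSE. The sketch below reproduces that effect in a single process, using dup() as a stand-in for the inherited copy. `open_listener` and `rebind_after_close` are hypothetical names. SO_REUSEADDR is set deliberately, to show that it does not help while another descriptor still holds the listening socket.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a TCP listener on 127.0.0.1. Port 0 lets the kernel pick a free port;
 * the port actually bound is written to *bound_port if it is non-NULL. */
static int open_listener(unsigned short port, unsigned short *bound_port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int one = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);

    struct sockaddr_in sin;
    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&sin, sizeof sin) < 0 || listen(fd, 16) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    socklen_t len = sizeof sin;
    if (bound_port != NULL && getsockname(fd, (struct sockaddr *)&sin, &len) == 0)
        *bound_port = ntohs(sin.sin_port);
    return fd;
}

/* Hypothetical demo of a "rehash": close our listener, then bind the same port
 * again. Returns 0 if the rebind worked, otherwise the errno it failed with. */
static int rebind_after_close(int keep_inherited_copy)
{
    unsigned short port = 0;
    int listener = open_listener(0, &port);
    if (listener < 0)
        return -1;

    /* Stand-in for the copy a spawned helper (e.g. bandb) would inherit. */
    int inherited = keep_inherited_copy ? dup(listener) : -1;

    close(listener);
    int fresh = open_listener(port, NULL);
    int err = fresh < 0 ? errno : 0;

    if (fresh >= 0)
        close(fresh);
    if (inherited >= 0)
        close(inherited);
    return err;
}
```

With no extra copy, `rebind_after_close(0)` should return 0; with the dup'd copy still open, `rebind_after_close(1)` should return EADDRINUSE, matching the notices in the log.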
I'd hazard a guess that O_CLOEXEC isn't being set somewhere (or isn't being respected, if it is). I'll have a look into this, but without a system to test and reproduce on, I can't promise anything.
Shall I give you a shell account on my illumos box?
And yes: I just ran find | xargs grep over the source code of my fork, and O_CLOEXEC is not set anywhere.
Would it be idiomatic ratbox coding to dig into an rb_fde_t, or should those values be treated as black boxes? Pre-publish edit: it appears that int rb_get_fd() is how you get the fd out of an F. Good-o.
I'm considering adding a hack to my 3.5.7 fork (and possibly submitting a pull request back to mainline 3.5.7, if it's idiomatic) that uses fcntl(F_SETFD) to set FD_CLOEXEC on the listeners and on the sockets returned by accept() on them.
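The plan above can be sketched in plain POSIX C, outside librb. `set_cloexec` and `accept_cloexec` are hypothetical helper names, not charybdis or librb functions. Unlike a one-line `fcntl(fd, F_SETFD, fcntl(fd, F_GETFD, 0) | FD_CLOEXEC)`, the helper checks whether F_GETFD failed before writing the flags back.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Set FD_CLOEXEC on fd while preserving any other descriptor flags.
 * Returns 0 on success, -1 on failure (errno set by fcntl). */
static int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}

/* Hypothetical accept() wrapper: the connection socket is marked close-on-exec
 * too, so helpers spawned later do not inherit client connections either. */
static int accept_cloexec(int listener)
{
    int fd = accept(listener, NULL, NULL);
    if (fd >= 0 && set_cloexec(fd) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A listener would get `set_cloexec(fd)` right after socket(), and each client socket would come from `accept_cloexec(listener)`. There is a short window between socket()/accept() and fcntl() in which a concurrent fork and exec could still leak the descriptor. On Linux, SOCK_CLOEXEC and accept4() set the flag atomically; whether a given illumos build offers them is worth checking before relying on them.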
By the way, the lowercase i in illumos is correct even in title case, because the brand name is written in lowercase.
@aaronmdjones, incorporating the fix you suggested seems to work on my illumos system.
$ lsof -i TCP:14100
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
weechat 6267 ellenor 15u IPv4 0xfffffe01feb01780 0t0 TCP perihelion.local:37663->perihelion.local:14100 (ESTABLISHED)
ircd 22066 ellenor 291u IPv4 0xfffffe01f6b4e880 0t0 TCP *:14100 (LISTEN)
ircd 22066 ellenor 292u IPv6 0xfffffe01ea298800 0t0 TCP *:14100 (LISTEN)
ssld 22098 ellenor 7u IPv4 0xfffffe01ee9f2000 0t20712 TCP perihelion.local:14100->perihelion.local:37663 (ESTABLISHED)
$ : I will shortly do a rehash. Once I have, I will show the output of that lsof command again.
$ lsof -i TCP:14100
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
weechat 6267 ellenor 15u IPv4 0xfffffe01feb01780 0t0 TCP perihelion.local:37663->perihelion.local:14100 (ESTABLISHED)
ircd 22066 ellenor 291u IPv4 0xfffffe01f6b4e880 0t0 TCP *:14100 (LISTEN)
ircd 22066 ellenor 292u IPv6 0xfffffe01ea298800 0t0 TCP *:14100 (LISTEN)
ssld 22098 ellenor 7u IPv4 0xfffffe01ee9f2000 0t23529 TCP perihelion.local:14100->perihelion.local:37663 (ESTABLISHED)
I can even still reconnect.
Adding this line seems to be what fixed it:
int fd = rb_get_fd(listener->F);
fcntl(fd, F_SETFD, fcntl(fd, F_GETFD, 0) | FD_CLOEXEC); /* close-on-exec: spawned helpers do not inherit the listener */
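One way to confirm that the flag has the intended effect, independent of the ircd, is to exec a child and ask whether it can still use the descriptor. The sketch assumes a POSIX /bin/sh; `child_inherits_fd7` is a hypothetical name, and fd 7 is an arbitrary fixed number chosen so the shell command can refer to it.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical check: returns 1 if an exec'd /bin/sh can still use fd 7,
 * 0 if the descriptor was closed across exec, -1 on setup errors. */
static int child_inherits_fd7(int cloexec)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    /* Park the write end on fd 7. dup2() clears FD_CLOEXEC on the new fd. */
    if (dup2(p[1], 7) < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (p[1] != 7)
        close(p[1]);

    if (cloexec) {
        /* Same F_GETFD / F_SETFD pattern as the fix, minus rb_get_fd(). */
        int flags = fcntl(7, F_GETFD, 0);
        if (flags >= 0)
            fcntl(7, F_SETFD, flags | FD_CLOEXEC);
    }

    pid_t pid = fork();
    if (pid == 0) {
        /* ": >&7" succeeds only if fd 7 survived the exec. */
        execl("/bin/sh", "sh", "-c", "exec 2>/dev/null; : >&7", (char *)NULL);
        _exit(127);
    }

    int status = 0;
    pid_t waited = pid > 0 ? waitpid(pid, &status, 0) : -1;
    close(7);
    close(p[0]);
    if (waited < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Without the flag the shell inherits fd 7 (result 1), which mirrors how bandb could end up holding the listeners; with FD_CLOEXEC set, the descriptor is closed at exec (result 0).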
Curious: the same line doesn't seem to fix it in charybdis 4.
In addition, if I kill the ircd process (or even shut it down gracefully), neither bandb nor ssld shuts down correctly.
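One possible explanation for the helpers staying alive, which this thread does not confirm: if a helper notices the ircd is gone by reading EOF on a pipe to it, that EOF only arrives once every copy of the pipe's write end is closed. A copy leaked into another process (for lack of FD_CLOEXEC) would keep the helper waiting. The sketch below assumes that pipe-based design and uses dup() as the stand-in for the leaked copy; `read_after_parent_exit` is a hypothetical name.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns what a non-blocking read() on the pipe reports after the "ircd"
 * closes its write end: 0 means EOF, -1 with EAGAIN means a writer still exists. */
static ssize_t read_after_parent_exit(int leak_copy)
{
    int p[2];
    char buf[1];
    if (pipe(p) < 0)
        return -2;

    int leaked = leak_copy ? dup(p[1]) : -1; /* stand-in for an inherited copy */
    close(p[1]);                             /* the "ircd" goes away */

    fcntl(p[0], F_SETFL, fcntl(p[0], F_GETFL, 0) | O_NONBLOCK);
    ssize_t n = read(p[0], buf, sizeof buf);
    int saved = errno;

    if (leaked >= 0)
        close(leaked);
    close(p[0]);
    errno = saved;
    return n;
}
```

With no leaked copy the read reports EOF (0); with one, it reports EAGAIN instead, so a helper blocking on that read would never see the ircd exit.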
So far, this has only been reproduced on a fork of 3.5.7. It will be tested on a clean 3.5.7 work tree, and this bug should not be considered valid until it has been reproduced there.