charybdis-ircd / charybdis

Scalable IRCv3.2 server for large, community-oriented networks
GNU General Public License v2.0

IRCd listening sockets are erroneously inherited by libratbox helper processes (e.g. bandb) on illumos (SunOS 5.11) #291

Open janicez opened 5 years ago

janicez commented 5 years ago

So far, this has only been reproduced on a fork of 3.5.7. It will be tested on a clean 3.5.7 work tree, and this bug is not to be considered valid until such time as it has been reproduced on clean 3.5.7.

janicez commented 5 years ago

Reproduced on 3.5.7.

13:06:53 peri141  -- | /home/ellenor/.local/charybdis-3.5/etc/ircd.conf :Rehashing
13:06:53 peri141  -- | hades.arpa: *** Notice -- Hilde!~ellenor@perihelion.nj.us.umbrellix.net{ellenor2000} is rehashing server config file
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14005: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14005: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14004: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14004: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14003: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14003: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14002: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14002: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14001: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14001: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14000: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14000: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14105: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14105: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14104: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14104: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14103: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14103: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14102: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14102: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14101: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14101: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14100: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14100: Address already in use

lsof for the ports ircd opens:

bandb    4085 ellenor  257u  IPv4 0xfffffe01f72e9840      0t0  TCP *:14005 (LISTEN)
bandb    4085 ellenor  258u  IPv6 0xfffffe01f76d4800      0t0  TCP *:14005 (LISTEN)
bandb    4085 ellenor  259u  IPv4 0xfffffe0f2b171040      0t0  TCP *:14004 (LISTEN)
bandb    4085 ellenor  260u  IPv6 0xfffffe01fab40880      0t0  TCP *:14004 (LISTEN)
bandb    4085 ellenor  261u  IPv4 0xfffffe01f8793800      0t0  TCP *:14003 (LISTEN)
bandb    4085 ellenor  262u  IPv6 0xfffffe01f801a040      0t0  TCP *:14003 (LISTEN)
bandb    4085 ellenor  263u  IPv4 0xfffffe01ed796800      0t0  TCP *:14002 (LISTEN)
bandb    4085 ellenor  264u  IPv6 0xfffffe01ff15b840      0t0  TCP *:14002 (LISTEN)
bandb    4085 ellenor  265u  IPv4 0xfffffe01f7e5f000      0t0  TCP *:14001 (LISTEN)
bandb    4085 ellenor  266u  IPv6 0xfffffe0f2fafe880      0t0  TCP *:14001 (LISTEN)
bandb    4085 ellenor  267u  IPv4 0xfffffe01f72f1880      0t0  TCP *:14000 (LISTEN)
bandb    4085 ellenor  268u  IPv6 0xfffffe01e8598100      0t0  TCP *:14000 (LISTEN)
bandb    4085 ellenor  269u  IPv4 0xfffffe01f5089800      0t0  TCP *:14105 (LISTEN)
bandb    4085 ellenor  270u  IPv6 0xfffffe01f5089080      0t0  TCP *:14105 (LISTEN)
bandb    4085 ellenor  271u  IPv4 0xfffffe01f50ab840      0t0  TCP *:14104 (LISTEN)
bandb    4085 ellenor  272u  IPv6 0xfffffe01f50ab0c0      0t0  TCP *:14104 (LISTEN)
bandb    4085 ellenor  273u  IPv4 0xfffffe0f31051880      0t0  TCP *:14103 (LISTEN)
bandb    4085 ellenor  274u  IPv6 0xfffffe01fee657c0      0t0  TCP *:14103 (LISTEN)
bandb    4085 ellenor  275u  IPv4 0xfffffe01fee65040      0t0  TCP *:14102 (LISTEN)
bandb    4085 ellenor  276u  IPv6 0xfffffe01f87a3840      0t0  TCP *:14102 (LISTEN)
bandb    4085 ellenor  277u  IPv4 0xfffffe01f6b4e880      0t0  TCP *:14101 (LISTEN)
bandb    4085 ellenor  278u  IPv6 0xfffffe0f2b1717c0      0t0  TCP *:14101 (LISTEN)
bandb    4085 ellenor  279u  IPv4 0xfffffe0f31051100      0t0  TCP *:14100 (LISTEN)
bandb    4085 ellenor  280u  IPv6 0xfffffe01f5f83780      0t0  TCP *:14100 (LISTEN)

aaronmdjones commented 5 years ago

I'd hazard a guess that O_CLOEXEC isn't being set somewhere (or isn't being respected, if it is). I'll have a look into this, but without a system to test and reproduce on, I can't promise anything.
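For anyone wanting to confirm that guess on their own system, here is a minimal sketch of a check for whether a given descriptor carries the close-on-exec flag. The name fd_is_cloexec is made up for illustration and is not part of libratbox or charybdis; it uses only the standard fcntl(F_GETFD) call.

```c
#include <fcntl.h>

/*
 * Hypothetical helper (not a libratbox function).
 * Returns 1 if FD_CLOEXEC is set on fd, 0 if it is clear,
 * and -1 if fcntl() fails (for example, fd is not open).
 */
int fd_is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1)
        return -1;

    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

A descriptor that reports 0 here stays open in any process the ircd later fork()s and exec()s, which would be consistent with the bandb entries in the lsof output above.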

janicez commented 5 years ago

Shall I give you a shell account on my illumos box?

janicez commented 5 years ago

And yes, I just ran find|xargs grep over the source code of my fork, and O_CLOEXEC is not being set anywhere.

janicez commented 5 years ago

Would it be idiomatic ratbox coding to dig into an rb_fde_t, or are those values to be treated as black boxes? Pre-publish edit: it appears that int rb_get_fd() is the way to get the fd out of an F. Good-o.

I'm considering adding a hack to my 3.5.7 fork (and possibly pull-req'ing it back to mainline 3.5.7 if it's idiomatic) that will set FD_CLOEXEC (via fcntl F_SETFD) on the listeners and on the sockets created by accept()ing on them.
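A rough sketch of what such a hack could look like at the plain POSIX level, with error checking. The names set_cloexec and accept_cloexec are hypothetical and not libratbox API; in the ircd the raw descriptor would presumably come from rb_get_fd() on the listener's F, and the real accept path goes through libratbox rather than a bare accept() call.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: mark fd close-on-exec while preserving any
 * other descriptor flags. Returns 0 on success, -1 on error.
 */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1)
        return -1;
    return 0;
}

/*
 * Hypothetical helper: accept one connection on listen_fd and mark
 * the new socket close-on-exec. Returns the new fd, or -1 on error.
 */
int accept_cloexec(int listen_fd)
{
    int fd = accept(listen_fd, NULL, NULL);

    if (fd == -1)
        return -1;
    if (set_cloexec(fd) == -1) {
        close(fd);  /* do not leak a socket we could not flag */
        return -1;
    }
    return fd;
}
```

Checking the F_GETFD result first avoids passing -1 | FD_CLOEXEC to F_SETFD when the first call fails. Note that there is a short window between accept() and fcntl() in which a fork()/exec() elsewhere in the process could still inherit the new socket.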

janicez commented 5 years ago

By the way, a lowercase i in illumos is correct title case, because the brand is in lowercase.

janicez commented 5 years ago

By the way, @aaronmdjones, incorporating the fix you suggest seems to work on my illumos system.

 $ lsof -i TCP:14100                                                                       
COMMAND   PID    USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
weechat  6267 ellenor   15u  IPv4 0xfffffe01feb01780      0t0  TCP perihelion.local:37663->perihelion.local:14100 (ESTABLISHED)
ircd    22066 ellenor  291u  IPv4 0xfffffe01f6b4e880      0t0  TCP *:14100 (LISTEN)
ircd    22066 ellenor  292u  IPv6 0xfffffe01ea298800      0t0  TCP *:14100 (LISTEN)
ssld    22098 ellenor    7u  IPv4 0xfffffe01ee9f2000  0t20712  TCP perihelion.local:14100->perihelion.local:37663 (ESTABLISHED)
 $ : I will shortly do a rehash. Once I have, I will show the output of that lsof command again.
 $ lsof -i TCP:14100                                                                            
COMMAND   PID    USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
weechat  6267 ellenor   15u  IPv4 0xfffffe01feb01780      0t0  TCP perihelion.local:37663->perihelion.local:14100 (ESTABLISHED)
ircd    22066 ellenor  291u  IPv4 0xfffffe01f6b4e880      0t0  TCP *:14100 (LISTEN)
ircd    22066 ellenor  292u  IPv6 0xfffffe01ea298800      0t0  TCP *:14100 (LISTEN)
ssld    22098 ellenor    7u  IPv4 0xfffffe01ee9f2000  0t23529  TCP perihelion.local:14100->perihelion.local:37663 (ESTABLISHED)

I can even still reconnect.

Adding this line seems to be what fixed it:

fcntl (rb_get_fd(listener->F), F_SETFD, fcntl(rb_get_fd(listener->F), F_GETFD, 0) | FD_CLOEXEC);

janicez commented 5 years ago

Curious. The same line doesn't seem to fix it in charybdis 4.

In addition, if I kill the ircd process (or even shut it down gracefully), neither bandb nor ssld shuts down correctly.