crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0
19.36k stars 1.62k forks source link

Sockets are inherited by subprocesses #14630

Closed carlhoerberg closed 4 months ago

carlhoerberg commented 4 months ago

Bug Report

require "socket"
s = TCPServer.new("127.1", 1234)
Process.new("sleep", {"99"})
s.close
s = TCPServer.new("127.1", 1234)

Result

Unhandled exception: Could not bind to '127.1:1234': Address already in use (Socket::BindError)

Reason

The first server socket is inherited by the subprocess and hence the port is still in use when the second socket is bound. Just like files are opened with O_CLOEXEC sockets should be opened with the SOCK_CLOEXEC flag.

https://www.man7.org/linux/man-pages/man2/socket.2.html https://www.man7.org/linux/man-pages/man2/open.2.html

Monkey-patch fix

require "socket"

module Crystal::System::Socket
  private def create_handle(family, type, protocol, blocking) : Handle
    {% if LibC.has_constant?(:SOCK_CLOEXEC) %}
      type = type.to_i | LibC::SOCK_CLOEXEC
    {% end %}
    fd = LibC.socket(family, type, protocol)
    raise ::Socket::Error.from_errno("Failed to create socket") if fd == -1
    fd
  end
end
Sija commented 4 months ago

Reduced monkey-patch:

require "socket"

module Crystal::System::Socket
  private def create_handle(family, type, protocol, blocking) : Handle
    {% if LibC.has_constant?(:SOCK_CLOEXEC) %}
      type = type.to_i | LibC::SOCK_CLOEXEC
    {% end %}
    previous_def family, type, protocol, blocking
  end
end
straight-shoota commented 4 months ago

Hm, this is actually a bit weird. There is a method Socket#initialize_handle which is supposed to do this, but only on platforms where LibC::SOCK_CLOEXEC is undefined. I have no idea what's the reason for this😕

https://github.com/crystal-lang/crystal/blob/a30e3d96248a55de4524005837f944294dd14278/src/crystal/system/unix/socket.cr#L17-L23

This code was originally introduced in #1192 as #init_close_on_exec.

I'd like to understand what's the reasoning behind this. It feels like this is handling the case where SOCK_CLOEXEC is undefined, but the opposite case is missing.

carlhoerberg commented 4 months ago

So you can always do the fcntl thing, but that introduces a race condition, that's why SOCK_CLOEXE directly to socket was introduced in linux/bsd etc, but it was never implemented in darwin. My thinking is that the PR author knew this, but just forgot/accidelty deleted the line that is now suggested in this PR.

ysbaddaden commented 4 months ago

It's also to make it in one syscall, when possible, instead of two.

That was completely overlooked. Searching the history doesn't bring much information, except that I participated in missing the whole thing (oops). File descriptors should always have CLOEXEC set by default (that, I remember).

straight-shoota commented 4 months ago

I just verified this seems to work correctly for FileDescriptor implementations (Crystal::System::File.open, Crystal::System::FileDescriptor.pipe and #system_reopen).