cloudflare / tableflip

Graceful process restarts in Go
BSD 3-Clause "New" or "Revised" License
2.88k stars 147 forks source link

add function to return all inherited files #68

Closed hunts closed 2 years ago

hunts commented 2 years ago

This is helpful in the use case of integrating with systemd socket activation.

When there is no parent, the app need to scan or use actication.Files() to find all the fds and call upgrader.AddFile() to track them. But when there is a parent, with the help of this new function, we can find the fds from upgrader instance rather then by scanning fds or calling actication.Files() both of which creates a set of os.File representing the underlying fds.

Without this change, there would be two set of os.File objects that represent the same fd set. One set of os.File is hold by the upgrader while the another set returning from fd scanning (or actication.Files()) is being used at other places. This raise a risk of closing the same fd twice at different time which is really bad when the fd number get reused by other files during the two close calls.

lmb commented 2 years ago

Hi Hunts,

If I remember correctly, files are dup()ed before being added to Fds: https://github.com/cloudflare/tableflip/blob/287e7a4b9d84c33ee639869a1dd919f8014b4887/fds.go#L316-L319

This means that you can close the file you passed to Upgrader.Fds.AddFile after the call returns without creating a problem. In pseudo code, I'd probably do something like this:

upg, err := ...

if !upg.HasParent() {
  for _, f := range systemdInerited {
    err := upg.AddFile // upg.AddListener, etc.
    f.Close()  
  }
}

// Continue normal start up
ln, err := upg.Listen("tcp", ...)

Does that solve your problem?

lmb commented 2 years ago

(NB: I don't have merge access to this repository anymore!)

hunts commented 2 years ago

The problem is occurring at this step:

// Continue normal start up
ln, err := upg.Listen("tcp", ...)

We don't have named addresses but a list of fd passed from systemd. If you still remember this piece of code:

/etc/systemd/system/<service-name>.socket:
  file.managed:
{%- for addr in ["127.0.0.1", "[::1]"] %}
  {%- for _ in range(0, 128) %}
          - ListenDatagram: "{{ addr }}:53"
  {%- endfor %}
{%- endfor %}

What we can do is either like:

// Continue normal start up
for _, f := range systemdInerited {
    ln, err := net.FileConn(f) 
    ...
}

Or like:

// Continue normal start up
files, err := upg.Files()
for _, f := range files {
    ln, err := net.FileConn(f) 
    ...
}

The former approach has the problem we I've describe in the PR message. The later one requires the Files() methods we are introducing here.