containerd / nri

Node Resource Interface
Apache License 2.0
238 stars 62 forks source link

socketpair_unix: avoid double close(), set FD_CLOEXEC #66

Closed champtar closed 8 months ago

champtar commented 8 months ago

We were calling the close() syscall multiple time with the same fd number, leading to random issues like closing containerd stream port.

In newLaunchedPlugin() we have:

sockets, _ := net.NewSocketPair()
defer sockets.Close()
conn, _ := sockets.LocalConn()
peerFile := sockets.PeerFile()
defer func() {
  peerFile.Close()
  if retErr != nil {
    conn.Close()
  }
}()
cmd.Start()

so we were doing:

close(local) (in LocalConn())
cmd.Start()
close(peer) (peerFile.Close())
close(local) (sockets.Close())
close(peer) (sockets.Close())

If the NRI plugin that we launch with cmd.Start() is not cached or the system is a bit busy, cmd.Start() gives a large enough window for another goroutine to open a file that will get the same fd number as local was, thus being closed by accident.

Fix the situation by storing os.Files instead of ints, so that closing multiple times just returns an error.

Also set FD_CLOEXEC on the sockets so we don't leak them.

Fixes 1da2cdfebfcecfb6cb0c93fc20fa3822d918f00b

champtar commented 8 months ago

ping @klihub (initial author)

klihub commented 8 months ago

not tested with containerd / windows version also needs fixing

Yeah, we haven't really had the time to impement and test NRI for Windows, yet. That said, on Windows, SocketPair emulates a socketpair using two ends of a pre-connected TCPv4 socket, which are then stored as sys.Handle's. Shouldn't that protect us from a similar double-close error ?

champtar commented 8 months ago

@klihub just pushed a version fixing windows and moving all the common functions in socketpair.go (windows part not tested)

fuweid commented 8 months ago

Hi @champtar would you please file a ticket to track window part and focus on linux part in this pull request? I don't have too much knowledge on windows platform actually. we can ask other maintainers to take over this. Thanks

klihub commented 8 months ago

Hi @champtar would you please file a ticket to track window part and focus on linux part in this pull request? I don't have too much knowledge on windows platform actually. we can ask other maintainers to take over this. Thanks

+1 I would also leave Windows out of this now and focus on fixing the bug on un*x, since we anyway don't have all the necessary functionality in place for Windows. Sorry if my earlier question/comment was misleading and could be interpreted otherwise.

codecov-commenter commented 8 months ago

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (6f5a4d2) 64.50% compared to head (7f878a1) 64.58%.

:exclamation: Current head 7f878a1 differs from pull request most recent head 5d0b52b. Consider uploading reports for the commit 5d0b52b to get more accurate results

Files Patch % Lines
pkg/net/socketpair_unix.go 87.50% 3 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #66 +/- ## ========================================== + Coverage 64.50% 64.58% +0.07% ========================================== Files 9 10 +1 Lines 1834 1838 +4 ========================================== + Hits 1183 1187 +4 Misses 500 500 Partials 151 151 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

champtar commented 8 months ago

Pushed back the unix only changes, I'll open a second PR once this one is merged

champtar commented 8 months ago

@klihub hopefully final version