golang / go

The Go programming language
https://go.dev

io: Copy is easy to misuse and leak goroutines blocked on reads #58628

Open cpuguy83 opened 1 year ago

cpuguy83 commented 1 year ago

What version of Go are you using (go version)?

All

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

N/A

What did you do?

Because of the way io.Copy works (and, really, any other solution to the same need), it is very easy to wedge a goroutine, and probably an fd along with it.

Example:

pr1, _ := io.Pipe()  // keep only the read side of the first pipe
_, pw2 := io.Pipe()  // keep only the write side of the second
go io.Copy(pw2, pr1) // blocks in pr1.Read, since nothing ever writes to that pipe

pw2.Close() // closing the destination never unblocks the copy; it is stuck in Read

In this case, the goroutine will leak once pw2 is closed because there is no way to cancel that copy without also closing pr1. In some cases this makes sense, especially if you control both pipes.

In docker/moby this is a common case: a client connected over a unix or TCP socket is attached to a container's stdio streams. If the client disconnects, there is no mechanism (aside from platform-specific code like epoll) to get notified that the copy operation needs to be cancelled.

I suggest an interface be added, potentially to io, for receiving close notifications from the runtime. Example:

type CloseNotifier interface {
    CloseNotify() <-chan struct{}
}

The actual interface could look different (maybe something that returns a context that gets cancelled?). I'm not married to the shape above (I know there's a similarly named interface in net/http).

*os.File and net.Conn implementations could implement this interface to help with the above scenario, and since the Go runtime is already doing a bunch of fd notification magic, it should generally be easy to support in the stdlib across platforms.

ianlancetaylor commented 1 year ago

I don't understand what would be reading from that channel. How does it help with the io.Copy, which is presumably sitting in a call to Read?

Jorropo commented 1 year ago

If the client disconnects there is no mechanism (aside from platform specific code like epoll) to get notified that the copy operation needs to be cancelled.

.Read returns io.EOF (or another error if the connection was killed or a deadline is reached) on all OSes; you don't need to implement anything using epoll, the runtime does this for you already. See this very simple example:

package main

import (
    "fmt"
    "io"
    "net"
    "os"
)

func main() {
    conn, err := net.Dial("tcp", "localhost:11111")
    if err != nil {
        panic(err)
    }
    defer conn.Close()

    _, err = io.Copy(os.Stdout, conn)
    fmt.Println("copy exited:", err)
}

If I listen using nc -lvp 11111, run go run a.go, and finally Ctrl+C netcat, I see:

copy exited: <nil>

I think you have some other bug and this is not related to io.Copy (timeouts?).

*os.File and net.Conn implementations could implement this interface to help with the above scenario, and since the go runtime is already doing a bunch of fd notification magic it should generally be easily supported in the stdlib across platforms.

They already do, by returning an error on .Read when the resource is closed while you were reading from it.

cpuguy83 commented 1 year ago

@ianlancetaylor

I don't understand what would be reading from that channel.

Another goroutine that can close the read side.
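
For example (a minimal sketch; this copyUntilClosed helper is hypothetical, not an existing API):

// copyUntilClosed copies src to dst, but also closes src as soon as dst
// reports (via the proposed CloseNotifier) that it was closed, which
// unblocks the Read that io.Copy is sitting in.
func copyUntilClosed(dst io.Writer, src io.ReadCloser) (int64, error) {
    if cn, ok := dst.(CloseNotifier); ok {
        go func() {
            <-cn.CloseNotify()
            src.Close()
        }()
    }
    return io.Copy(dst, src)
}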

@Jorropo The issue here is that the read side is never closed, only the write side.

Jorropo commented 1 year ago

Another goroutine that can close the read side. The issue here is that the read side is never closed, only the write side.

So the issue here is that dst (let's say the container) is closing the write end (its stdin, but not stdout and stderr), while src (let's say the remote client) is deadlocked (it has no more data to send and waits on the write end to be closed)? And since you are blocked on the remote read, you never attempt to write to stdin, and thus never notice it is closed for writing (and if you did, you would propagate that to the remote read).

Is the paragraph above correct?

(its stdin, but not stdout and stderr)

This is important for the setup here, because if stdout and/or stderr are closed, your container -> client goroutines could be tasked to close the client -> container connection (with some extra logic).

ianlancetaylor commented 1 year ago

If I understand correctly, you are suggesting that an os.File should implement a CloseNotify method that it can use to notify people when it is closed. If so, that seems like something that can be implemented as a wrapper around os.File.
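
For instance, a rough sketch of such a wrapper (the NotifyingFile type here is hypothetical; it needs the os and sync imports):

// NotifyingFile wraps an *os.File and signals a channel when Close is
// called on the wrapper. It only observes Close calls on the wrapper
// itself, not the descriptor being closed by other means.
type NotifyingFile struct {
    *os.File
    once   sync.Once
    closed chan struct{}
}

func NewNotifyingFile(f *os.File) *NotifyingFile {
    return &NotifyingFile{File: f, closed: make(chan struct{})}
}

func (f *NotifyingFile) CloseNotify() <-chan struct{} { return f.closed }

func (f *NotifyingFile) Close() error {
    f.once.Do(func() { close(f.closed) })
    return f.File.Close()
}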

cpuguy83 commented 1 year ago

@ianlancetaylor It could indeed be, but since the runtime already has epoll (or the equivalent on other platforms) wired up, it seemed like a good case to generalize.

ianlancetaylor commented 1 year ago

Closing a descriptor is not the same as closing an os.File. It's the other way around: closing an os.File closes the descriptor.

rittneje commented 1 year ago

It isn't very clear from your description which of the following situations you are in.

  1. You have a file/socket in your process that you are writing to via io.Copy in some goroutine. Elsewhere in your application, you decide to close that file/socket. However, since io.Copy is waiting to read data from some other file/socket, the goroutine does not exit.
  2. You have a pair of sockets/fds. One fd is used for writing (in your process), while the other is used for reading (in some other process). The other process decides to close its fd (or it exits or gets killed or whatever). Again, since io.Copy is waiting to read data from some other file/socket, the goroutine does not exit.

cpuguy83 commented 1 year ago

@rittneje In the specific case with moby, we have a long-running process (i.e., a container) whose stdio streams we allow API users to attach to. So we have goroutines that effectively look like io.Copy(apiStream, processStdout). When apiStream is disconnected, the goroutine is wedged unless there is new data to read from processStdout.

Basically, there is no facility built into Go to know when an fd has closed other than by trying to read/write on it (perhaps some other socket op would do as well). My solution to our specific problem right now is to drop down to syscall.RawConn, set up epoll to notify me when the fd closes, wait for it in a goroutine, then run a cancel function. I'm not familiar with the Windows side of how to do this, but we'll need to do the equivalent on Windows. I guess the other option is to say "this old API is broken" and add a new API that tracks streams and has an explicit call to signal a close rather than just closing the fd.
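
For reference, a rough Linux-only sketch of that approach using golang.org/x/sys/unix (the watchForHangup helper is hypothetical and error handling is minimal):

import (
    "syscall"

    "golang.org/x/sys/unix"
)

// watchForHangup registers the connection's fd with a dedicated epoll
// instance and calls cancel when the peer hangs up (EPOLLRDHUP), so a
// blocked io.Copy can be interrupted by closing its source.
func watchForHangup(rc syscall.RawConn, cancel func()) error {
    epfd, err := unix.EpollCreate1(unix.EPOLL_CLOEXEC)
    if err != nil {
        return err
    }
    // Register the fd for peer hang-up notifications only (no EPOLLIN,
    // so ordinary readable data does not wake us up).
    var ctlErr error
    err = rc.Control(func(fd uintptr) {
        ev := unix.EpollEvent{Events: unix.EPOLLRDHUP, Fd: int32(fd)}
        ctlErr = unix.EpollCtl(epfd, unix.EPOLL_CTL_ADD, int(fd), &ev)
    })
    if err == nil {
        err = ctlErr
    }
    if err != nil {
        unix.Close(epfd)
        return err
    }
    go func() {
        defer unix.Close(epfd)
        events := make([]unix.EpollEvent, 1)
        for {
            n, err := unix.EpollWait(epfd, events, -1)
            if err == unix.EINTR {
                continue
            }
            if err != nil || n > 0 {
                cancel() // peer hung up (or epoll failed): cancel the copy
                return
            }
        }
    }()
    return nil
}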

bcmills commented 1 year ago

If the client disconnects there is no mechanism (aside from platform specific code like epoll) to get notified that the copy operation needs to be cancelled.

Both net.TCPConn and net.UnixConn are bidirectional: they support not only reading but also writing. I wonder: for your use cases, are you also making use of the read side of those connections?

In particular, if you don't expect the other side to write anything more to the socket, you could call Read on it, which would block until the other side disconnects. Then you run your io.Copy, and if the Read call on the write socket unblocks you can close or cancel whatever is producing the input to the Copy operation as well.

That is, I would expect a realistic example to look more like:

var (
    dst io.ReadWriteCloser = …
    src io.ReadCloser = …
)

errc := make(chan error, 1)
go func() {
    _, err := io.Copy(io.Discard, dst)
    // The connection to dst is broken.
    // Close src to interrupt the io.Copy from it.
    src.Close()
    errc <- err
}()

_, srcErr := io.Copy(dst, src)
// Close dst to interrupt the io.Copy from it.
dst.Close()
dstErr := <-errc

cpuguy83 commented 1 year ago

@bcmills We do use the read side. Your example also doesn't work if/when the client does a close-write on the socket, which wouldn't be completely unheard of if the client has nothing to send and is only expecting to receive.

rittneje commented 1 year ago

@cpuguy83 My question was really more getting to the fact that your proposed CloseNotify() method is kind of ambiguous. Does the channel signal that:

  1. This file descriptor was closed?
  2. The other end of this file descriptor was closed or shut down (i.e., "broken pipe")?
  3. Both?

Also, I assume that currently the runtime poller only polls if you are actively trying to read/write with the fd. But for CloseNotify() to function it would have to eternally poll every fd (or at least every fd where that method had ever been called). And you would specifically be waiting for the fd NOT to be available for writing, which AFAIK is not something that epoll offers.

rittneje commented 1 year ago

I will also point out that io.Copy simply is not a safe operation in production applications in general.

For remote TCP connections in particular, you need some sort of application-level ping between processes in order to properly deal with half-closed connections.
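
For example, something like this (a minimal sketch, assuming the application protocol can carry a dedicated ping message; pingMsg and the helper name are hypothetical):

// pingLoop periodically writes a small application-level ping and tears
// the connection down if the write fails, which in turn unblocks any
// io.Copy sitting in a Read on the same connection.
func pingLoop(conn net.Conn, pingMsg []byte, interval time.Duration) {
    ticker := time.NewTicker(interval)
    defer ticker.Stop()
    for range ticker.C {
        conn.SetWriteDeadline(time.Now().Add(interval))
        if _, err := conn.Write(pingMsg); err != nil {
            conn.Close() // half-closed or dead: force the copy to fail
            return
        }
    }
}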

On a somewhat related note, io.Copy does not provide any straightforward way of applying timeouts/deadlines to each read or write operation. So for example, if you are copying to a pipe, and the process on the other side of the pipe stops reading, you will essentially deadlock.

Certainly there are use cases for io.Copy that are willing to accept the aforementioned caveats. But then the fact that io.Copy won't notice the destination is no longer writable until it has something to write is just another caveat.

cpuguy83 commented 1 year ago

@rittneje On Linux I would expect this to be waiting for EPOLLHUP, however half-closed (EPOLLRDHUP) is indeed another scenario that should definitely be considered when designing the interface.

I don't think it's particularly useful to say io.Copy is not safe. It is for the most part pretty straightforward, and even has performance optimizations (with support for splice, sendfile, and copy_file_range).

rittneje commented 1 year ago

On Linux I would expect this to be waiting for EPOLLHUP, however half-closed (EPOLLRDHUP) is indeed another scenario that should definitely be considered when designing the interface.

@cpuguy83 Again, AFAIK, you cannot poll an fd until EPOLLHUP. Rather, you can only poll until the fd is available for writing (or reading). If you go to poll and the fd is already shut down then you'll get EPOLLHUP, but you cannot tell it to block until that happens. Thus what you are proposing would require what amounts to a busy loop.

cpuguy83 commented 1 year ago

@rittneje You are right, EPOLLHUP does not seem to work on TCP sockets, though it does with unix sockets. In any case, EPOLLRDHUP seems to do fine there. https://gist.github.com/cpuguy83/377f1a8eedf32159094845dc1ae77f95

xvertile commented 1 day ago

I have also encountered this issue of it just blocking forever in a TCP server setup. What I have done is set a deadline on the connection:

timeoutDuration := 1 * time.Second
// Any Read/Write on the connection fails once the deadline passes.
proxyConn.Conn.SetDeadline(time.Now().Add(timeoutDuration))

It's not perfect, and I really wish io.Copy had some lower-level mechanism to prevent it from blocking forever.
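
One common refinement of this (a sketch; the idleTimeoutConn type is hypothetical) is to push the deadline forward before every operation, so the copy only fails after the connection has been idle for the full timeout rather than after a fixed wall-clock time:

import (
    "net"
    "time"
)

// idleTimeoutConn wraps a net.Conn and resets the deadline before each
// Read and Write, turning SetDeadline into an idle timeout for io.Copy.
type idleTimeoutConn struct {
    net.Conn
    timeout time.Duration
}

func (c idleTimeoutConn) Read(p []byte) (int, error) {
    if err := c.Conn.SetReadDeadline(time.Now().Add(c.timeout)); err != nil {
        return 0, err
    }
    return c.Conn.Read(p)
}

func (c idleTimeoutConn) Write(p []byte) (int, error) {
    if err := c.Conn.SetWriteDeadline(time.Now().Add(c.timeout)); err != nil {
        return 0, err
    }
    return c.Conn.Write(p)
}

Note that wrapping the conn this way hides the concrete *net.TCPConn, so the splice/sendfile fast paths mentioned above won't kick in.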