golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
122.62k stars 17.48k forks source link

runtime: manual instrumentation of KeepAlive is fraught #34810

Open beoran opened 4 years ago

beoran commented 4 years ago

What version of Go are you using (go version)?

$ go version
go version go1.12 linux/amd64

Does this issue reproduce with the latest release?

Likely, yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/bjorn/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/bjorn/src/go"
GOPROXY="http://claudette.applied-maths.local:13909"
GORACE=""
GOROOT="/home/bjorn/opt/go"
GOTMPDIR=""
GOTOOLDIR="/home/bjorn/opt/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/bjorn/work/src/i41healthapp/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build701296847=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I am currently writing a game library in Go that avoids cgo and calls the OS directly, certainly on Linux thanks to it's stable kernel API. (See: https://gitlab.com/beoran/galago https://gitlab.com/beoran/galago/blob/master/os/linux/input/input_linux.go )

I have an input Device like this:

// Device models an input device
type Device struct {
    FileName string 
    *os.File
    // Cached device information
    Info struct { 
        Events          []SupportedEvent
        Keys            []SupportedKey
        Axes            []AbsoluteAxis
        Rolls           []RelativeAxis
        Name            string
        ID              string
    }
}

// Keeps the device's file from being garbage collected.
func (d * Device) KeepAlive() {
    runtime.KeepAlive(d.File)
}

// Icotl performs an ioctl on the given device
func (d * Device) Ioctl(code uint32, pointer unsafe.Pointer) error {
    fmt.Printf("ioctl: %d %d %d\n", uintptr(d.Fd()), uintptr(code),
uintptr(pointer))
    _, _, errno := syscall.Syscall(
        syscall.SYS_IOCTL,
        uintptr(d.Fd()),
        uintptr(code),
        uintptr(pointer))
    if (errno != 0) {
        return errno
    }
    d.KeepAlive()
    return nil
}

Notice the KeepAlive? If I leave that out the program crashes, because the device 's io.File gets garbage collected. This took me quite some time to figure out and it is not obvious that that would happen, and that runitme.KeepAlive() is needed here. It would be great if I didn't have to manually insert runtime.KeepAlive calls when using os.File.Fd when using system calls. Or if there was at least vet/lint tooling to suggest when I would probably need it.

I posted this as a new issue to split it off from #34684, where it was a tangential issue.

acln0 commented 4 years ago

Consider using https://golang.org/pkg/os/#File.SyscallConn over the the raw file descriptor returned by Fd. The SyscallConn API is perhaps a little more verbose, but it deals with lifetime issues correctly, and wrappers like the Ioctl method in your snippet can hide it from callers entirely.

beoran commented 4 years ago

I took a look at SyscallConn and it looks like it will work, at the hopefully small overhead of having to use a closure.

But TIL that File.Fd() sets the file to blocking mode, which I explains why I was unable to do nonblocking IO on the devices. I feel that should have been in the documentation of File.Fd(). Furthermore, it would be a good idea to refer from the documentation of File.FD to that of File.SyscallConn().

acln0 commented 4 years ago

I feel that should have been in the documentation of File.Fd().

It is, sort of. See #22934 and https://github.com/golang/go/commit/415349575dec277fbadf08b9d690d07fe313b288.

andybons commented 4 years ago

@beoran is this still an issue for you or would you like to re-purpose this to a documentation fix?

beoran commented 4 years ago

This issue may be repurposed as a documentation issue. Also, the RawConn documentation really could use an example on how to use it.

Op do 10 okt. 2019 17:30 schreef Andrew Bonventre <notifications@github.com

:

@beoran https://github.com/beoran is this still an issue for you or would you like to re-purpose this to a documentation fix?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/golang/go/issues/34810?email_source=notifications&email_token=AAARM6MQJHZICVIXOP3UI2TQN5DCHA5CNFSM4I7IPINKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEA4XX7Y#issuecomment-540638207, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAARM6NH2FRVIIF4HPDSDQLQN5DCHANCNFSM4I7IPINA .

andybons commented 4 years ago

@ianlancetaylor in case there are objections.

mdempsky commented 4 years ago

I think this issue should be kept open until we've discussed it further.

I'm not convinced the Go compiler / runtime necessarily need to do anything differently, but I do think there's opportunity for vet or lint or something should probably warn users when they need to use runtime.KeepAlive.

For example, I just happened to be looking at github.com/google/gopacket, and I noticed that its pcap library suffers from this same issue: they call file.Fd() but don't have any runtime.KeepAlive(file) call afterwards to ensure that file isn't GC'd.

beoran commented 4 years ago

https://gitlab.com/beoran/galago/blob/master/os/linux/input/input_linux.go now uses File.SyscallConn , and as a bonus, reading Linux input events has non-blocking as well. Yay ^_^. But, if @acln0 hadn't kindly informed me of the existence this function, then I would still be struggling.

So yes, both better documentation and a go vet or go lint check would be welcome to help others avoid this issue.

andybons commented 4 years ago

Sounds good to me. Thanks for chiming in, @mdempsky.

beoran commented 4 years ago

A very related problem I am having now, with my library is that the SyscallConn can't be used reliably in conjunction with unix.Poll(fds []PollFd, timeout int) (n int, err error). I would either have to make the closure passed to Control() call Control() recursively over all Files I want to poll on , or I have to cheat and store the FD somewhere and hope that it will stay valid for the file. I could of course use File.Fd() again, but that makes the file blocking, which is what I want to avoid in the first place. What should I do to poll on several files without using File.Fd()?

eliasnaur commented 4 years ago

A very related problem I am having now, with my library is that the SyscallConn can't be used reliably in conjunction with unix.Poll(fds []PollFd, timeout int) (n int, err error). I would either have to make the closure passed to Control() call Control() recursively over all Files I want to poll on , or I have to cheat and store the FD somewhere and hope that it will stay valid for the file. I could of course use File.Fd() again, but that makes the file blocking, which is what I want to avoid in the first place. What should I do to poll on several files without using File.Fd()?

Is there a reason you can't mark the file descriptor non-blocking after acquiring it from File.Fd()?

beoran commented 4 years ago

That could work, but I'm not sure how to do that. File.Fd() sets a nonblocking flag to false in the underlying structure for os.File directly. AFAIK, `unix.SetNonblock(fd, true) has no effect on this, or am I mistaken?

eliasnaur commented 4 years ago

On Fri Oct 11, 2019 at 1:59 AM Beoran wrote:

That could work, but I'm not sure how to do that. File.Fd() sets a nonblocking flag to false in the underlying structure for os.File directly. AFAIK, `unix.SetNonblock(fd, true) has no effect on this, or am I mistaken?

I consider Fd() to mean that I take over full responsibility of the descriptor, which also means I can't call os.File methods and expect them to work the same, or at all.

-- elias

beoran commented 4 years ago

Well, yes, but that was exactly the problem that SyscallConn was supposed to solve. With it you can get access to the FD while the os.File() still works. Since this is platform specific code anyway, I might as well then open the file with unix.Openin stead, and just handle everything myself, although I fear this might interfere with the way goroutines are supposed to work.

ianlancetaylor commented 4 years ago

Yes, if you want to manage I/O yourself, I recommend that you use unix.Open. That won't interfere with goroutines. It just means that I/O on those descriptors won't use the runtime poller, but it seems that you want to call unix.Poll yourself anyhow. There is no good way to both use your own poller and the runtime poller, and there doesn't seem to be much point to doing so.

beoran commented 4 years ago

Well, i'm calling unix.Poll myself because I don't know how I could use the runtime poller in my situation. Would it even be possible, and how? If not I'll stick to managing everything myself.

The top issue remains though, this is all rather fraught and not well documented. Certainly x/sys/unix could do with more documentation and examples.

ianlancetaylor commented 4 years ago

@beoran I don't really know what you are doing so I don't know why you can't use the runtime poller. In general it's fine to call ioctl on os.File or net.Conn values using SyscallConn and to also use the ordinary Read and Write methods which will use the runtime poller.

mdempsky commented 4 years ago

It seems like this discussion is drifting a bit from the initial report, which was that os.File.Fd is dangerous because it (in general) requires using runtime.KeepAlive, and we don't have any tooling to help check for that.

Some questions:

  1. Is the transformation fn(file.Fd()); runtime.KeepAlive(file) => file.SyscallConn().Control(fn) (or maybe Read or Write instead of Control) safe?

  2. If so, should we just deprecate os.File.Fd and encourage users to switch to os.File.SyscallConn?

  3. If not (or maybe regardless), should we implement a lint/vet warning that file.Fd() should always be used in conjunction with runtime.KeepAlive(file)?

ianlancetaylor commented 4 years ago
  1. Yes, that is safe provided that fn does not preserve its fd argument in any way.
  2. Well, no, because some people really do need to hang on to the file descriptor.
  3. I'm not sure. Assuming your program does not change os.Stdout there's nothing wrong with os.Stdout.Fd(), or in general calling Fd on a file descriptor that is stored in a global variable or otherwise preserved.
mdempsky commented 4 years ago
  1. Well, no, because some people really do need to hang on to the file descriptor.

Do you have any examples of this?

For this to be safe, the user has to also keep the os.File alive. And if they already have to do that, it seems easier to just hang onto the os.File or syscall.RawConn, and convert it to uintptr via RawConn.Control as needed.

It's also an option to dup the FD, so they can manage its lifetime itself.

Assuming your program does not change os.Stdout there's nothing wrong with os.Stdout.Fd(), or in general calling Fd on a file descriptor that is stored in a global variable or otherwise preserved.

We have room to make a vet/lint check arbitrarily sophisticated to try to avoid false positives; but in general, I'm pretty skeptical of the ability to use it safely.

E.g., I just started a random audit of os.File.Fd call sites, and I already found that package os itself doesn't use it safely:

https://github.com/golang/go/blob/54abb5ff868bf20239bd7960038a28c3f31ba25e/src/os/exec_posix.go#L46-L49

(Admittedly this case would be very awkward to rewrite using SyscallConn, but not impossible.)

After looking at about 100 call sites, cmd/compile is the only code that I found that uses os.File.Fd and then explicitly keeps the os.File alive. (And I'm not 100% confident that if I wrote that code today that I'd remember to have done that.)

I'd estimate I saw 10% of cases that were bad; 20% of cases that were okay (i.e., the os.File was obviously still alive after the Fd call); 20% of cases that used os.Std{in,out,err} (so also okay); but a huge chunk of remaining cases where the *os.File is dead after the Fd call (at least locally within the function).

It would be interesting to instrument os.File.Fd() with a call to runtime.GC() immediately before returning. I wouldn't be surprised if some programs start failing.

ianlancetaylor commented 4 years ago

An example where you need the file descriptor is when you are passing a set of file descriptors to another process via syscall.UnixRights.

The use in package os that you mention is safe because the file descriptors only have to live as long as the fork, and the fork code will be using the SysProcAttr that points to the os.File values. Not that I would be opposed to adding some KeepAlive calls.

I agree that some sort of instrumentation would be interesting. It could perhaps be done via GODEBUG.

mdempsky commented 4 years ago

An example where you need the file descriptor is when you are passing a set of file descriptors to another process via syscall.UnixRights.

Can't you do that inside the RawConn.Control?

the fork code will be using the SysProcAttr that points to the os.File values.

I assume you mean ProcAttr, and ProcAttr takes the files as uintptr, not os.File: https://golang.org/pkg/syscall/#ProcAttr

Am I missing something?

Edit: I don't think I'm missing anything: https://github.com/golang/go/issues/34858

beoran commented 4 years ago

As I discovered during this discussion, RawConn.Control is only useful for operations on a single file. In situations where you need a set of fds, such as for ProcAttr, RawConn.Control is impractical because for N files you would have to call it N times it recursively from the closure. I would say we need some functionality that works like RawConn, but then for multiple files. RawConnSet, maybe?

mdempsky commented 4 years ago

In situations where you need a set of fds, such as for ProcAttr, RawConn.Control is impractical because for N files you would have to call it N times it recursively from the closure.

I'll admit it's a bit tedious, but I don't think it's impractical. The recursion can be easily abstracted away behind a helper like so: (caveat: untested)

func MultiControl(conns []syscall.RawConn, f func(fds []uintptr)) error {
    if len(conns) == 0 {
        f(nil)
        return nil
    }

    var err error
    fds := make([]uintptr, len(conns))
    i := 0

    var do func(fd uintptr)
    do = func(fd uintptr) {
        fds[i] = fd
        i++
        if i == len(conns) {
            f(fds)
            return
        }
        err0 := conns[i].Control(do)
        if err == nil {
            err = err0
        }
    }
    err0 := conns[i].Control(do)
    if err == nil {
        err = err0
    }
    return err
}

However, while writing this, I did notice a somewhat thornier issue: Plan 9 doesn't support SyscallConn. It just returns EPLAN9.

ericlagergren commented 4 years ago

To add another data point: at work we've a (rather large) package that wraps a certain DLL. Pretty much every constructed object needs to be closed in order to not leak memory, etc. As a safety feature, we've added finalizers. Just like os.File.

Most DLL functions require a handle, which is just a number. This leaves us open to the GC collecting the object out from under us if we forget to use runtime.KeepAlive. Just like os.(*File).Fd.

Even though we try to do due diligence, we've run into the issue a couple of times (thankfully just in testing). It's fairly easy to remember to use runtime.KeepAlive for os.File, because it's more or less the only part of the stdlib that you use with any frequency that has this problem.

I don't know what the valid criteria are for sniffing this out, but whenever I've run into this problem there's been two similarities:

(a) passing a uintptr (or a type whose underlying type is uintptr) to another function, and (b) accessing that uintptr via some method call

ianlancetaylor commented 4 years ago

@mdempsky You're right, I was thinking of the ExtraFiles field in os/exec.Cmd. That is used to set the Files field in syscall.ProcAttr. The use of Files in os.ProcAttr does seem problematic. It seems to need a runtime.KeepAlive(attr).

gopherbot commented 4 years ago

Change https://golang.org/cl/201198 mentions this issue: os: keep attr.Files alive when calling StartProcess

beoran commented 4 years ago

Well, while I am happy to see that some bugs in the Go standard library were fixed because of my observation, this issue seems a bit stalled. How could we fix the original issue? RawConn.Control is not supported on all platforms and hard to use for a set of FD's.