os: Missed checking for EINTR in (*os.File).readdir

jameshartig commented 4 years ago

What version of Go are you using (`go version`)?

go version go1.15 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (`go env`)?

go env Output

GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/toaster/.cache/go-build"
GOENV="/home/toaster/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/opt/gopath/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/opt/gopath"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build047277846=/tmp/go-build -gno-record-gcc-switches"

What did you do?

See: https://go-review.googlesource.com/c/go/+/232862/

If you call ioutil.ReadDir it ends up calling (*os.File).readdir which calls lstat and does not retry on an EINTR error. Other spots were handled in https://go-review.googlesource.com/c/go/+/232862/ and I think this spot was just missed.

I wasn't sure of the best way since lstat is automatically generated. Should this just be handled in readdir?

What did you expect to see?

I expected ioutil.ReadDir not to error.

What did you see instead?

Instead ioutil.ReadDir errored with lstat /opt/admiral-dev/hazadblock/dist/static/lib/ergoHeadOverwrite.dev.min.js.map: interrupted system call.

ianlancetaylor commented 4 years ago

@tv42 Fair point, please read all discussions about deadlines above as using contexts (https://golang.org/pkg/context). The issues seem the same.

networkimprov commented 4 years ago

Ian, another way to look at the problem is that EINTR is part of the API for CIFS & FUSE, because they're designed to be user-interruptible. This is not compatible with Go, so we'd like to fix that.

Syscall interruption by signalling a thread requires runtime support, so wouldn't the API reside in stdlib? See my suggested mechanism in https://github.com/golang/go/issues/40846#issuecomment-678913341.

There's no need to replicate the file API with a Context or deadline argument. This is sufficient:

// terminates syscalls trying any path beginning with rootpath
// the original caller will see a PathError with .Err == io.ErrInterrupt
InterruptPendingUse(rootpath string) error

This is an os or runtime package function. I think you'd have to build Context cancellation on the above API.

Shall I post a proposal for this?

ianlancetaylor commented 4 years ago

Ian, another way to look at the problem is that EINTR is part of the API for CIFS & FUSE, because they're designed to be user-interruptible. This is not compatible with Go, so we'd like to fix that.

That is entirely reasonable for C, but it does not make sense for Go, because Go is always multi-threaded. When you send a signal to a Go program, it is unpredictable whether it will interrupt a system call or not, because it is unpredictable which of the various threads will receive the signal. The only way to make that predictable is to direct the signal to a specific thread. But the only code that can do that reliably is code within the program itself. Perhaps there are cases where a Go program should interrupt a specific thread. But even if we believe that, then the error returned to the user's program for that thread should not be EINTR. The important fact is not that the system call was interrupted. The important fact is why it was interrupted. The error returned to the user should be something like os.ErrDeadlineExceeded or context.Canceled. It should never be EINTR.

ianlancetaylor commented 4 years ago

Syscall interruption by signalling a thread requires runtime support, so wouldn't the API reside in stdlib?

I suspect that we could do everything via the golang.org/x/sys/unix package, but I could be missing something.

// terminates syscalls trying any path beginning with rootpath
// the original caller will see a PathError with .Err == io.ErrInterrupt
InterruptPendingUse(rootpath string) error

That is not going to be a good API, because it will require every single file operation to register itself somewhere so that it can be interrupted, and then deregister itself when complete. That is too high a price to pay for a feature that extremely few programs will want to use.

tv42 commented 4 years ago

Strawman API proposal: end := bikeshed.Cancellable(ctx); defer end() causes syscalls made inside the function interrupt if context is canceled, returning the error from ctx.Err(). Multiple overlapping calls would be legal. Interrupting would be best-effort and not guaranteed.

(Or, alternatively, bikeshed.Cancellable(ctx, func() { ... }), but I like the defer variant more because return values are awkward with func().)

This could be done by piggybacking on SIGURG (using runtime knowledge to target the the right OS thread, whereever the goroutine is being run) and a per-goroutine noRetryOnEINTR flag that temporarily disables the EINTR retry loop.

(This is starting to smell a lot like implicit context passing, https://github.com/golang/go/issues/28342 and probably others -- but it would avoid duplicating every API just to add ctx.)

ianlancetaylor commented 4 years ago

Frankly I wouldn't hesitate to duplicate APIs to add a context. Explicit is better than implicit.

I'm not inherently opposed to using a signal to interrupt the thread but it's not like we can just send a signal and carry on. What if the thread isn't in a system call? What if the thread is preempted by the kernel but just about to enter a system call when it resumes? Maybe a signal can be part of the code but it needs considerable machinery around it.

networkimprov commented 4 years ago

Strawman API proposal

When a network filesystem becomes unavailable, you interrupt all syscalls trying paths within a file tree whose root is on the network. Marking as interruptible all file ops in the current function (and implicitly, those in functions it calls, etc) isn't helpful, as you need to interrupt some ops and not others, depending on their location.

Interrupt errors would often trigger context cancellation. A file tree is typically used by many tasks; it's not a task unto itself.

require every single file operation to register itself somewhere

Thankfully, no. The init() procedure for "os/interrupt" could enable said registration. Alternatively, replicate the file API with variants that perform said registration. The former is more user-friendly, but either works :-)

it's not like we can just send a signal and carry on. What if...

@aclements is probably better able to address these Qs. (Austin, the current subthread starts at https://github.com/golang/go/issues/40846#issuecomment-678076586.)

via the golang.org/x/sys/unix package

We also need this on Windows for CIFS. The x/ tree is fine, but not x/sys.

a feature that extremely few programs will want to use.

Any program that relies on the filesystem and is deployed into environments that its developers don't control needs to allow for network filesystems. They're still heavily used in office LANs.

tv42 commented 4 years ago

What if the thread isn't in a system call?

Since the scheduler sends SIGURGs anyway, SIGURG should be harmless. Either the scheduler knows that it didn't mean to preempt that goroutine, and the signal handler does nothing, or the goroutine goes to sleep for a while. Neither should be visible to the program, and these false positive SIGURGs shouldn't happen often enough to cause much trouble.

What if the thread is preempted by the kernel but just about to enter a system call when it resumes?

Good question. This is sort of comparable to the race inherent in for { select { case <-ctx.Done(): return ctx.Err(); default: }; expensiveOperation() }; that sort of look-before-use is better than nothing, but obviously expensiveOperation understanding context is better. Our SIGURG could hit just before the syscall.

Basically what is being asked here is to reconcile the desire ("cancel the slow operation in this goroutine") with the current state ("this goroutine is not / is inside a syscall"), and sometimes that sort of a reconciliation needs to be repeated until it reaches its desired effect.

One thing that needs careful thought is aiming the cancellation at the correct syscall. Consider syscallA(ctx1); syscallB(ctx2), you don't want ctx1 interrupting syscallB. That can happen because the cancellation is an outside observer to the goroutine's progress, sending signals from a separate goroutine. One way to deal with that is to let syscallB's EINTR retry loop just restart the syscall if we encounter such bad luck. This can be made to happen easily by checking ctx.Err() != nil in the EINTR retry loop; even if we get a false positive SIGURG, we won't get a false positive cancellation for our context.

Strawman implementation idea 1: Waste a goroutine per cancelable syscall

func SyscallCtx(ctx context.Context, ...) ... {
    gp := ...something...
    exiting := make(chan struct{})
    defer close(exiting)
    go func() {
        for {
            select {
                case <-exiting:
                    return
                case <-ctx.Done():
                    sendSIGURG(gp)
                    // could also write this with a ticker to get an immediate return on <-exiting
                    time.Sleep(aLittle)
            }
        }
    }()
    // make the syscall, with EINTR retry loop checking ctx
}

Strawman implementation idea 2: Add a context field and a periodic job

Scheduler "G" goroutine state (type g) stores an extra atomic unsafe.Pointer ctxp, pointing to the relevant context or nil.

New function SyscallCtx sets gp.ctxp, and defers an atomic set to nil.

SyscallCtx has an EINTR retry loop that bails out on cancel: if err := ctx.Err(); err != nil { return err }

Have a periodic job that scans for G's in state _Gsyscall, ctxp := (*context.Value)(atomic.LoadPointer(&gp.ctxp)); if ctxp != nil && (*ctxp).Err() != nil { sendSIGURGToG(gp) }. This is our reconciliation loop.

(Alternatively, instead of g.ctxp and increasing the size of every G, have a single sync.Map(-alike) that stores gp->ctx mapping. Could make the scan cheaper, as it only needs to walk over current cancellable syscalls. Then again, causes central coordination.)

(Might also be able to relocate ctxp to M? As there's no such thing as a syscall without an M, I believe, and there's a lot fewer M's than G's. My knowledge to the scheduler runs out here.)

ianlancetaylor commented 4 years ago

@networkimprov

require every single file operation to register itself somewhere

Thankfully, no. The init() procedure for "os/interrupt" could enable said registration. Alternatively, replicate the file API with variants that perform said registration. The former is more user-friendly, but either works :-)

Sorry, I didn't mean that user code would have to register. I meant that the each file operation in the standard library would have to register itself. As I said earlier, that is too high a price to pay for a feature that extremely few programs will want to use.

Any program that relies on the filesystem and is deployed into environments that its developers don't control needs to allow for network filesystems. They're still heavily used in office LANs.

That is easy to say, but how many actual programs actually do that, in any language?

ianlancetaylor commented 4 years ago

@tv42 For read-only operations I would vote for what I suggested above in https://github.com/golang/go/issues/40846#issuecomment-678889242, but add some mechanism to clean up the system call in the background. I don't see an obvious reason to force the real work of the program to wait until the system call is interrupted.

For write operations I agree that we would need to somehow know whether the system call completes or is interrupted.

tv42 commented 4 years ago

Leaving the syscall running would leave the buffer passed to Read etc poisoned, so that would only apply to things that allocate internally, and have no out-parameters.

I'm also uncomfortable with leaking such threads-in-syscalls. There's no guarantee when or even if they will return. It just seems like a bad design to me. I'd much rather have the user program in charge of them, to have that visibility.

bcmills commented 4 years ago

Either way, it seems clear to me that this use-case needs a proposal with more depth to it than just “don't check for EINTR in os functions”. @tv42, perhaps file a separate proposal for a high-level API, and we can figure out the implementation once we know what the API should look like?

(But again, I wouldn't expect to make progress on any API changes until we have a clearer picture of where #5636 is going.)

networkimprov commented 4 years ago

The init() procedure for "os/interrupt" could enable said registration. Alternatively, replicate the file API with variants that perform said registration. The former is more user-friendly, but either works :-)

Sorry, I didn't mean that user code would have to register. I meant that the each file operation in the standard library would have to register itself.

I understood what you meant. It's addressed by either of the approaches I gave (quoted above).

All: a core issue is what the runtime must do to let a blocked syscall thread be signaled on demand. We should find a general-purpose scheme. I see two options...

a) keep a table of syscall thread IDs, adding to it when requested; provide a reference or copy of it when requested b) asynchronously post a thread ID to an app-supplied variable/table

PS: discussion of #5636 has moved to https://www.reddit.com/r/golang/comments/hv976o/qa_iofs_draft_design/?sort=new

networkimprov commented 4 years ago

I filed #41054 after digesting the above discussion. It should support both use cases described above:

a) APIs that can be cancelled via a context argument, and b) interrupting any syscalls trying paths on a specific tree.

Cyberax commented 4 years ago

So it seems to me that the only programs that are going to use a deadline for a stat call are going to be ones that are aware that they are running on a networked file system. And I argue that in the modern era those programs are much more likely to use a different approach, rather than use a networked file system at all.

I'm sorry for a late post here, but I do have an example of this kind of a workload. I'm using NFS-based disk as a write-through cache for large objects in front of DynamoDB. My code first tries to check the contents on the disk with a very tight deadline (2ms) and if the deadline expires it falls back on an expensive call into DynamoDB.

As such, it needs timeouts for all calls used and stat is among them (to check for object's presence).

networkimprov commented 4 years ago

Alex, curious to hear what OS is running the NFS client, and which client if not the default?

Last I checked Linux doesn't interrupt syscalls to NFS on signal.

ianlancetaylor commented 4 years ago

I'm sorry for a late post here, but I do have an example of this kind of a workload. I'm using NFS-based disk as a write-through cache for large objects in front of DynamoDB. My code first tries to check the contents on the disk with a very tight deadline (2ms) and if the deadline expires it falls back on an expensive call into DynamoDB.

Do you mean: you do a read, and if the read doesn't return within 2ms you contact the database directly?

We already support setting a deadline on a read call, so it's not immediately obvious to me that this requires any feature that we don't already support. Although I don't know whether the kernel supports polling in an NFS file descriptor.

Cyberax commented 4 years ago

Correct, we do all operations (including read() and write()) under tight deadlines and if these deadlines are violated we fall back to the direct database requests. These requests are expensive in monetary terms, as DynamoDB essentially bills per request.

In case of read() and write() it's a bit more involved, we "recharge" the deadline upon receiving each 512Kb of data.

The stat() calls are used to get the object size and the creation time, we don't use readdir() right now.

We use regular IO in a child goroutine and simply abandon it in case the deadline expires. We also added a semaphore to limit the concurrency in case of NFS server hiccups. It all works acceptably well, but it would be nice to actually be able to just be able to use the regular context-based cancellation.

Cyberax commented 4 years ago

Alex, curious to hear what OS is running the NFS client, and which client if not the default?

Linux and NFSv4.

Last I checked Linux doesn't interrupt syscalls to NFS on signal.

Linux supports interrupting NFS calls: https://elixir.bootlin.com/linux/latest/source/net/sunrpc/sched.c#L924 and https://elixir.bootlin.com/linux/latest/source/net/sunrpc/sched.c#L954 - but so far it's used only during unmounting. I'm planning to fix that in the upstream Linux.

networkimprov commented 4 years ago

@Cyberax I'm curious as to why NFS vs HTTP or somesuch?

ncw commented 3 years ago

@bcmills @ianlancetaylor Is the fix for this https://go-review.googlesource.com/c/go/+/249178/ going to get a backport into go1.15.x ? I couldn't see it on the milestones.

ianlancetaylor commented 3 years ago

@ncw The problem is not new in 1.15, so this does not qualify for a backport by our usual rules.

networkimprov commented 3 years ago

To clarify, for the great majority of Go sites, the problem is new in 1.14. It can in theory occur before that, but was never reported.

ncw commented 3 years ago

@ncw The problem is not new in 1.15, so this does not qualify for a backport by our usual rules.

Ah, that's a shame. Rclone users seem to come across this quite regularly.

golang / go