golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

os: Missed checking for EINTR in (*os.File).readdir #40846

Closed jameshartig closed 4 years ago

jameshartig commented 4 years ago

What version of Go are you using (go version)?

go version go1.15 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/toaster/.cache/go-build"
GOENV="/home/toaster/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/opt/gopath/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/opt/gopath"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build047277846=/tmp/go-build -gno-record-gcc-switches"

What did you do?

See: https://go-review.googlesource.com/c/go/+/232862/

If you call ioutil.ReadDir it ends up calling (*os.File).readdir which calls lstat and does not retry on an EINTR error. Other spots were handled in https://go-review.googlesource.com/c/go/+/232862/ and I think this spot was just missed.

I wasn't sure of the best way since lstat is automatically generated. Should this just be handled in readdir?
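
Until this is settled, one application-level workaround is to retry the whole call when it fails with EINTR. The following is a minimal sketch, assuming it is acceptable to repeat the directory read from the start. `retryOnEINTR`, `readDirRetry`, and `flaky` are hypothetical helpers for illustration, not part of the os or ioutil packages:

```go
package main

import (
	"errors"
	"fmt"
	"io/ioutil"
	"os"
	"syscall"
)

// retryOnEINTR (hypothetical helper) calls fn until it returns nil or an
// error other than EINTR.
func retryOnEINTR(fn func() error) error {
	for {
		err := fn()
		// errors.Is unwraps *os.PathError, the error type seen in the report.
		if !errors.Is(err, syscall.EINTR) {
			return err
		}
	}
}

// readDirRetry wraps ioutil.ReadDir so that a transient EINTR from lstat
// does not surface to the caller.
func readDirRetry(dir string) ([]os.FileInfo, error) {
	var infos []os.FileInfo
	err := retryOnEINTR(func() error {
		var e error
		infos, e = ioutil.ReadDir(dir)
		return e
	})
	return infos, err
}

// flaky (hypothetical, demonstration only) returns a function that fails
// with a wrapped EINTR n times before succeeding, counting its calls.
func flaky(n int, calls *int) func() error {
	return func() error {
		*calls++
		if *calls <= n {
			return &os.PathError{Op: "lstat", Path: "./test", Err: syscall.EINTR}
		}
		return nil
	}
}

func main() {
	calls := 0
	err := retryOnEINTR(flaky(2, &calls))
	fmt.Println(calls, err) // prints "3 <nil>"

	infos, err := readDirRetry(".")
	fmt.Println(len(infos), err)
}
```

The simulated failure stands in for the FUSE mount, so the retry path can be exercised without one. Errors other than EINTR are returned unchanged on the first attempt.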

What did you expect to see?

I expected ioutil.ReadDir not to error.

What did you see instead?

Instead ioutil.ReadDir errored with lstat /opt/admiral-dev/hazadblock/dist/static/lib/ergoHeadOverwrite.dev.min.js.map: interrupted system call.

bcmills commented 4 years ago

CC @ianlancetaylor

ianlancetaylor commented 4 years ago

As far as I know neither the stat nor the lstat system call is interruptible by a signal. But the error you show suggests that lstat can return EINTR. That suggests that stat can also return EINTR. Either way it's not clear to me that this has anything to do with (*os.File).Readdir as such. Perhaps we need to fix Stat and Lstat to loop on EINTR.

Can you tell us more about the exact case where this fails? Can you show us a way to reproduce the problem? Thanks.

jameshartig commented 4 years ago

@ianlancetaylor thanks for the quick reply!

> Can you tell us more about the exact case where this fails?

So the underlying mount is a FUSE filesystem and the stat call actually could make a network call if the information isn't cached locally. From my reading of the fuse documentation [1]:

Interrupting filesystem operations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If a process issuing a FUSE filesystem request is interrupted, the
following will happen:

  1) If the request is not yet sent to userspace AND the signal is
     fatal (SIGKILL or unhandled fatal signal), then the request is
     dequeued and returns immediately.

  2) If the request is not yet sent to userspace AND the signal is not
     fatal, then an 'interrupted' flag is set for the request.  When
     the request has been successfully transferred to userspace and
     this flag is set, an INTERRUPT request is queued.

  3) If the request is already sent to userspace, then an INTERRUPT
     request is queued.

INTERRUPT requests take precedence over other requests, so the
userspace filesystem will receive queued INTERRUPTs before any others.

The userspace filesystem may ignore the INTERRUPT requests entirely,
or may honor them by sending a reply to the _original_ request, with
the error set to EINTR.

It is also possible that there's a race between processing the
original request and its INTERRUPT request.  There are two possibilities:

  1) The INTERRUPT request is processed before the original request is
     processed

  2) The INTERRUPT request is processed after the original request has
     been answered

If the filesystem cannot find the original request, it should wait for
some timeout and/or a number of new requests to arrive, after which it
should reply to the INTERRUPT request with an EAGAIN error.  In case
1) the INTERRUPT request will be requeued.  In case 2) the INTERRUPT
reply will be ignored.

It seems like returning EINTR follows the spec, but I'm not sure whether I can modify the FUSE library to ignore the Interrupt. Can you point me in a direction for debugging the Go side of this? I'm not sure how to figure out exactly which signal Go received. I was assuming it was SIGUSR, but it's not clear to me how best to discover which signal was received or how to correlate it with the separate FUSE program. Here are the debug logs from the FUSE side:

<- Getattr [ID=0x428 Node=0x25 Uid=1002 Gid=1003 Pid=22546] 0x0 fl=0
<- Interrupt [ID=0x42a Node=0x0 Uid=0 Gid=0 Pid=0] ID 0x428
-> [ID=0x42a] Interrupt
-> [ID=0x428] Getattr error=EINTR

There's a lot of other logs around this so I'm under the current assumption that the Interrupt happened almost immediately after the Getattr was issued, which is weird. If it helps at all, the program doing the reads makes a new goroutine for each file and reads them in so there's 10-15 goroutines reading at once. I don't know yet how to understand what signal caused the Interrupt since I don't think that's exposed to the FUSE program.

> Can you show us a way to reproduce the problem?

Once we can determine what the signal is that caused this, then I can hopefully come up with some sort of simple reproduction. We can just make a simple FUSE program that just sleeps and then trigger the signal and it should send an Interrupt.

[1] https://www.kernel.org/doc/Documentation/filesystems/fuse.txt

ianlancetaylor commented 4 years ago

Thanks for the info.

The signal was most likely SIGURG, which is used to preempt a goroutine.

You should be able to see it by running the program under strace -f.

networkimprov commented 4 years ago

@fastest963 what FUSE library are you referring to? How is it configured on your system? That's probably all we need to know to reproduce it.

Go 1.14+ generates a lot of interrupts, for goroutine scheduling. Causing an interrupt is easy in a Go app.

jameshartig commented 4 years ago

@networkimprov It's using https://github.com/bazil/fuse to mount the FUSE mount. Nothing very special on the configuration front other than fuse.AsyncRead and fuse.DefaultPermissions. Here's a program to replicate the problem: https://gist.github.com/fastest963/756f67d406540ad8ae14d22e651043d0

It should print out:

fuse: <- Getattr [ID=0x2 Node=0x1 Uid=1001 Gid=1002 Pid=12860] 0x0 fl=0
fuse: <- Interrupt [ID=0x3 Node=0x0 Uid=0 Gid=0 Pid=0] ID 0x2
fuse: -> [ID=0x3] Interrupt
interrrupted 0x2
fuse: -> [ID=0x2] Getattr error=EINTR
error: stat ./test: interrupted system call

@ianlancetaylor

> You should be able to see it by running the program under strace -f.

Here's the relevant strace output:

openat(AT_FDCWD, "/opt/admiral-dev/hazadblock/dist/static/lib/reporting/js.dev.min.js", O_RDONLY|O_CLOEXEC) = 35
futex(0x1c94b98, FUTEX_WAKE_PRIVATE, 1) = 1
epoll_ctl(3, EPOLL_CTL_ADD, 35, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=2416816088, u64=140271753736152}}) = 0
fcntl(35, F_GETFL)      = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fcntl(35, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
fstat(35, 0xc0001cb218) = -1 EINTR (Interrupted system call)
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=10907, si_uid=1002} ---
rt_sigreturn({mask=[]}) = -1 EINTR (Interrupted system call)
read(35, "!(function(e){var r={};function "..., 512) = 512
futex(0x1c94b98, FUTEX_WAKE_PRIVATE, 1) = 1
read(35, "ule)return t;var e=Object.create"..., 1024) = 1024
read(35, "sName=i.join(\" \")),e}},179:funct"..., 2048) = 2048
read(35, "typeof Symbol&&n.constructor===S"..., 4096) = 2761
futex(0x1c94b98, FUTEX_WAKE_PRIVATE, 1) = 1
read(35, "", 1335)      = 0
epoll_ctl(3, EPOLL_CTL_DEL, 35, 0xc02032ad2c) = 0
close(35)               = 0

and the associated FUSE logs:

<- Open [ID=0xe79 Node=0xe Uid=1002 Gid=1003 Pid=10935] dir=false fl=OpenReadOnly
-> [ID=0xe79] Open 0x2 fl=0
<- Getattr [ID=0xe7b Node=0xe Uid=1002 Gid=1003 Pid=10935] 0x0 fl=0
<- Interrupt [ID=0xe7d Node=0x0 Uid=0 Gid=0 Pid=0] ID 0xe7b
-> [ID=0xe7d] Interrupt
-> [ID=0xe7b] Getattr error=EINTR

Looks like this happened inside of ioutil.ReadFile because the stat error was ultimately ignored.

ianlancetaylor commented 4 years ago

Thanks. Can you double check that you passed the -f option when you ran strace? The output without -f is unhelpful for Go programs.

jameshartig commented 4 years ago

@ianlancetaylor I actually used -ff and copied only the relevant section above from the PID that got the signal, because there were 30+ threads and a ton of irrelevant traffic. I realize now that I should instead just run strace with my example program. For some reason the FUSE mounting failed with Operation not permitted whenever I ran my test binary with strace -f test, so instead I made it sleep and then attached strace to the pid: strace.txt

The relevant section is:

```
[pid 15228] newfstatat(AT_FDCWD, "./test",
[pid 15229] rt_sigprocmask(SIG_SETMASK, ~[],
[pid 15230] <... nanosleep resumed>NULL) = 0
[pid 15229] <... rt_sigprocmask resumed>[], 8) = 0
[pid 15229] clone(strace: Process 15236 attached
[pid 15230] read(6, "8\0\0\0\3\0\0\0\2\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\351\3\0\0\352\3\0\0"..., 135168) = 56
[pid 15229] <... clone resumed>child_stack=0xc000136000, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM) = 15236
[pid 15236] gettid(
[pid 15230] read(6,
[pid 15236] <... gettid resumed>) = 15236
[pid 15229] rt_sigprocmask(SIG_SETMASK, [],
[pid 15236] arch_prctl(ARCH_SET_FS, 0xc000124090
[pid 15229] <... rt_sigprocmask resumed>NULL, 8) = 0
[pid 15236] <... arch_prctl resumed>) = 0
[pid 15229] nanosleep({tv_sec=0, tv_nsec=20000},
[pid 15236] sigaltstack(NULL, {ss_sp=NULL, ss_flags=SS_DISABLE, ss_size=0}) = 0
[pid 15236] sigaltstack({ss_sp=0xc000128000, ss_flags=0, ss_size=32768},
[pid 15231] <... epoll_pwait resumed>[], 128, 1, NULL, 0) = 0
[pid 15229] <... nanosleep resumed>NULL) = 0
[pid 15236] <... sigaltstack resumed>NULL) = 0
[pid 15231] futex(0xc000030948, FUTEX_WAIT_PRIVATE, 0, NULL
[pid 15236] rt_sigprocmask(SIG_SETMASK, [],
[pid 15229] nanosleep({tv_sec=0, tv_nsec=20000},
[pid 15236] <... rt_sigprocmask resumed>NULL, 8) = 0
[pid 15236] gettid() = 15236
[pid 15229] <... nanosleep resumed>NULL) = 0
[pid 15229] futex(0xc000030948, FUTEX_WAKE_PRIVATE, 1
[pid 15236] rt_sigprocmask(SIG_SETMASK, ~[],
[pid 15231] <... futex resumed>) = 0
[pid 15229] <... futex resumed>) = 1
[pid 15236] <... rt_sigprocmask resumed>[], 8) = 0
[pid 15231] write(1, "fuse: <- Getattr [ID=0x2 Node=0x"..., 72
[pid 15229] getpid(
[pid 15236] clone(
[pid 15231] <... write resumed>) = 72
[pid 15231] epoll_pwait(7,
[pid 15229] <... getpid resumed>) = 15228
[pid 15231] <... epoll_pwait resumed>[], 128, 0, NULL, 362006) = 0
[pid 15231] epoll_pwait(7, strace: Process 15237 attached
[pid 15237] gettid() = 15237
[pid 15237] arch_prctl(ARCH_SET_FS, 0xc000124490) = 0
[pid 15236] <... clone resumed>child_stack=0xc000138000, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM) = 15237
[pid 15229] tgkill(15228, 15236, SIGURG
[pid 15237] sigaltstack(NULL,
[pid 15236] rt_sigprocmask(SIG_SETMASK, [],
[pid 15237] <... sigaltstack resumed>{ss_sp=NULL, ss_flags=SS_DISABLE, ss_size=0}) = 0
[pid 15229] <... tgkill resumed>) = 0
[pid 15236] <... rt_sigprocmask resumed>NULL, 8) = 0
[pid 15237] sigaltstack({ss_sp=0xc00013c000, ss_flags=0, ss_size=32768},
[pid 15236] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=15228, si_uid=1001} ---
[pid 15229] nanosleep({tv_sec=0, tv_nsec=20000},
[pid 15237] <... sigaltstack resumed>NULL) = 0
[pid 15237] rt_sigprocmask(SIG_SETMASK, [],
[pid 15236] rt_sigreturn({mask=[]}
[pid 15237] <... rt_sigprocmask resumed>NULL, 8) = 0
[pid 15236] <... rt_sigreturn resumed>) = 0
[pid 15237] gettid(
[pid 15236] rt_sigprocmask(SIG_SETMASK, ~[HUP INT QUIT ILL TRAP ABRT BUS FPE SEGV TERM STKFLT CHLD PROF SYS RTMIN RT_1],
[pid 15229] <... nanosleep resumed>NULL) = 0
[pid 15237] <... gettid resumed>) = 15237
[pid 15229] getpid(
[pid 15237] futex(0x672ed8, FUTEX_WAIT_PRIVATE, 0, NULL
[pid 15236] <... rt_sigprocmask resumed>NULL, 8) = 0
[pid 15229] <... getpid resumed>) = 15228
[pid 15229] tgkill(15228, 15236, SIGURG
[pid 15236] futex(0x672ed8, FUTEX_WAKE_PRIVATE, 1
[pid 15229] <... tgkill resumed>) = 0
[pid 15229] nanosleep({tv_sec=0, tv_nsec=20000},
[pid 15236] <... futex resumed>) = 1
[pid 15237] <... futex resumed>) = 0
[pid 15237] sched_yield(
[pid 15236] futex(0xc000124148, FUTEX_WAIT_PRIVATE, 0, NULL
[pid 15229] <... nanosleep resumed>NULL) = 0
[pid 15237] <... sched_yield resumed>) = 0
[pid 15229] nanosleep({tv_sec=0, tv_nsec=20000},
[pid 15237] futex(0x672ec0, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 15229] <... nanosleep resumed>NULL) = 0
[pid 15237] rt_sigprocmask(SIG_SETMASK, ~[],
[pid 15229] nanosleep({tv_sec=0, tv_nsec=20000},
[pid 15237] <... rt_sigprocmask resumed>[], 8) = 0
[pid 15237] clone(child_stack=0xc000152000, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM) = 15238
[pid 15229] <... nanosleep resumed>NULL) = 0
strace: Process 15238 attached
[pid 15237] rt_sigprocmask(SIG_SETMASK, [],
[pid 15229] nanosleep({tv_sec=0, tv_nsec=20000},
[pid 15238] gettid(
[pid 15237] <... rt_sigprocmask resumed>NULL, 8) = 0
[pid 15238] <... gettid resumed>) = 15238
[pid 15237] futex(0x672ed8, FUTEX_WAIT_PRIVATE, 0, NULL
[pid 15229] <... nanosleep resumed>NULL) = 0
[pid 15238] arch_prctl(ARCH_SET_FS, 0xc000124890
[pid 15229] nanosleep({tv_sec=0, tv_nsec=20000},
[pid 15238] <... arch_prctl resumed>) = 0
[pid 15238] sigaltstack(NULL, {ss_sp=NULL, ss_flags=SS_DISABLE, ss_size=0}) = 0
[pid 15238] sigaltstack({ss_sp=0xc000144000, ss_flags=0, ss_size=32768},
[pid 15229] <... nanosleep resumed>NULL) = 0
[pid 15238] <... sigaltstack resumed>NULL) = 0
[pid 15229] nanosleep({tv_sec=0, tv_nsec=20000},
[pid 15238] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 15238] gettid(
[pid 15229] <... nanosleep resumed>NULL) = 0
[pid 15238] <... gettid resumed>) = 15238
[pid 15229] nanosleep({tv_sec=0, tv_nsec=20000},
[pid 15238] futex(0xc000124148, FUTEX_WAKE_PRIVATE, 1
[pid 15229] <... nanosleep resumed>NULL) = 0
[pid 15238] <... futex resumed>) = 1
[pid 15236] <... futex resumed>) = 0
[pid 15229] nanosleep({tv_sec=0, tv_nsec=20000},
[pid 15238] futex(0xc000124948, FUTEX_WAIT_PRIVATE, 0, NULL
[pid 15236] futex(0x672ed8, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 15237] <... futex resumed>) = 0
[pid 15236] rt_sigprocmask(SIG_SETMASK, ~[HUP INT QUIT ILL TRAP ABRT BUS FPE SEGV USR2 TERM STKFLT CHLD PROF SYS RTMIN RT_1],
[pid 15229] <... nanosleep resumed>NULL) = 0
[pid 15237] rt_sigprocmask(SIG_SETMASK, ~[],
[pid 15236] <... rt_sigprocmask resumed>NULL, 8) = 0
[pid 15237] <... rt_sigprocmask resumed>[], 8) = 0
[pid 15229] nanosleep({tv_sec=0, tv_nsec=20000},
[pid 15236] futex(0xc000124948, FUTEX_WAKE_PRIVATE, 1
[pid 15237] clone(
[pid 15236] <... futex resumed>) = 1
[pid 15238] <... futex resumed>) = 0
[pid 15236] futex(0xc000124148, FUTEX_WAIT_PRIVATE, 0, NULLstrace: Process 15239 attached
[pid 15237] <... clone resumed>child_stack=0xc000154000, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM) = 15239
[pid 15229] <... nanosleep resumed>NULL) = 0
[pid 15238] write(9, "\0", 1
[pid 15237] rt_sigprocmask(SIG_SETMASK, [],
[pid 15229] nanosleep({tv_sec=0, tv_nsec=20000},
[pid 15237] <... rt_sigprocmask resumed>NULL, 8) = 0
[pid 15238] <... write resumed>) = 1
[pid 15237] futex(0x672ed8, FUTEX_WAIT_PRIVATE, 0, NULL
[pid 15238] rt_sigprocmask(SIG_SETMASK, ~[],
[pid 15239] gettid(
[pid 15231] <... epoll_pwait resumed>[{EPOLLIN, {u32=6761816, u64=6761816}}], 128, 999, NULL, 0) = 1
[pid 15239] <... gettid resumed>) = 15239
[pid 15229] <... nanosleep resumed>NULL) = 0
[pid 15231] read(8,
[pid 15239] arch_prctl(ARCH_SET_FS, 0xc000124c90
[pid 15229] nanosleep({tv_sec=0, tv_nsec=20000},
[pid 15231] <... read resumed>"\0", 16) = 1
[pid 15239] <... arch_prctl resumed>) = 0
[pid 15238] <... rt_sigprocmask resumed>[], 8) = 0
[pid 15231] futex(0xc000030948, FUTEX_WAIT_PRIVATE, 0, NULL
[pid 15239] sigaltstack(NULL,
[pid 15238] clone(
[pid 15239] <... sigaltstack resumed>{ss_sp=NULL, ss_flags=SS_DISABLE, ss_size=0}) = 0
[pid 15229] <... nanosleep resumed>NULL) = 0
[pid 15239] sigaltstack({ss_sp=0xc000154000, ss_flags=0, ss_size=32768},
[pid 15229] nanosleep({tv_sec=0, tv_nsec=20000},
[pid 15239] <... sigaltstack resumed>NULL) = 0
[pid 15238] <... clone resumed>child_stack=0xc00014e000, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM) = 15240
strace: Process 15240 attached
[pid 15239] rt_sigprocmask(SIG_SETMASK, [],
[pid 15238] rt_sigprocmask(SIG_SETMASK, [],
[pid 15240] gettid(
[pid 15239] <... rt_sigprocmask resumed>NULL, 8) = 0
[pid 15229] <... nanosleep resumed>NULL) = 0
[pid 15240] <... gettid resumed>) = 15240
[pid 15238] <... rt_sigprocmask resumed>NULL, 8) = 0
[pid 15240] arch_prctl(ARCH_SET_FS, 0xc000125090
[pid 15239] gettid(
[pid 15229] nanosleep({tv_sec=0, tv_nsec=20000},
[pid 15238] futex(0x672fe0, FUTEX_WAIT_PRIVATE, 0, NULL
[pid 15240] <... arch_prctl resumed>) = 0
[pid 15239] <... gettid resumed>) = 15239
[pid 15240] sigaltstack(NULL, {ss_sp=NULL, ss_flags=SS_DISABLE, ss_size=0}) = 0
[pid 15239] epoll_pwait(7,
[pid 15229] <... nanosleep resumed>NULL) = 0
[pid 15229] nanosleep({tv_sec=0, tv_nsec=20000},
[pid 15240] sigaltstack({ss_sp=0xc00015c000, ss_flags=0, ss_size=32768},
[pid 15239] <... epoll_pwait resumed>[], 128, 0, NULL, 362006) = 0
[pid 15240] <... sigaltstack resumed>NULL) = 0
[pid 15239] futex(0xc000124148, FUTEX_WAKE_PRIVATE, 1
[pid 15229] <... nanosleep resumed>NULL) = 0
[pid 15240] rt_sigprocmask(SIG_SETMASK, [],
[pid 15239] <... futex resumed>) = 1
[pid 15236] <... futex resumed>) = 0
[pid 15229] nanosleep({tv_sec=0, tv_nsec=20000},
[pid 15240] <... rt_sigprocmask resumed>NULL, 8) = 0
[pid 15239] futex(0xc000124d48, FUTEX_WAIT_PRIVATE, 0, NULL
[pid 15236] futex(0xc000124d48, FUTEX_WAKE_PRIVATE, 1
[pid 15240] gettid(
[pid 15239] <... futex resumed>) = -1 EAGAIN (Resource temporarily unavailable)
[pid 15236] <... futex resumed>) = 0
[pid 15240] <... gettid resumed>) = 15240
[pid 15239] epoll_pwait(7,
[pid 15236] futex(0xc000124148, FUTEX_WAIT_PRIVATE, 0, NULL
[pid 15240] kill(0, SIGUSR2
[pid 15229] <... nanosleep resumed>NULL) = 0
[pid 15239] <... epoll_pwait resumed>[], 128, 0, NULL, 362006) = 0
[pid 15240] <... kill resumed>) = 0
[pid 15239] --- SIGUSR2 {si_signo=SIGUSR2, si_code=SI_USER, si_pid=15228, si_uid=1001} ---
[pid 15229] nanosleep({tv_sec=0, tv_nsec=20000},
[pid 15239] futex(0x672fe0, FUTEX_WAKE_PRIVATE, 1
[pid 15240] epoll_pwait(7,
[pid 15230] <... read resumed>"0\0\0\0$\0\0\0\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 135168) = 48
[pid 15240] <... epoll_pwait resumed>[], 128, 0, NULL, 362006) = 0
[pid 15239] <... futex resumed>) = 1
[pid 15238] <... futex resumed>) = 0
[pid 15230] futex(0xc000030548, FUTEX_WAIT_PRIVATE, 0, NULL
[pid 15239] rt_sigreturn({mask=[]}
[pid 15238] futex(0xc000124948, FUTEX_WAIT_PRIVATE, 0, NULL
[pid 15229] <... nanosleep resumed>NULL) = 0
[pid 15240] read(6,
[pid 15239] <... rt_sigreturn resumed>) = 0
[pid 15229] nanosleep({tv_sec=0, tv_nsec=20000},
[pid 15239] futex(0xc000124948, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 15238] <... futex resumed>) = 0
[pid 15239] futex(0x672fe0, FUTEX_WAIT_PRIVATE, 0, NULL
[pid 15238] kill(0, SIGUSR2
[pid 15229] <... nanosleep resumed>NULL) = 0
[pid 15238] <... kill resumed>) = 0
[pid 15229] --- SIGUSR2 {si_signo=SIGUSR2, si_code=SI_USER, si_pid=15228, si_uid=1001} ---
[pid 15238] epoll_pwait(7, [], 128, 0, NULL, 362006) = 0
[pid 15230] <... futex resumed>) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
[pid 15229] futex(0x672fe0, FUTEX_WAKE_PRIVATE, 1
[pid 15230] futex(0xc000030548, FUTEX_WAIT_PRIVATE, 0, NULL
[pid 15229] <... futex resumed>) = 1
[pid 15239] <... futex resumed>) = 0
[pid 15229] rt_sigreturn({mask=[]}) = 0
[pid 15239] futex(0xc000124d48, FUTEX_WAIT_PRIVATE, 0, NULL
[pid 15229] futex(0xc000124d48, FUTEX_WAKE_PRIVATE, 1
[pid 15239] <... futex resumed>) = -1 EAGAIN (Resource temporarily unavailable)
[pid 15238] write(1, "fuse: <- Interrupt [ID=0x3 Node="..., 62
[pid 15229] <... futex resumed>) = 0
[pid 15229] nanosleep({tv_sec=0, tv_nsec=20000},
[pid 15239] futex(0xc000030548, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 15239] futex(0x672fe0, FUTEX_WAIT_PRIVATE, 0, NULL
[pid 15238] <... write resumed>) = 62
[pid 15229] <... nanosleep resumed>NULL) = 0
[pid 15229] nanosleep({tv_sec=0, tv_nsec=20000},
[pid 15230] <... futex resumed>) = 0
[pid 15230] kill(0, SIGUSR2) = 0
[pid 15229] <... nanosleep resumed>NULL) = 0
[pid 15230] epoll_pwait(7,
[pid 15229] --- SIGUSR2 {si_signo=SIGUSR2, si_code=SI_USER, si_pid=15228, si_uid=1001} ---
[pid 15238] write(1, "fuse: -> [ID=0x3] Interrupt\n", 28
[pid 15229] futex(0x672fe0, FUTEX_WAKE_PRIVATE, 1
[pid 15230] <... epoll_pwait resumed>[], 128, 0, NULL, 362006) = 0
[pid 15239] <... futex resumed>) = 0
[pid 15229] <... futex resumed>) = 1
[pid 15239] futex(0xc000124d48, FUTEX_WAIT_PRIVATE, 0, NULL
[pid 15238] <... write resumed>) = 28
[pid 15230] epoll_pwait(7,
[pid 15229] rt_sigreturn({mask=[]}
[pid 15238] write(1, "interrrupted 0x2\n", 17
[pid 15230] <... epoll_pwait resumed>[], 128, 0, NULL, 362006) = 0
[pid 15229] <... rt_sigreturn resumed>) = 0
[pid 15230] futex(0xc000124d48, FUTEX_WAKE_PRIVATE, 1
[pid 15229] nanosleep({tv_sec=0, tv_nsec=20000},
[pid 15239] <... futex resumed>) = 0
[pid 15238] <... write resumed>) = 17
[pid 15230] <... futex resumed>) = 1
[pid 15239] kill(0, SIGUSR2
[pid 15230] futex(0x672fe0, FUTEX_WAIT_PRIVATE, 0, NULL
[pid 15239] <... kill resumed>) = 0
[pid 15238] write(1, "fuse: -> [ID=0x2] Getattr error="..., 38
[pid 15231] <... futex resumed>) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
[pid 15239] --- SIGUSR2 {si_signo=SIGUSR2, si_code=SI_USER, si_pid=15228, si_uid=1001} ---
[pid 15231] futex(0xc000030948, FUTEX_WAIT_PRIVATE, 0, NULL
[pid 15238] <... write resumed>) = 38
[pid 15230] <... futex resumed>) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
[pid 15229] <... nanosleep resumed>NULL) = 0
[pid 15239] futex(0x672fe0, FUTEX_WAKE_PRIVATE, 1
[pid 15238] write(6, "\20\0\0\0\374\377\377\377\2\0\0\0\0\0\0\0", 16
[pid 15230] futex(0x672fe0, FUTEX_WAIT_PRIVATE, 0, NULL
[pid 15229] nanosleep({tv_sec=0, tv_nsec=20000},
[pid 15239] <... futex resumed>) = 0
[pid 15238] <... write resumed>) = 16
[pid 15230] <... futex resumed>) = -1 EAGAIN (Resource temporarily unavailable)
[pid 15228] <... newfstatat resumed>0xc000122038, 0) = -1 EINTR (Interrupted system call)
[pid 15239] rt_sigreturn({mask=[]}
[pid 15238] futex(0xc000030548, FUTEX_WAKE_PRIVATE, 1
[pid 15230] futex(0xc000030548, FUTEX_WAIT_PRIVATE, 0, NULL
[pid 15239] <... rt_sigreturn resumed>) = 0
[pid 15238] <... futex resumed>) = 0
[pid 15230] <... futex resumed>) = -1 EAGAIN (Resource temporarily unavailable)
[pid 15229] <... nanosleep resumed>NULL) = 0
[pid 15228] futex(0x643e28, FUTEX_WAIT_PRIVATE, 0, NULL
[pid 15230] write(1, "error: stat ./test: interrupted "..., 44
```

networkimprov commented 4 years ago

cc @tv42

ianlancetaylor commented 4 years ago

@fastest963 Thanks. The strace output shows that the newfstatat syscall is being interrupted, and is returning EINTR. My takeaway is that when using a FUSE file system essentially any system call that touches the file system can return EINTR. Unfortunately I don't see a reasonable way to add a test case for a FUSE file system to the standard library, though I would be happy to hear suggestions (these suggestions cannot involve importing a big package like https://github.com/bazil/fuse into the standard library just for testing purposes).

I guess I'll just look for syscall functions in the os package and add EINTR loops. Then we'll see what happens.

gopherbot commented 4 years ago

Change https://golang.org/cl/249178 mentions this issue: os, internal/poll: loop on EINTR for all file syscalls

tv42 commented 4 years ago

A non-FUSE alternative for testing would be seccomp or ptrace triggering that EINTR. Or a special slower build where RawSyscall etc check a flag and never even enter kernelspace. Of course both of those need to be somehow targeted to the thing under test, matching by fd or pathname or kernel thread id.

And yes once networking etc are added to the mix, just about anything can return EINTR, because they mostly have no other choice -- allowing user to control-C out of slow networking is a nice thing to have, so not bailing out isn't really an option. Only a precious few syscalls have something else meaningful to report, so EINTR it is.

If Go's scheduler using SIGURG triggers this logic, that could lead to a nasty scenario where the work is started from scratch on every round, and always interrupted by SIGURG. I sure hope the SIGURG doesn't happen too often.

networkimprov commented 4 years ago

In that case, maybe loop-on-EINTR logic should be disabled via an os.OpenFile() flag. That would apply to previously added EINTR loops, too.

We need loops to keep existing code working with async-preemption, and also need to accommodate network filesystems.

EDIT: this also arose with CIFS: #39237, #38836

bcmills commented 4 years ago

os.OpenFile has never specified its behavior w.r.t. EINTR.

If you need to open a file with the possibility of interruption or cancellation, an explicit context.Context seems like the way to do it. (A third-party package could implement that functionality using the syscall package; adding that API to os seems like a separate proposal, but one that should wait for the abstract-filesystem proposal #5636 to settle.)

networkimprov commented 4 years ago

I've added a comment on the FS API thread re network filesystems and EINTR loops: https://www.reddit.com/r/golang/comments/hv976o/qa_iofs_draft_design/g24rusp/

tv42 commented 4 years ago

I don't think there should be a special flag to os.OpenFile; the default and only behavior should be "do the right thing" (whatever that ends up meaning).

I posit the two requirements for developer and user comfort here are:

  1. Control-C causing SIGINT must allow programs in e.g. io.Copy to do an orderly shutdown (with whatever logic the app wants; not a runtime panic, not just the default signal handler, though of course SIGINT default handling applies until overridden), assuming the read(2)/write(2) system call they were hanging in returns with EINTR. (That is, this feature may need per-filesystem kernel support, which can be assumed to be there.)

  2. Programs running in normal circumstances should not see spurious read/write/etc failures.

The way to differentiate between cases 1 and 2 is the SIGINT, not the EINTR.

networkimprov commented 4 years ago

Assuming a reliable method to determine the source signal for every occurrence of EINTR, on every OS that returns it :-)

Programs may expect EINTR for more than SIGINT, so you'd retry on EINTR when the cause was SIGURG.

ianlancetaylor commented 4 years ago

We don't need to know what signal caused the EINTR, and we don't need special treatment on a per-file basis. Go is always multi-threaded. A SIGINT will either cause the program to exit (the default behavior) or the signal will be reported on a channel (the behavior if the program uses os/signal.Notify). This is entirely independent of the code that makes the system call that returns EINTR.

We always send SIGURG to a specific thread to preempt that thread. But we only need to preempt threads that are actually running Go code. We never need to preempt threads that are waiting in a system call (because the scheduler already ignores them). So there should be no risk of sending SIGURG so frequently that a system call never completes. Also, we already have code to prevent signaling a thread too frequently; if it is necessary for some reason, we can improve that code to ensure that the thread makes progress.

tv42 commented 4 years ago

I'm not sure you can just retry on all EINTR and call it a day. That means one can't write a SIGINT handler that does an orderly shutdown, because it can't claw back control of those threads that are in syscalls.

If you leave it at this, all such shutdown is done by essentially abandoning the goroutines that are hanging in syscalls. I don't think this is a good idea, and I don't think it's always doable safely! The hang might have been e.g. io.Copy(os.Stdout, aNetworkFS) and we might output "Control-C received, shutting down..." after which the io.Copy manages to do some work. I'm sure there are scenarios where that ugliness presents as corrupted state and not just confusing messages.

I believe that programs should be able to receive SIGINT, un-hang those syscalls that can be un-hung, and do an orderly shutdown. There are plenty of examples of this in the C world. The typical convention is that a second SIGINT while trying to do an orderly shutdown makes the program crash harder.

I posit that the only way to achieve "orderly shutdown" is by letting the syscalls complete (hopefully with EINTR asap; one may need to nudge them that way!), and then let the calling code deal with the consequences, with the top level waiting on something like errgroup until everything is shut down.

A single-threaded C program would set a global flag ("shutting down") in a signal handler, and then the EINTR loop would actually be if errno==EINTR && !shutting_down, letting the EINTR through after a SIGINT.

A Go program doesn't even get to have signal handlers that are guaranteed to run before the EINTR loop goes back to making the syscall hang. And that's before we even consider multithreading, where the hanging syscall(s) are in separate threads from where the signal is handled!

A multithreaded program will likely end up getting the SIGINT on a thread different from the thread that's hanging on a syscall. Hopefully the scheduler will keep something available to run a goroutine to actually handle the signal, these are not considered "blocking syscalls" afaik!

This multi-threaded aspect is making me think there's basically only one path forward to clean shutdowns:

Some sort of universal support for cancellation/interruption, where the signal.Notify channel receiver can tell the hanging syscalls to bail out and then main can wait for them to have done so. This could tie into io.Reader taking context https://github.com/golang/go/issues/20280 but here we have much more than Read and Write at stake.

To make this happen, the hanging syscalls can be woken up with a signal (assuming kernel support again), and EINTR is a perfectly fine error in such a scenario. A multi-threaded program would have to "broadcast" the incoming signal (handled by just one thread) to all of its threads, to wake up all stuck syscalls.

Here's a strawman proposal: Add function to make all hanging syscalls wake up with EINTR (where kernel support exists), exactly once.

os/signal.InterruptAll() causes every goroutine that is in a syscall to receive an unspecified signal and disables the EINTR retry loop exactly once for each of them.

(It may also send signals to more threads than that, just to make implementation simpler. Goroutines not in syscalls would not observe anything special happening; pretty much exactly like the scheduler SIGURG code path.)

These signals will not be seen specifically by signal.Notify nor prevented by signal.Ignore. (Probably just by reusing SIGURG as the signal.)

Construct a mechanism for InterruptAll to disable EINTR retry loops exactly once, so we don't wreck the continuing existence of the program with spurious EINTRs. Something like: Have a per-goroutine noRetryOnEINTR flag, always clear it on entry to syscall, InterruptAll atomically sets that for target threads, then sends SIGURG to them.

ianlancetaylor commented 4 years ago

Thanks for the comment. However, I don't think what you are suggesting is a good fit for the Go world. Go programs have a managed runtime. One consequence is that they can in effect crash at any time, for example if the runtime is unable to allocate memory or unable to start a new thread. There is no reliable way to enforce the orderly shutdown of a Go program. This is an intentional choice made by the language and runtime.

For cases where an orderly shutdown is essential, Go requires separating the program into separate processes. For example, the parent process can run the real program in a child process and then, if the child process exits in some unpredictable way, restore the state to some acceptable status.

tv42 commented 4 years ago

I think crash safety and orderly shutdown are two different things. I'm a huge proponent of even crash-only software overall, but there are cases where an orderly shutdown just makes sense. I think we can keep "user asks to end program" quite separate from "out of memory". There's no way to enforce the orderly shutdown of anything computerized, but that doesn't mean all hope is lost for the common case.

Here's a recent example of graceful shutdown functionality added to stdlib: https://golang.org/pkg/net/http/#Server.Shutdown

networkimprov commented 4 years ago

@tv42 maybe you could paste your previous comments into a new proposal issue? We'd get more eyes on it that way.

bcmills commented 4 years ago

@tv42, we have an existing mechanism for tearing down interrupted calls. That mechanism is context.Context, for which there is an already-accepted proposal to simplify signal plumbing (#37255).

If we had a way to plumb the Context into filesystem operations, then that could be used for both the SIGINT shutdown use-case and many others, and could also be supported by in-process APIs that do not compile down to system calls. However, an InterruptAll call would not address these more general Context use-cases. Therefore, we should pursue the general Context plumbing rather than the specific SIGINT/EINTR mechanism.

ianlancetaylor commented 4 years ago

Yes, the advantage of using context.Context is that it puts the decision of how to handle a termination request in the only place it can safely go, which is the code that is going to be terminated. We don't want to force all Go code that makes system calls to have to handle termination requests. And, to be clear, relying on interrupting a system call would indeed force that: otherwise, for example, we will have ten thousand worker goroutines all log a message about a system call failure.
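A minimal sketch of that idea, assuming the signal-plumbing proposal (#37255) in the form it later shipped, signal.NotifyContext (Go 1.16). The names runWorker and demo are hypothetical, and the short timeout exists only so the example terminates without anyone pressing Ctrl-C. The point is that the worker itself observes the cancellation and decides how to wind down.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"time"
)

// runWorker performs one unit of work per tick until ctx is canceled, then
// returns how many units it finished and why it stopped.
func runWorker(ctx context.Context, unit time.Duration) (int, error) {
	ticker := time.NewTicker(unit)
	defer ticker.Stop()
	done := 0
	for {
		select {
		case <-ctx.Done():
			// The worker chooses its own shutdown steps here.
			return done, ctx.Err()
		case <-ticker.C:
			done++
		}
	}
}

// demo cancels the worker on SIGINT or after a short timeout, whichever
// comes first, and returns the reason the worker stopped.
func demo() error {
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
	defer stop()
	ctx, cancel := context.WithTimeout(ctx, 50*time.Millisecond)
	defer cancel()
	_, err := runWorker(ctx, time.Millisecond)
	return err
}

func main() {
	fmt.Println("worker stopped:", demo())
}
```

Note that this covers only the logical shutdown of goroutines that check the Context; it does nothing for a goroutine that is stuck inside a system call.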

tv42 commented 4 years ago

@bcmills That would require plumbing Context to an lstat(2) (and many others) that hangs in kernelspace, including sending a signal to get the syscall to return. I haven't seen anyone talk about doing that.

ianlancetaylor commented 4 years ago

@tv42 I don't see a reason to worry about whether the actual system call returns or not. That doesn't matter with regard to an orderly shutdown. It's fine for the program to exit with a bunch of threads stuck in system calls. The kernel will clean them up.

What matters for an orderly shutdown is whether the goroutines complete their logical shutdown. And that can be done, in principle, with a context.Context.

networkimprov commented 4 years ago

Sorry, I must have missed something... How does a Context terminate a goroutine waiting on a syscall that won't return?

bcmills commented 4 years ago

@networkimprov, an API that accepts a Context could take any number of approaches.

For example, it could start the system call on a separate thread-locked goroutine, use the syscall or x/sys APIs directly instead of going through the os package in order to handle EINTR directly, and send the thread a signal to interrupt it when the Context is canceled. That might require reimplementing substantial parts of the runtime's poller, and it might be less efficient without runtime and/or standard-library cooperation, but it is at least conceptually implementable as a third-party library.

Even if it does require standard-library or runtime cooperation, the thing to propose is “an os-like API that supports context cancellation”, not EINTR errors from the ostensibly-platform-agnostic os package.

tv42 commented 4 years ago

@ianlancetaylor Abandoning goroutines means the shutdown has to race against those threads (they might not be hanging, after all!), which can make e.g. neat closing of a database impossible.

networkimprov commented 4 years ago

How does a Context terminate a goroutine waiting on a syscall that won't return?

For example ...

[translation] It doesn't.

We obviously wouldn't specify a syscall constant in a proposal to let Go apps see "interrupted" errors.

EDIT: And would we really mirror the whole os file API with an alternative API taking Context arguments? Should context be a dependency of os or the new FS API package?

ianlancetaylor commented 4 years ago

@tv42 Yes, that is likely true.

Let me put it this way: the chance that we are going to add significant complexity to the way that system calls are handled in the Go standard library, solely in order to support cleaner shutdowns, is near zero. From the very start the attitude of the Go team has been that orderly shutdowns can't be implemented with 100% reliability, therefore there is no point to implementing them with 99% reliability. Anybody who needs 100% reliability needs to use a separate process. Since that case is already required and is already feasible, Go is not going to take any extra effort to serve the people who only need 99% reliability.

@networkimprov Not that this is necessarily going to get implemented, but see #20280.

networkimprov commented 4 years ago

There's a more important case than orderly shutdown: interrupting a stalled syscall that tried I/O with a network filesystem. You need to be able to stop that without quitting the app. Is there any means other than a signal? os.File deadlines only apply to File.Read/Write().

Until the EINTR loops added since async preemption, EINTR from a network filesystem was visible. A channel-based signal "handler" sets a global variable and re-raises the signal. The filesystem op checks that variable on EINTR. Isn't this a valuable pattern?

Re #20280, its thumbs are +30 -125, so that one seems like a long shot :-)

ianlancetaylor commented 4 years ago

EINTR from a network file system was visible in some cases, and it was reported as a bug. E.g., #38836. I'm not aware of any program that uses the pattern that you describe.

More generally, it sounds like the problem you are trying to solve is "some way to put a deadline on a file system operation." If that is what we want, let's talk about that, not about the low level detail of EINTR errors.

networkimprov commented 4 years ago

Deadlines could be a solution; where should we discuss them? I already posted a note on the FS API discussion. (That's gone quiet, as there's apparently no way to subscribe to an entire Reddit thread.)

But I asked this Q: "Is there any means [to stop a network filesystem op] other than a signal"? Is there? (And I noted above that an "interrupted" error from os wouldn't specify syscall.EINTR.)

EINTR was reported as a bug when it appeared on 1.14 via SIGURG. That app hadn't yet discovered the need to halt a stalled CIFS op :-)

ianlancetaylor commented 4 years ago

I don't know where to discuss deadlines other than golang-nuts, as the idea seems somewhat vague to me. I also don't know how often it comes up. In the modern era it's hard for me to picture anybody writing a program that both requires fully reliable cleanup and uses a networked file system.

networkimprov commented 4 years ago

The need for deadlines is not about program cleanup; I stated that above.

A program relying on a network needs deadlines in case the network or other side fails. Network filesystems are only accessible via the stdlib filesystem API, which provides deadlines for File.Read/Write(). It's missing them for Stat(), Readdir(), Open(), etc.

ianlancetaylor commented 4 years ago

Sorry for confusing this with the cleanup discussion.

The vast majority of programs have no idea that they are running on a networked file system, nor should they. The point of a networked file system is to hide these matters.

So it seems to me that the only programs that are going to use a deadline for a stat call are going to be ones that are aware that they are running on a networked file system. And I argue that in the modern era those programs are much more likely to use a different approach, rather than use a networked file system at all.

So I find that I'm unclear on which programs in practice will take advantage of the capability to use a deadline for a stat call.

networkimprov commented 4 years ago

I don't follow your logic. You pointed to deadlines because handling interrupted APIs is too low level. Now you suggest that the API needn't have deadlines because it should hide any underlying network. Finally, the lack of interrupts and deadlines can't be a problem because no one knowingly calls network filesystems in Go programs.

We've had three bug reports re network filesystems since 1.14. Cloud services offer NAS to be mounted locally. There's a Go-native FUSE module. The FS API proposal allows any conceivable resource to be treated as a file tree. Your hypothesis isn't well supported :-)

Re which programs need a deadline for Stat(); which programs need one for File.Read/Write()?

I offered two possible solutions on the Reddit FS API thread -- including a deadline that applies to all ops rooted at a certain path. EDIT: However, the simplest solution is an "interrupted" error from file APIs.

There's a reason we see EINTR from network filesystems even tho the signal handler set SA_RESTART. Looping on EINTR no matter the cause is incorrect.

ianlancetaylor commented 4 years ago

I don't follow your logic. You pointed to deadlines because handling interrupted APIs is too low level. Now you suggest that the API needn't have deadlines because it should hide any underlying network. Finally, the lack of interrupts and deadlines can't be a problem because no one knowingly calls network filesystems in Go programs.

I don't think I ever pointed to deadlines. Other people brought them up, and I tried to respond in context.

Then I thought about it more and questioned what kind of program would use a deadline for a function like os.Stat. I'm not saying it is meaningless. I'm saying: what programmer would write a program that uses a deadline for os.Stat? It's an honest question.

You ask who would set deadlines on read and write, and the answer of course is that many programs that use network connections want to use deadlines, because they are aware that the other side of the connection is another process that may be on another machine. But most people writing os.Stat aren't thinking of that possibility.

All the bug reports we've seen about networked file systems returning an EINTR error are reporting that case as a bug. So I don't see why you say that "looping on EINTR no matter the cause is incorrect." In fact it appears to often be correct. EINTR is a distraction, as I tried to say above. If we care about networked file systems, EINTR is not the answer. If we don't care about networked file systems, EINTR is not the answer. In Go, as far as I can tell, it's never the answer. It's only useful in a single-threaded program.

There are issues we can discuss about stopping system calls, or setting deadlines on them, but in those discussions EINTR is still not going to be the answer.

networkimprov commented 4 years ago

Since SIGURG yields EINTR, I imagined sending some other signal to certain threads to interrupt them, but Go has no API to signal threads! Oops, I've been thinking in Posix terms.

To restate the case, a file API that tries I/O with a CIFS or FUSE filesystem may become stalled, and need to be stopped. Do you agree that stopping it should be supported?

If so, and a deadline is not the best mechanism, what is?

Re "programs that use network connections want to use [File.Read/Write] deadlines", I'm glad you recognize that Go programs do knowingly use network filesystems :-)

A program that sets a deadline on File.Read() (knowing a network is involved) would do the same for Open(), Stat(), or Readdir() on that file. In what scenario would it do otherwise?

ianlancetaylor commented 4 years ago

To restate the case, a file API that tries I/O with a CIFS or FUSE filesystem may become stalled, and need to be stopped. Do you agree that stopping it should be supported?

I agree that a program that is specifically designed for use with CIFS or FUSE should support some mechanism for stopping the file operations one way or another.

If so, and a deadline is not the best mechanism, what is?

I think that a deadline is a fine mechanism.

Re "programs that use network connections want to use [File.Read/Write] deadlines", I'm glad you recognize that Go programs do knowingly use network filesystems :-)

The editorial insertion changes the context of what I wrote to give it a different meaning than I intended. I wasn't talking about File.Read/Write deadlines on network file systems. I was talking about deadlines as supported by net.Conn and also os.File. For os.File deadlines are most useful for pipes. And in general we tried to make the API as similar as possible for net.Conn and os.File.

I clearly have failed to explain what I am trying to say, so I will try again. If we add a deadline form for os.Stat, who is going to use it? Normal programs will never use it; why should they? The only programs that will ever use the deadline form of os.Stat will be programs that are concerned about the possibility of running on a networked file system. I am claiming that for any program that is concerned specifically about a networked file system, there is a better way of writing that program that doesn't use a networked file system. That was not true once, but I claim that it is true today.

The point of networked file systems is to permit ordinary programs, written with ordinary file system calls, to access files on remote computers. If your program already knows that it is running on a networked file system, then it is not an ordinary program.

So if you think that it will help us to add a deadline variant of os.Stat, I am asking for examples of a program that would use that variant. Not theoretical possibilities, because clearly such a program is theoretically possible. I'm asking for a real case where someone would sit down to write a program and say "I want to call os.Stat here and I had better set a deadline on that call." Because if that person is saying "I expect this program to run on a networked file system" then I claim that that person would not be using file system calls at all; they would use some sort of client/server setup instead.

networkimprov commented 4 years ago

Thanks for elaborating that File.Read/Write() deadlines are intended for pipes.

I pointed to some examples of contemporary network filesystem use four comments back. I hoped they would be persuasive.

I'm building a Go app for Windows, MacOS, and Linux that makes heavy use of the filesystem. If someone runs it on a network volume, I would like it to handle network/fileserver stalling. Currently there's no stdlib or runtime support for that.

Allowing interrupt of file syscalls trying certain paths would address that. I don't need deadlines per se.

ianlancetaylor commented 4 years ago

How would allowing the file system calls to be interrupted help you? What would interrupt them? Presumably you would not want the preemption signals to interrupt them? But how would that work?

You can already write a stat call with a deadline by writing something like

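// StatWithDeadline runs os.Stat in its own goroutine and stops waiting once t passes.
// It needs the "os" and "time" packages.
func StatWithDeadline(name string, t time.Time) (os.FileInfo, error) {
    type res struct { fi os.FileInfo; err error }
    c := make(chan res, 1) // buffered, so the goroutine can still finish after a timeout
    go func() {
        fi, err := os.Stat(name)
        c <- res{fi, err}
    }()
    select {
    case r := <-c:
        return r.fi, r.err
    case <-time.After(time.Until(t)):
        // The stat call itself keeps running; only the caller stops waiting.
        return nil, os.ErrDeadlineExceeded
    }
}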

This approach doesn't interrupt the system call itself, but the program can do the right thing if the stat call fails. Does it matter that the stat call waits until the file system implementation times it out?

networkimprov commented 4 years ago

Interrupting a stalled file syscall is necessary if the app will retry it, esp if it's an update. (Of course, it checks whether the update attempt succeeded after interrupting it.)

An Interrupt API would take a path argument and (on unix) presumably signal any threads trying that path, causing EINTR. Interruptions due to preemption signals are OK, since the app knows whether it has requested interruption, and can loop if not. The Interrupt API could consult a table of pathnames and thread IDs to know which threads to signal.

qingyunha commented 4 years ago

I encountered a stalled readdir using NFS just now.

It seems there is no way to interrupt the syscall without killing the whole program.

SIGQUIT: quit
PC=0x4567f1 m=0 sigcode=0

goroutine 0 [idle]:
runtime.futex(0x5ccb28, 0x80, 0x0, 0x0, 0x100000000, 0x0, 0x0, 0x7ffd00000000, 0x7ffd43922d28, 0x40a03f, ...)
        /usr/local/go/src/runtime/sys_linux_amd64.s:535 +0x21
runtime.futexsleep(0x5ccb28, 0xc000000000, 0xffffffffffffffff)
        /usr/local/go/src/runtime/os_linux.go:44 +0x46
runtime.notesleep(0x5ccb28)
        /usr/local/go/src/runtime/lock_futex.go:151 +0x9f
runtime.stopm()
        /usr/local/go/src/runtime/proc.go:1934 +0xc0
runtime.findrunnable(0xc000020000, 0x0)
        /usr/local/go/src/runtime/proc.go:2397 +0x53f
runtime.schedule()
        /usr/local/go/src/runtime/proc.go:2530 +0x2be
runtime.park_m(0xc000000a80)
        /usr/local/go/src/runtime/proc.go:2616 +0x9d
runtime.mcall(0x0)
        /usr/local/go/src/runtime/asm_amd64.s:318 +0x5b

goroutine 1 [syscall, 36 minutes]:
syscall.Syscall(0xd9, 0x3, 0xc0002ea000, 0x2000, 0x140, 0x140, 0x42997a)
        /usr/local/go/src/syscall/asm_linux_amd64.s:18 +0x5
syscall.Getdents(0x3, 0xc0002ea000, 0x2000, 0x2000, 0x7f46e25716d0, 0x0, 0xc000043938)
        /usr/local/go/src/syscall/zsyscall_linux_amd64.go:465 +0x5a
syscall.ReadDirent(...)
        /usr/local/go/src/syscall/syscall_linux.go:863
internal/poll.(*FD).ReadDirent(0xc0003bc420, 0xc0002ea000, 0x2000, 0x2000, 0x0, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_unix.go:416 +0xe4
os.(*File).readdirnames(0xc0003be2d0, 0x14, 0x0, 0x0, 0xc0003be2d0, 0x0, 0x0)
        /usr/local/go/src/os/dir_unix.go:50 +0x1b8
os.(*File).Readdirnames(...)
        /usr/local/go/src/os/dir.go:48
filexw.walk(0xc0000c8f00, 0x4, 0xc000043cc8, 0xc000043cb8)
        /root/filex_guidang/walk.go:36 +0x6ce
filexw.walk(0x503ca8, 0x1, 0xc000043cc8, 0xc000043cb8)
        /root/filex_guidang/walk.go:72 +0x55d
filexw.(*Project).RunOldFilex(0xc0000abec8)
        /root/filex_guidang/walk.go:339 +0x3fd
main.main()
        /root/filex_guidang/cmd/walk/main.go:36 +0x35f

rax    0xca
rbx    0x5cc9e0
rcx    0xffffffffffffffff
rdx    0x0
rdi    0x5ccb28
rsi    0x80
rbp    0x7ffd43922cf0
rsp    0x7ffd43922ca8
r8     0x0
r9     0x0
r10    0x0
r11    0x286
r12    0x3
r13    0x7f46e2571008
r14    0x8
r15    0x8
rip    0x4567f1
rflags 0x286
cs     0x33
fs     0x0
gs     0x0

networkimprov commented 4 years ago

Yes, Linux NFS once had the ability to return EINTR on signal, but mysteriously removed it.

bcmills commented 4 years ago

@ianlancetaylor

The only programs that will ever use the deadline form of os.Stat will be programs that are concerned about the possibility of running on a networked file system. I am claiming that for any program that is concerned specifically about a networked file system, there is a better way of writing that program that doesn't use a networked file system. … If your program already knows that it is running on a networked file system, then it is not an ordinary program.

I don't think the second point follows from the first. The programs that would use os.Stat with cancellation and/or a deadline are those that can be used with either an ordinary filesystem or a networked one and are designed to be agnostic between those two configurations. (For example, desktop or command-line programs that transcode or otherwise manipulate large files, which may be stored locally or on a shared filesystem.)

If a program does not know whether it may be running on a networked file system, then it must use ordinary file system calls (because it may be on an ordinary file system), but it must also provide for canceling stalled operations (because it may be on a networked filesystem).

The StatWithDeadline implementation that you propose above — which abandons a long-running goroutine — is not suitable for such a program, because it can leak the resources associated with the call indefinitely, and thus cause an entire program to crash instead of isolating failure to the specific filesystem operation in question. An API with proper cancellation support should not leak resources (beyond the next GC cycle) after the call has been canceled.

tv42 commented 4 years ago

Does it matter that the stat call waits until the file system implementation times it out?

There's no guarantee a file system ever times out. FUSE by default will happily sit right there until the FS client decides to cancel (e.g. SIGINT causing syscall to return with EINTR). Filesystems adding their own timeouts leads to ugly, ugly things -- pretty much everyone using it at the time learned that with NFS.

ianlancetaylor commented 4 years ago

@qingyunha See the os/signal package for how to handle signals.

@bcmills I don't think I agree with your suggestion that programs that care about networked file systems will just always use the file system calls. It isn't the right approach. I'm sure some such programs exist, but I'm not convinced as to how much we should cater to them.

That said, I would not be opposed to a version of StatWithDeadline that sends a signal to the thread to interrupt the stat call. It would not be easy to write, because of the race conditions. And I'm not sure the standard library is the right place for it. And, of course, this has nothing to do with EINTR. StatWithDeadline still wouldn't use or return EINTR.

tv42 commented 4 years ago

Can we please switch from thinking purely about deadlines to thinking about cancellation -- not at a predefined time, but on user action. General cancellation can support deadlines, deadlines cannot support general cancellation. (Unless you count on cancellation setting deadline to a past time, but that's just an unnecessarily confusing API.)
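The claim that general cancellation subsumes deadlines can be sketched with context.Context. The helper below, statContext, is a hypothetical name and not part of the os package. It has the same limitation as the StatWithDeadline example earlier in the thread: the underlying stat call is not interrupted, so a stalled call still leaves its goroutine behind. What it shows is that one cancellation-based signature serves both a user-triggered cancel and a deadline.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"time"
)

// statContext (hypothetical) waits for os.Stat or for ctx to be canceled,
// whichever happens first. The stat call itself is never interrupted.
func statContext(ctx context.Context, name string) (os.FileInfo, error) {
	if err := ctx.Err(); err != nil {
		return nil, err // already canceled: do not start the call at all
	}
	type res struct {
		fi  os.FileInfo
		err error
	}
	c := make(chan res, 1) // buffered so the goroutine can exit after cancellation
	go func() {
		fi, err := os.Stat(name)
		c <- res{fi, err}
	}()
	select {
	case r := <-c:
		return r.fi, r.err
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}

// demo stats the current directory once normally and once with a context
// that was canceled by "user action" before the call.
func demo() (okErr, canceledErr error) {
	_, okErr = statContext(context.Background(), ".")
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	_, canceledErr = statContext(ctx, ".")
	return okErr, canceledErr
}

func main() {
	fmt.Println(demo())

	// A deadline is just one way of canceling the same call.
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	fi, err := statContext(ctx, ".")
	if err != nil {
		fmt.Println("stat failed:", err)
		return
	}
	fmt.Println("is directory:", fi.IsDir())
}
```

The reverse does not hold as cleanly: a deadline-only API can emulate a cancel only by moving the deadline into the past, which is the confusing pattern mentioned above.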