elves / elvish

Powerful scripting language & versatile interactive shell
https://elv.sh/
BSD 2-Clause "Simplified" License
5.63k stars 299 forks source link

Hang trying to read bytes from a pipe filled with values #1576

Open hanche opened 2 years ago

hanche commented 2 years ago

Run the following code:

echo $pid; range 40 | slurp

Result: The shell hangs, and does not respond to SIGINT. If you send a SIGSEGV (this is why we recorded the shell's pid), a backtrace is printed. It is probably informative, but I don't include it here, since it is rather big (and elvish users can reproduce it themselves). Not tried on Windows.

(Possibly, you may need to replace 40 by a bigger number.)

Edited: Much simpler code to reproduce, thanks to @krader1961 in chat.

krader1961 commented 2 years ago

The limit is defined by this constant: const pipelineChanBufferSize = 32. There's no comment in pkg/eval/compile_effect.go for why 32 and not some other value. In any case, it should be obvious it isn't sufficient to simply use a larger value. This hangs: elvish -c 'echo $pid; range 33 | slurp'. Sending SIGUSR1 to the process results in these backtraces:

goroutine 5 [running]:
src.elv.sh/pkg/sys.DumpStack()
    /Users/krader/projects/3rd-party/elvish/pkg/sys/dumpstack.go:10 +0x66
src.elv.sh/pkg/shell.handleSignal({0x43e2bd0?, 0x45bc810?}, {0x43e0948, 0xc0000a4010})
    /Users/krader/projects/3rd-party/elvish/pkg/shell/signal_unix.go:20 +0x8d
src.elv.sh/pkg/shell.initSignal.func1()
    /Users/krader/projects/3rd-party/elvish/pkg/shell/shell.go:140 +0x75
created by src.elv.sh/pkg/shell.initSignal
    /Users/krader/projects/3rd-party/elvish/pkg/shell/shell.go:137 +0x91

goroutine 1 [IO wait]:
internal/poll.runtime_pollWait(0x4e83158, 0x72)
    /usr/local/go/src/runtime/netpoll.go:302 +0x89
internal/poll.(*pollDesc).wait(0xc00007c420?, 0xc00024a000?, 0x1)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:83 +0x32
internal/poll.(*pollDesc).waitRead(...)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:88
internal/poll.(*FD).Read(0xc00007c420, {0xc00024a000, 0x200, 0x200})
    /usr/local/go/src/internal/poll/fd_unix.go:167 +0x25a
os.(*File).read(...)
    /usr/local/go/src/os/file_posix.go:31
os.(*File).Read(0xc000010098, {0xc00024a000?, 0x4068eb9?, 0x4281bfe?})
    /usr/local/go/src/os/file.go:119 +0x5e
io.ReadAll({0x43e0928, 0xc000010098})
    /usr/local/go/src/io/io.go:645 +0xfe
src.elv.sh/pkg/eval.slurp(0x2?)
    /Users/krader/projects/3rd-party/elvish/pkg/eval/builtin_fn_io.go:637 +0x37
reflect.Value.call({0x43094e0?, 0x438a778?, 0x18?}, {0x4371a35, 0x4}, {0xc00000e348, 0x1, 0x0?})
    /usr/local/go/src/reflect/value.go:556 +0x845
reflect.Value.Call({0x43094e0?, 0x438a778?, 0x42ed5a0?}, {0xc00000e348, 0x1, 0x1})
    /usr/local/go/src/reflect/value.go:339 +0xbf
src.elv.sh/pkg/eval.(*goFn).Call(0xc0000d3810, 0xc000234180, {0x0, 0x0, 0x7?}, 0xc000020960)
    /Users/krader/projects/3rd-party/elvish/pkg/eval/go_fn.go:247 +0xb46
src.elv.sh/pkg/eval.(*formOp).exec(0xc0002144d0, 0xc000234180)
    /Users/krader/projects/3rd-party/elvish/pkg/eval/compile_effect.go:366 +0x45d
src.elv.sh/pkg/eval.(*pipelineOp).exec.func1({0x43e0c28?, 0xc0002144d0?}, 0xc0000361b0)
    /Users/krader/projects/3rd-party/elvish/pkg/eval/compile_effect.go:120 +0x5a
src.elv.sh/pkg/eval.(*pipelineOp).exec(0xc00007a380, 0xc000234000)
    /Users/krader/projects/3rd-party/elvish/pkg/eval/compile_effect.go:134 +0xa8f
src.elv.sh/pkg/eval.chunkOp.exec({{0x0, 0x1b}, {0xc000036100, 0x2, 0x2}}, 0xc00000e0c0?)
    /Users/krader/projects/3rd-party/elvish/pkg/eval/compile_effect.go:32 +0x110
src.elv.sh/pkg/eval.nsOp.prepare.func1()
    /Users/krader/projects/3rd-party/elvish/pkg/eval/compiler.go:81 +0x26
src.elv.sh/pkg/eval.(*Evaler).Eval(0xc00020e000, {{0x43744bc, 0xc}, {0x7ffeefbff5bb, 0x1b}, 0x1}, {{0xc00000e0c0, 0x3, 0x3}, 0x438a390, ...})
    /Users/krader/projects/3rd-party/elvish/pkg/eval/eval.go:411 +0x5a4
src.elv.sh/pkg/shell.evalInTTY({0xc0000a4000, 0xc0000a4008, 0xc0000a4010}, 0x4371cbd?, {0x0, 0x0}, {{0x43744bc, 0xc}, {0x7ffeefbff5bb, 0x1b}, ...})
    /Users/krader/projects/3rd-party/elvish/pkg/shell/shell.go:153 +0x1c5
src.elv.sh/pkg/shell.script(0x4316700?, {0xc0000a4000, 0xc0000a4008, 0xc0000a4010}, {0xc00009e0b0?, 0x7ffeefbff5b9?, 0x1?}, 0xc0000ddd65)
    /Users/krader/projects/3rd-party/elvish/pkg/shell/script.go:64 +0x39a
src.elv.sh/pkg/shell.(*Program).Run(0xc00009fb00, {0xc0000a4000, 0xc0000a4008, 0xc0000a4010}, {0xc00009e0b0, 0x1, 0x1})
    /Users/krader/projects/3rd-party/elvish/pkg/shell/shell.go:66 +0x385
src.elv.sh/pkg/prog.composite.Run(...)
    /Users/krader/projects/3rd-party/elvish/pkg/prog/prog.go:118
src.elv.sh/pkg/prog.Run({0xc0000a4000, 0xc0000a4008, 0xc0000a4010}, {0xc00009e090, 0x3, 0x3}, {0x43e2ba8, 0xc0000b0750})
    /Users/krader/projects/3rd-party/elvish/pkg/prog/prog.go:83 +0x3d4
main.main()
    /Users/krader/projects/3rd-party/elvish/cmd/elvish/main.go:18 +0x17f

goroutine 19 [chan receive]:
src.elv.sh/pkg/eval.getBlackholeChan.func1()
    /Users/krader/projects/3rd-party/elvish/pkg/eval/port.go:96 +0x3a
created by src.elv.sh/pkg/eval.getBlackholeChan
    /Users/krader/projects/3rd-party/elvish/pkg/eval/port.go:95 +0x6a

goroutine 20 [syscall]:
os/signal.signal_recv()
    /usr/local/go/src/runtime/sigqueue.go:148 +0x28
os/signal.loop()
    /usr/local/go/src/os/signal/signal_unix.go:23 +0x19
created by os/signal.Notify.func1.1
    /usr/local/go/src/os/signal/signal.go:151 +0x2a

goroutine 6 [chan receive]:
src.elv.sh/pkg/eval.FilePort.func1()
    /Users/krader/projects/3rd-party/elvish/pkg/eval/port.go:233 +0xca
created by src.elv.sh/pkg/eval.FilePort
    /Users/krader/projects/3rd-party/elvish/pkg/eval/port.go:232 +0xea

goroutine 7 [chan receive]:
src.elv.sh/pkg/eval.FilePort.func1()
    /Users/krader/projects/3rd-party/elvish/pkg/eval/port.go:233 +0xca
created by src.elv.sh/pkg/eval.FilePort
    /Users/krader/projects/3rd-party/elvish/pkg/eval/port.go:232 +0xea

goroutine 8 [select]:
src.elv.sh/pkg/eval.ListenInterrupts.func1()
    /Users/krader/projects/3rd-party/elvish/pkg/eval/interrupts.go:46 +0x95
created by src.elv.sh/pkg/eval.ListenInterrupts
    /Users/krader/projects/3rd-party/elvish/pkg/eval/interrupts.go:42 +0x138

goroutine 9 [select]:
src.elv.sh/pkg/eval.valueOutput.Put({0xc00007c4e0?, 0xc00022a2a0?, 0xc0000603c0?}, {0x42fc200?, 0x45bc820?})
    /Users/krader/projects/3rd-party/elvish/pkg/eval/port.go:280 +0x8e
src.elv.sh/pkg/eval.rangeInt({0xc0000a91a0?, 0x2?, 0x0?}, {0x43e1ac8, 0xc0000b07e0})
    /Users/krader/projects/3rd-party/elvish/pkg/eval/builtin_fn_num.go:805 +0x2b8
src.elv.sh/pkg/eval.rangeFn(0xc000234100, {{0x0?, 0x0?}}, {0xc000121d00?, 0x2?, 0xc0000a9190?})
    /Users/krader/projects/3rd-party/elvish/pkg/eval/builtin_fn_num.go:779 +0x28b
reflect.Value.call({0x43132e0?, 0x438a710?, 0x60?}, {0x4371a35, 0x4}, {0xc0000a63c0, 0x3, 0x30?})
    /usr/local/go/src/reflect/value.go:556 +0x845
reflect.Value.Call({0x43132e0?, 0x438a710?, 0x42f2260?}, {0xc0000a63c0, 0x3, 0x4})
    /usr/local/go/src/reflect/value.go:339 +0xbf
src.elv.sh/pkg/eval.(*goFn).Call(0xc000132770, 0xc000234100, {0xc000121c70, 0x1, 0x7?}, 0xc00009fbc0)
    /Users/krader/projects/3rd-party/elvish/pkg/eval/go_fn.go:247 +0xb46
src.elv.sh/pkg/eval.(*formOp).exec(0xc000214420, 0xc000234100)
    /Users/krader/projects/3rd-party/elvish/pkg/eval/compile_effect.go:366 +0x45d
src.elv.sh/pkg/eval.(*pipelineOp).exec.func1({0x43e0c28?, 0xc000214420?}, 0xc0000361a0)
    /Users/krader/projects/3rd-party/elvish/pkg/eval/compile_effect.go:120 +0x5a
created by src.elv.sh/pkg/eval.(*pipelineOp).exec
    /Users/krader/projects/3rd-party/elvish/pkg/eval/compile_effect.go:136 +0xb4a
krader1961 commented 1 year ago

I've been pondering this problem and don't see any practical solution. The simple example in the problem statement (and similar examples) could, in theory, be detected when compiling the program. However, there is no way in general to detect this problem at compile time. It's not even practical to do so at run time since the the LHS could fill the value pipe faster than it is drained by the RHS. Thus, it isn't enough to simply detect that the pipe (Go channel) is full when trying to inject another value. The only practical solution today is to rely on the user to use only-bytes or only-values.

hanche commented 1 year ago

You're probably right. But then the limitations and dangers of mixing bytes and values on the same pipe should be clearly documented, so users can know exactly what will reliably work and when they risk a hang.

And this could be mitigated by implementing #1577. I just added some alternate implementation suggestions to it.

krader1961 commented 1 year ago

Note that there are two, closely related, pipeline problems:

1) The LHS produces byte/value output that is not consumed by the RHS and exceeds the buffering capacity of the stream (this is the issue in the original problem statement). This results in a "hang"; i.e., deadlock.

2) The LHS produces byte/value output that is not consumed by the RHS and fits within the buffering capacity of the stream. In this case that output is silently discarded.

Note that a byte stream can buffer anywhere from 512 bytes (the original UNIX implementation) to 128 KB (a AT&T SysVR4 STREAMS fifo). The most common buffer size today is 8 KB. The value stream can buffer 32 items as defined by const pipelineChanBufferSize = 32 in pkg/eval/compile_effect.go.

Both of these examples "succeed" and silently discard either the byte or value output of the LHS:

# This silently discards the byte stream output of the LHS.
elvish> { put yes; echo (for _ [(range 8192)] { print x } ) } | take 1
▶ yes
# This silently discards the value stream output of the LHS.
elvish> { put yes; echo (for _ [(range 8)] { print x } ) } | slurp
▶ "xxxxxxxx\n"

Both of these examples "fail" by hanging because either the byte or value output of the LHS exceeds the buffer of the respective pipe:

elvish> { echo yes; range 33 } | slurp
elvish> { put yes; for _ [(range (math:pow 2 22))] { print x } } | take 1

The scare-quotes around "fail" and "succeed" are intentional because it is not clear that either condition should be considered a success or failure. Also, the last example, involving byte stream output, actually "succeeds" but only after a very long time. And that is a subtlety that has to do with how byte stream output is handled.