go-delve / delve

Delve is a debugger for the Go programming language.
MIT License

Hard to repro bug: screen locks sometimes #3605

Closed thockin closed 10 months ago

thockin commented 11 months ago
  1. What version of Delve are you using (dlv version)?

Delve Debugger Version: 1.21.1 Build: $Id: a358c02f24aa7047ecc562b0587dc2d08330b2cf $

  2. What version of Go are you using (go version)?

go version go1.21.3 linux/amd64

  3. What operating system and processor architecture are you using?

linux amd64

  4. What did you do?

Ran dlv, hit ^Z

  5. What did you expect to see?

My terminal

  6. What did you see instead?

Frozen screen, no input or output. I had to killall -9 dlv from another terminal, at which point all of the buffered input was sent to the shell! And I had to reset the terminal.

I can't 100% repro this, but it happened several times yesterday. Not sure how to debug it.

thockin commented 11 months ago

It keeps happening - any clues on how to debug it would be helpful.

aarzilli commented 11 months ago

Are you hitting ^Z while the program you are debugging is running or at delve's prompt? If you look at the output of ps what does it say about dlv? Can you use delve to attach to the stuck dlv process and see what it's doing?

thockin commented 11 months ago

^Z while in dlv

$ ps auxw | grep dlv
thockin  3468313  1.5  0.1 6322216 83956 pts/0   Tl+  15:07   0:00 dlv debug ./staging/src/k8s.io/code-generator/cmd/defaulter-gen -- --v 5 --logtostderr -h hack/boilerplate/boilerplate.generatego.txt -O zz_generated.defaults -i ./cmd/kubeadm/app/apis/kubeadm/v1beta3 -i ./cmd/kubeadm/app/apis/kubeadm/v1beta4 -i ./pkg/apis/abac/v1beta1 -i ./pkg/apis/admission/v1 
$ dlv attach 3468313
Could not attach to pid 3468313: this could be caused by a kernel security setting, try writing "0" to /proc/sys/kernel/yama/ptrace_scope

$ sudo ~thockin/go/bin/dlv attach 3468313
Type 'help' for list of commands.

(dlv) grs
  Goroutine 1 - User: /home/thockin/go/pkg/mod/github.com/go-delve/liner@v1.2.3-0.20220127212407-d32d89dd2a5d/input.go:140 github.com/go-delve/liner.(*State).readNext (0x9537be) [select]
  Goroutine 2 - User: /home/thockin/sdk/gotip/src/runtime/proc.go:403 runtime.gopark (0x44090e) [force gc (idle) 1631637529423568]
  Goroutine 3 - User: /home/thockin/sdk/gotip/src/runtime/proc.go:403 runtime.gopark (0x44090e) [GC sweep wait]
  Goroutine 4 - User: /home/thockin/sdk/gotip/src/runtime/proc.go:403 runtime.gopark (0x44090e) [GC scavenge wait]
  Goroutine 5 - User: /home/thockin/sdk/gotip/src/runtime/proc.go:403 runtime.gopark (0x44090e) [finalizer wait 1631637529423568]
  Goroutine 6 - User: /home/thockin/go/pkg/mod/github.com/go-delve/delve@v1.21.2/pkg/proc/native/proc.go:400 github.com/go-delve/delve/pkg/proc/native.(*ptraceThread).handlePtraceFuncs (0x892457) [chan receive]
  Goroutine 7 - User: /home/thockin/sdk/gotip/src/runtime/proc.go:403 runtime.gopark (0x44090e) [debug call]
  Goroutine 13 - User: /home/thockin/sdk/gotip/src/net/pipe.go:159 net.(*pipe).read (0x565a35) [select]
  Goroutine 15 - User: /home/thockin/sdk/gotip/src/runtime/internal/syscall/syscall_linux.go:38 syscall.RawSyscall6 (0x407ced) (thread 3468317)
  Goroutine 18 - User: /home/thockin/sdk/gotip/src/runtime/proc.go:403 runtime.gopark (0x44090e) [debug call]
  Goroutine 19 - User: /home/thockin/sdk/gotip/src/runtime/proc.go:403 runtime.gopark (0x44090e) [debug call]
  Goroutine 20 - User: /home/thockin/sdk/gotip/src/runtime/proc.go:403 runtime.gopark (0x44090e) [debug call]
  Goroutine 21 - User: /home/thockin/sdk/gotip/src/runtime/proc.go:403 runtime.gopark (0x44090e) [select 1631637748260353]
  Goroutine 34 - User: /home/thockin/sdk/gotip/src/runtime/proc.go:403 runtime.gopark (0x44090e) [debug call]
  Goroutine 35 - User: /home/thockin/sdk/gotip/src/runtime/proc.go:403 runtime.gopark (0x44090e) [debug call]
  Goroutine 51 - User: /home/thockin/sdk/gotip/src/net/pipe.go:159 net.(*pipe).read (0x565a35) [select]
  Goroutine 53 - User: /home/thockin/sdk/gotip/src/runtime/sigqueue.go:152 os/signal.signal_recv (0x46f929) (thread 3468319)
  Goroutine 55 - User: /home/thockin/go/pkg/mod/github.com/go-delve/delve@v1.21.2/pkg/terminal/terminal.go:186 github.com/go-delve/delve/pkg/terminal.(*Term).sigintGuard (0x99f4bd) [chan receive 1631637748260353]
[18 goroutines]

Here's my config, in case it matters:

(dlv) config -list
aliases                   map[display:[disp] down:[do] help:[?]]
substitute-path           []
max-string-len            256
max-array-values          4
max-variable-recurse      2
disassemble-flavor        <not defined>
show-location-expr        false
source-list-line-color    "\x1b[0;33m"
source-list-arrow-color   "\x1b[1;36m"
source-list-keyword-color "\x1b[0;34m"
source-list-string-color  "\x1b[0;35m"
source-list-number-color  "\x1b[0;32m"
source-list-comment-color "\x1b[0;36m"
source-list-tab-color     "\x1b[2;37m"
source-list-line-count    10
debug-info-directories    [/usr/lib/debug/.build-id]
position                  ""
tab                       "... "
trace-show-timestamp      false

It seems to happen 100% of the time now for me, but when I sudo dlv attach'ed I could ^Z OK.

aarzilli commented 11 months ago

The 'T' status looks fine: it means dlv did suspend in response to the signal. What's not good is the '+'; at that point the shell should take over the terminal. What's the environment here: shell, terminal, any terminal multiplexers involved, ssh? Is any of this being started by a script?
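
For readers less familiar with the ps STAT column, here is a minimal Go sketch of how the "Tl+" value from the output above breaks down. It assumes the procps convention on Linux, where the first character is the process state and the remaining characters are flags. The `StatInfo` type and `decodeStat` function are hypothetical names made up for this illustration; they are not part of Delve or ps.

```go
package main

import (
	"fmt"
	"strings"
)

// StatInfo holds the parts of a ps STAT value that matter in this thread.
// Hypothetical helper type, for illustration only.
type StatInfo struct {
	Stopped       bool // state 'T': stopped by a job-control signal
	MultiThreaded bool // flag 'l': the process is multi-threaded
	Foreground    bool // flag '+': in the terminal's foreground process group
}

// decodeStat interprets a STAT string such as "Tl+".
// The first character is the state; everything after it is a flag.
func decodeStat(stat string) StatInfo {
	if stat == "" {
		return StatInfo{}
	}
	flags := stat[1:]
	return StatInfo{
		Stopped:       stat[0] == 'T',
		MultiThreaded: strings.Contains(flags, "l"),
		Foreground:    strings.Contains(flags, "+"),
	}
}

func main() {
	info := decodeStat("Tl+") // the value ps reported for the stuck dlv
	fmt.Printf("%+v\n", info)
	if info.Stopped && info.Foreground {
		// The combination described above: dlv is suspended, but its
		// process group still owns the terminal.
		fmt.Println("stopped, but still in the foreground process group")
	}
}
```

A stopped process that is still in the foreground group is consistent with the frozen screen: nothing that can read the terminal is running, and the interactive shell has not been given the terminal back.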

thockin commented 11 months ago

It is a local terminal, no SSH but it IS being started by a script. That is an excellent observation. Let me try to eliminate that (exec).

thockin commented 11 months ago

Hah, that does seem to trigger it. I admit I don't know enough about how the terminal SHOULD react to say whether this is right or not. I tested by putting vi in a script, and ^Z-ing that works how I would expect (the whole shell suspends).

$ cat vi.sh 
#!/bin/sh
echo before
vi "$@"
echo after

$ echo $SHLVL
1

$ sh vi.sh /tmp/a
before
<vi loaded, I hit ^Z>
[1]+  Stopped                 sh vi.sh /tmp/a

$ echo $SHLVL
1

$ fg
<vi reloaded, I exited>
sh vi.sh /tmp/a
after

Is that a fair analog? I think so but not 100% sure.

aarzilli commented 11 months ago

We're sending SIGTSTP to ourselves when we should be sending it to our process group. The workaround for this should be to have an actual exec in the script (exec dlv debug ...)

thockin commented 11 months ago

Yes, exec fixes the problem for now. Thanks for digging in.