golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
122.82k stars 17.51k forks source link

runtime: TestLldbPython fails for lldb 7.0.0 #29244

Closed jimrobinson closed 2 years ago

jimrobinson commented 5 years ago

What version of Go are you using (go version)?

$ go version
go version devel +a6293995c5 Thu Dec 13 23:58:06 2018 +0000 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/jimr/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/jimr/lib/go:/home/jimr/proj/github:/usr/local/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build101053857=/tmp/go-build -gno-record-gcc-switches"

This is a Fedora 29 machine, it has lddb 7.0.0-1 and python2 2.7.15 installed:

$ dnf list --installed python2 lldb python2-lldb
Installed Packages
lldb.x86_64                           7.0.0-1.fc29                      @fedora 
python2.x86_64                        2.7.15-11.fc29                    @updates
python2-lldb.x86_64                   7.0.0-1.fc29                      @fedora 

What did you do?

$ cd /usr/local/go/src $ git rev-parse HEAD a6293995c55659f51e0662e7656f395633c99b5b $ export GOROOT_BOOTSTRAP=/usr/local/lib/go1.11.2 $ ./all.bash

What did you expect to see?

...
ALL TESTS PASSED
---
Installed Go for linux/amd64 in /usr/local/go
Installed commands in /usr/local/go/bin

What did you see instead?

...
--- FAIL: TestLldbPython (2.33s)
    runtime-lldb_test.go:180: Unexpected lldb output:
        Created target
        Created breakpoint
        Process launched
FAIL
FAIL    runtime 20.873s
...
2018/12/13 16:38:53 Failed: exit status 1

It appears that the lldb script.py is not returning the expected output:

$ /usr/local/go/bin/go build "-gcflags=all=-N -l" "-ldflags=-compressdwarf=false" -o a.exe

$ /usr/bin/python2.7 script.py /usr/lib64/python2.7/site-packages
Created target
Created breakpoint
Process launched

Digging it, this section of script.py is failing because the stop reason is not the expected value:

32     if state == lldb.eStateStopped:
33       for t in process.threads:
34         if t.GetStopReason() == lldb.eStopReasonBreakpoint:

I added some debugging:

 diff -u script.py script.debug.py 
--- script.py   2018-12-13 16:12:27.539701062 -0800
+++ script.debug.py 2018-12-13 16:33:50.672528818 -0800
@@ -5,6 +5,19 @@

 TIMEOUT_SECS = 5

+eStopReason = {
+    lldb.eStopReasonNone: "lldb.eStopReasonNone",
+    lldb.eStopReasonTrace: "lldb.eStopReasonTrace",
+    lldb.eStopReasonBreakpoint: "lldb.eStopReasonBreakpoint",
+    lldb.eStopReasonWatchpoint: "lldb.eStopReasonWatchpoint",
+    lldb.eStopReasonSignal: "lldb.eStopReasonSignal",
+    lldb.eStopReasonException: "lldb.eStopReasonException",
+    lldb.eStopReasonExec: "lldb.eStopReasonExec",
+    lldb.eStopReasonPlanComplete: "lldb.eStopReasonPlanComplete",
+    lldb.eStopReasonThreadExiting: "lldb.eStopReasonThreadExiting",
+    lldb.eStopReasonInstrumentation: "lldb.eStopReasonInstrumentation"
+}
+
 debugger = lldb.SBDebugger.Create()
 debugger.SetAsync(True)
 target = debugger.CreateTargetWithFileAndArch("a.exe", None)
@@ -44,6 +57,8 @@
               print "intvar = %s" % (var.GetValue(),)
             else:
               print "no intvar"
+        else:
+          print t
+          print eStopReason[t.GetStopReason()]
     else:
       print "Process state", state
     process.Destroy()

and this produces output indicating "Signal" and "None" stopped reasons:

$ /usr/bin/python2.7 script.debug.py /usr/lib64/python2.7/site-packages
Created target
Created breakpoint
Process launched
thread #2: tid = 4968, 0x00000000004592b3 a.exe`runtime.futex at sys_linux_amd64.s:536, name = 'a.exe', stop reason = signal SIGSTOP
lldb.eStopReasonSignal
thread #3: tid = 4967, 0x0000000000458d1d a.exe`runtime.usleep at sys_linux_amd64.s:131, name = 'a.exe', stop reason = signal SIGSTOP
lldb.eStopReasonSignal
thread #4: tid = 4969, 0x00000000004592b3 a.exe`runtime.futex at sys_linux_amd64.s:536, name = 'a.exe', stop reason = signal SIGSTOP
lldb.eStopReasonSignal
thread #5: tid = 4970, 0x00000000004592b3 a.exe`runtime.futex at sys_linux_amd64.s:536, name = 'a.exe', stop reason = signal SIGSTOP
lldb.eStopReasonSignal
thread #6: tid = 1, 0x00000000004a0070 a.exe`main.main at main.go:11
lldb.eStopReasonNone
thread #7: tid = 2, 0x000000000042dbc4 a.exe`runtime.gopark(unlockf=, lock=, reason=, traceEv=, traceskip=) at proc.go:302
lldb.eStopReasonNone
thread #8: tid = 3, 0x000000000042dbc4 a.exe`runtime.gopark(unlockf=, lock=, reason=, traceEv=, traceskip=) at proc.go:302
lldb.eStopReasonNone
thread #9: tid = 4, 0x000000000042dbc4 a.exe`runtime.gopark(unlockf=, lock=, reason=, traceEv=, traceskip=) at proc.go:302
lldb.eStopReasonNone

TestLldbPython.debug.tar.gz

agnivade commented 5 years ago

Related to https://github.com/golang/go/issues/22299 which fails for older lldb versions. But this one seems to fail even for new ones.

bcmills commented 5 years ago

CC @thanm @cherrymui @ianlancetaylor

thanm commented 5 years ago

Hmm. I poked at this for a while but never go to the point where the test could run on my machine (some bad combination of python, lldb, and python-lldb bindings config I assume). I get:

File "/usr/lib/python2.7/dist-packages/lldb/__init__.py", line 45, in swig_import_helper return importlib.import_module('_lldb') File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module __import__(name) ImportError: No module named _lldb

This is with "sudo apt-get install lldb-7 python-lldb-7" to install the basics. Is there a trick that I am missing?

jimrobinson commented 5 years ago

@thanm Given the name apt-get, I assume you're on a Debian or Ubuntu system? Which revision? I'm on Fedora 29 so I'll need to spin up a VM if I am to try and replicate the issue on another distro.

thanm commented 5 years ago

Yes, I am using a Debian variant. Thanks for the offer of additional testing, but I'd like to discuss the test with a couple of my team-mates first. Looking at the test it's not clear to me exactly what it is supposed to be verifying. If the goal is only to check whether lldb can set a breakpoint on the toy program, then it might better just to run lldb in batch mode as opposed to via the python interface. Stay tuned.

thanm commented 5 years ago

@jimrobinson I did manage to get my version 7 lldb + python-lldb installation issues resolved, so I can run test locally, including your debug script -- however I can't seem to reproduce the same bad behavior on my end (I do see the correct stop reason). I'll poke at it some more later today.

jimrobinson commented 5 years ago

@thanm thank you. So can we verify our respective setups?

Linux 4.20.0-0.rc6
Python 2.7.15
lldb version 7.0.0

The libraries python2 links against:

glibc-2.28-23.fc29.x86_64
python2-libs-2.7.15-11.fc29.x86_64

The libraries lldb links against:

clang-libs-7.0.0-2.fc29.x86_64
glibc-2.28-23.fc29.x86_64
libedit-3.1-24.20170329cvs.fc29.x86_64
libffi-3.1-18.fc29.x86_64
libgcc-8.2.1-5.fc29.x86_64
libstdc++-8.2.1-5.fc29.x86_64
libxml2-2.9.8-4.fc29.x86_64
lldb-7.0.0-1.fc29.x86_64
llvm-libs-7.0.0-2.fc29.x86_64
ncurses-libs-6.1-8.20180923.fc29.x86_64
python2-libs-2.7.15-11.fc29.x86_64
xz-libs-5.2.4-3.fc29.x86_64
zlib-1.2.11-14.fc29.x86_64
bcmills commented 5 years ago

@thanm, should this be milestoned to 1.12, 1.13, or Unplanned?

thanm commented 5 years ago

At this point I am 99% sure it is a problem with the test, not with the Go compiler or DWARF. I'll put it in 1.13

siebenmann commented 5 years ago

This didn't used to happen to me in the same environment (Fedora 29 with lldb available), but something changed recently and I'm now getting this exact failure. A potentially relevant piece of information is that I recently applied a Fedora update that updated llvm but not lldb. It was apparently a minor rebuild that only changed the RPM version number (7.0.1-1 to 7.0.1-2), but maybe something gets perturbed even then.

siebenmann commented 5 years ago

After some more investigation, I've established that what causes this test to start failing for me is switching from Fedora' earlier kernels (up through 4.19.15) to their 4.20.x kernel (either 4.20.3 or 4.20.4 so far). In 4.20.x, the Python script sees lldb stopping at the correct line but with reason eStopReasonNone. This also happens for a version of the test program built with Go 1.11, not just the latest go tip version, and also with Go 1.10 if I remove the go build option to disable compressed DWARF. So I think this is a general issue on modern Linux kernels, possibly a general one that lldb has with anything that uses threads (I haven't tested a threaded C program). Delve appears to be able to do breakpoints properly even on 4.20.x kernels on this test program.

Running lldb by hand on a.exe with a breakpoint set at the same place (with list main.main and then break set -l 10) may also be failing. When I do this, lldb stops, but it only displays four threads that have stopped with SIGSTOP, all of which appear to be internal Go threads:

* thread #2, name = 'a.exe', stop reason = signal SIGSTOP
    frame #0: 0x0000000000459743 a.exe`runtime.futex at sys_linux_amd64.s:536
[...]
  thread #3, name = 'a.exe', stop reason = signal SIGSTOP
    frame #0: 0x00000000004591ad a.exe`runtime.usleep at sys_linux_amd64.s:131
[...]
  thread #4, name = 'a.exe', stop reason = signal SIGSTOP
    frame #0: 0x0000000000459743 a.exe`runtime.futex at sys_linux_amd64.s:536
[...]
  thread #5, name = 'a.exe', stop reason = signal SIGSTOP
    frame #0: 0x00000000004591ad a.exe`runtime.usleep at sys_linux_amd64.s:131

The actual hit breakpoint only shows up if I list all threads:

(lldb) thread list
Process 30484 stopped
* thread #2: tid = 30488, 0x0000000000459743 a.exe`runtime.futex at sys_linux_amd64.s:536, name = 'a.exe', stop reason = signal SIGSTOP
  thread #3: tid = 30487, 0x00000000004591ad a.exe`runtime.usleep at sys_linux_amd64.s:131, name = 'a.exe', stop reason = signal SIGSTOP
  thread #4: tid = 30489, 0x0000000000459743 a.exe`runtime.futex at sys_linux_amd64.s:536, name = 'a.exe', stop reason = signal SIGSTOP
  thread #5: tid = 30490, 0x00000000004591ad a.exe`runtime.usleep at sys_linux_amd64.s:131, name = 'a.exe', stop reason = signal SIGSTOP
  thread #6: tid = 1, 0x00000000004a0b28 a.exe`main.main at main.go:10
  thread #7: tid = 2, 0x000000000042dc34 a.exe`runtime.gopark(unlockf=<unavailable>, lock=<unavailable>, reason=<unavailable>, traceEv=<unavailable>, traceskip=<unavailable>) at proc.go:302
  thread #8: tid = 3, 0x000000000042dc34 a.exe`runtime.gopark(unlockf=<unavailable>, lock=<unavailable>, reason=<unavailable>, traceEv=<unavailable>, traceskip=<unavailable>) at proc.go:302
  thread #9: tid = 17, 0x0000000000417d10 a.exe`runtime.runfinq at mfinal.go:161

If I cont this in lldb, it pauses twice more before printing the hi, but at no point does lldb think that we've stopped in main(), and after the first time thread list says that nothing is even running in main() any more.

Using the same compiled a.exe binary on 4.19.15 with lldb, everything works exactly as it should; lldb immediately stops at the right point and shows what you'd want:

(lldb) break set -l 10
Breakpoint 1: where = a.exe`main.main + 392 at main.go:10, address = 0x00000000004a0b28
(lldb) run
Process 3678 launched: '/tmp/t/a.exe' (x86_64)
Process 3678 stopped
* thread #6, stop reason = breakpoint 1.1
    frame #0: 0x00000000004a0b28 a.exe`main.main at main.go:10
   7            mapvar["ghi"] = "jkl"
   8            intvar := 42
   9            ptrvar := &intvar
-> 10           fmt.Println("hi") // line 10
   11           _ = ptrvar
   12   }

(Since I ran into this myself, a note for people trying to reproduce this: if you extract the main.go program by hand from runtime/runtime-lldb_test.go, note that it needs to have a blank line at the top or all line numbers will be off by one.)

gopherbot commented 2 years ago

Timed out in state WaitingForInfo. Closing.

(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)