golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.57k stars 17.61k forks source link

valgrind fails #60600

Open erezgeva opened 1 year ago

erezgeva commented 1 year ago

What version of Go are you using (go version)?

$ go version
go version go1.15.15 linux/amd64

Does this issue reproduce with the latest release?

Probably

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
v
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/erez/.cache/go-build"
GOENV="/home/erez/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/erez/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/erez/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go-1.15"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go-1.15/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build273804324=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Use valgrind

What did you expect to see?

To pass

What did you see instead?

Errors.

Why is that a problem?

I saw this comment about valgrind: Valgrind is really a tool for C programs. It assumes it can replace malloc with its own copy, it assumes the standard C stack model, I'm sure it assumes other things specific to that world too. This is perfectly right.

But I do not run valgrind to check the go application. I run valgrind to check C/C++ libraries that are linked with my go application! And as they are libraries, they are linked differently with my go application. So running valgrind with a different language, would not provide the same result.

Python provide the PYTHONMALLOC=malloc. I would except something similar with Go language. See: https://docs.python.org/3/c-api/memory.html And: https://docs.python.org/3/using/cmdline.html#envvar-PYTHONMALLOC

dr2chase commented 1 year ago

Go doesn't support valgrind, but does support asan, msan, and tsan, which work with clang, and perhaps gcc.

erezgeva commented 1 year ago

Go doesn't support valgrind, but does support asan, msan, and tsan, which work with clang, and perhaps gcc.

Yes, I notice. That is why I open the ticket. GCC support AddressSanitizer. As go itself does not need memory checker, it does not matters. The AddressSanitizer is a good tool. But it does have a downside. It require compilation. Yes, valgrind is slow, but it use the same production binary. So no compilation 'errors' due to different compilations flags or optimisation can skip. And yes, C/C++ compilers are good, and yet there are corners for the developers.

ianlancetaylor commented 1 year ago

What happens if you run valgrind on a Go program linked with C libraries? I would expect it to check the memory behavior of the C code.

erezgeva commented 1 year ago

What happens if you run valgrind on a Go program linked with C libraries? I would expect it to check the memory behavior of the C code.

Exactly, and without errors from go part, as it does not really need it.

I can give an example This is my test app for a C++ library linked from a go application. $ ldd go/gtest/gtest linux-vdso.so.1 (0x00007ffc0d6ff000) libptpmgmt.so.1 => not found ...

Now I run it with valgrind ==1937133== Conditional jump or move depends on uninitialised value(s) ==1937133== at 0x4A0139: runtime.adjustframe (/usr/lib/go-1.15/src/runtime/stack.go:550) ==1937133== by 0x4AB634: runtime.gentraceback (/usr/lib/go-1.15/src/runtime/traceback.go:334) ==1937133== by 0x4A0A06: runtime.copystack (/usr/lib/go-1.15/src/runtime/stack.go:910) ==1937133== by 0x4A0DCC: runtime.newstack (/usr/lib/go-1.15/src/runtime/stack.go:1076) ==1937133== by 0x4B97CE: runtime.morestack (/usr/lib/go-1.15/src/runtime/asm_amd64.s:449) ==1937133== by 0x4B9533: runtime.rt0_go (/usr/lib/go-1.15/src/runtime/asm_amd64.s:220) ==1937133== by 0x57EB6F: ??? (in /home/erez/docs/develop/libptpmgmt/go/gtest/gtest) ...

==1937133== ERROR SUMMARY: 2 errors from 1 contexts (suppressed: 0 from 0)

I am obviously not interested in errors in go, but can you assure me that my C++ library is cleared?

I run valgrind with: Lua, php, perl and Python. All pass without any errors.

ianlancetaylor commented 1 year ago

Thanks. This will need attention from somebody familiar with valgrind. The memory is initialized, but valgrind doesn't seem to know that. I don't know why.

laboger commented 1 year ago

I don't know if this will help but I've used valgrind a few times with Go on ppc64le, and I usually have to build the Go binary with external linking to get it to work. I'm not sure why, maybe valgrind depends on something in __libc_start_main. Since you are seeing runtime.rt0_go I believe that means it was built with the internal linker.

erezgeva commented 1 year ago

I don't know if this will help but I've used valgrind a few times with Go on ppc64le, and I usually have to build the Go binary with external linking to get it to work. I'm not sure why, maybe valgrind depends on something in __libc_start_main. Since you are seeing runtime.rt0_go I believe that means it was built with the internal linker.

That might be that, then perhaps go should provide instructions? I did not found the 2 stages, I use go build for both compilation and linking.

ianlancetaylor commented 1 year ago

If your Go package use cgo, which is the normal way for Go code to call C code, then you will be using external linking by default.

erezgeva commented 1 year ago

I do understand your point. The valgrind error comes from go code, not from my C++ code. And as far as I understand , we can use extern in both directions. C can call go functions and go can all C. We only need proper external, translate the types between C and go, and finally build. Anyhow, there should be a proper instructions with simple example for using valgrind . For using go with C code and valgrind.

laboger commented 1 year ago

Ian's point is that my suggestion won't help.

gopherbot commented 1 year ago

Timed out in state WaitingForInfo. Closing.

(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)

erezgeva commented 1 year ago

I have no idea what additional information you need. If you wish to close the ticket, you should be more explicit for why.

ianlancetaylor commented 1 year ago

Reopening. But as we said before, this is going to require attention from somebody familiar with valgrind.

erezgeva commented 1 year ago

Sure, but please explain, what do you need? Adding tags is nice. But I think you should explain. We can ask for help from Valgrid. Or try to write a small test, that demonstrate the problem and can be easily reproduced. How do you want to proceed?

randall77 commented 1 year ago

We need someone familiar with valgrind who is willing to work on this issue. A step-by-step reproducer would definitely help that person.

midepeter commented 10 months ago

@randall77 @erezgeva

Hello,

I am a golang engineer and interested in performance engineering. I am currently studing how valgrind works and i am willing to conduct a research on this details and give more detailed report about how this can be solved if given the go ahead.

Thanks. Looking forward to your response.

randall77 commented 10 months ago

A step-by-step reproducer would definitely help that person.

This is probably the place to start.

Then we need to understand why valgrind is reporting errors on Go code. Then we need to figure out how to either fix or suppress those errors.

midepeter commented 10 months ago

Oh yes.

I am going to start with trying to reproduce the errors as you said.

Thanks @randall77

midepeter commented 10 months ago

Hello @randall77

I started investigating this issue by trying to reproduce the error as you said. I tried running a couple of simple programs that involved cgo. I noticed a consistent result in the valgrind report LEAK SUMMARY: ==11520== definitely lost: 0 bytes in 0 blocks ==11520== indirectly lost: 0 bytes in 0 blocks ==11520== possibly lost: 1,440 bytes in 5 blocks ==11520== still reachable: 0 bytes in 0 blocks ==11520== suppressed: 0 bytes in 0 blocks

The number of possibly lost bytes seems consistent across the basic programs i ran. Also, i want to note that valgrind does not report errors on pure go code without call to the cgo library. The possibly lost leak starts out at the call of the C library.

I just wanted to drop my progress on this as I am still trying to investigate why the possibly lost bytes are from either you'd love to drop some comments or you are expecting more targeted results. Mind you the possibly lost bytes are the ones that are difficult to the valgrind framework to define if it were directly lost or indirectly lost. The above results were from the basic a basic program.

arg0d commented 9 months ago

Running Valgrind with a Go program is very relevant for my use case, where my Go program does multiple cgo calls for alloc/free, and I want to ensure that these calls are correct, i.e. no memory leaks and no double frees.

midepeter commented 9 months ago

@arg0d Interesting. What were your results when you ran valgrind on your Go program.

arg0d commented 9 months ago

Running valgrind go test on linux

==55396== Memcheck, a memory error detector
==55396== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==55396== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==55396== Command: go test -v
==55396== 
==55396== Warning: set address range perms: large range [0x9160000, 0x29160000) (noaccess)
==55396== Warning: set address range perms: large range [0x29160000, 0x49160000) (noaccess)
==55396== Warning: ignored attempt to set SIGRT32 handler in sigaction();
==55396==          the SIGRT32 signal is used internally by Valgrind
==55396== Warning: client switching stacks?  SP change: 0x1ffefffc28 --> 0xc00007a7d8
==55396==          to suppress, use: --max-stackframe=687212047280 or greater
==55396== Warning: client switching stacks?  SP change: 0xc00007a730 --> 0x1ffefffcb8
==55396==          to suppress, use: --max-stackframe=687212046968 or greater
==55396== Thread 3:
==55396== Invalid read of size 8
==55396==    at 0x46D005: runtime.systemstack.abi0 (in /usr/local/go/bin/go)
==55396==  Address 0xc00007b6b0 is in a rw- anonymous segment
==55396== 
==55396== Invalid read of size 8
==55396==    at 0x40900E: runtime.send.goready.func1 (in /usr/local/go/bin/go)
==55396==    by 0x46D009: runtime.systemstack.abi0 (in /usr/local/go/bin/go)
==55396==  Address 0xc00007b6b8 is in a rw- anonymous segment
==55396== 
==55396== Invalid read of size 8
==55396==    at 0x409012: runtime.send.goready.func1 (in /usr/local/go/bin/go)
==55396==    by 0x46D009: runtime.systemstack.abi0 (in /usr/local/go/bin/go)
==55396==  Address 0xc00007b6c0 is in a rw- anonymous segment
==55396== 
==55396== Use of uninitialised value of size 8
==55396==    at 0x46D040: runtime.systemstack.abi0 (in /usr/local/go/bin/go)
==55396==    by 0x408D99: runtime.chansend (in /usr/local/go/bin/go)
==55396==    by 0x408896: runtime.chansend1 (in /usr/local/go/bin/go)
==55396==    by 0x428710: runtime.bgsweep (in /usr/local/go/bin/go)
==55396==    by 0x41D8E4: runtime.gcenable.func1 (in /usr/local/go/bin/go)
==55396==    by 0x46EE00: runtime.goexit.abi0 (in /usr/local/go/bin/go)
==55396== 
==55396== Use of uninitialised value of size 8
==55396==    at 0x408FA5: runtime.send (in /usr/local/go/bin/go)
==55396==    by 0x408896: runtime.chansend1 (in /usr/local/go/bin/go)
==55396==    by 0x428710: runtime.bgsweep (in /usr/local/go/bin/go)
==55396==    by 0x41D8E4: runtime.gcenable.func1 (in /usr/local/go/bin/go)
==55396==    by 0x46EE00: runtime.goexit.abi0 (in /usr/local/go/bin/go)
...
a bunch more errors
...
==55396== More than 1000 different errors detected.  I'm not reporting any more.
==55396== Final error counts will be inaccurate.  Go fix your program!
==55396== Rerun with --error-limit=no to disable this cutoff.  Note
==55396== that errors may occur in your program without prior warning from
==55396== Valgrind, because errors are no longer being displayed.