golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

runtime: ThreadSanitizer failed to allocate / CHECK failed #37651

Open sb10 opened 4 years ago

sb10 commented 4 years ago

What version of Go are you using (go version)?

$ go version
go version go1.14 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/nfs/users/nfs_s/sb10/.cache/go-build"
GOENV="/nfs/users/nfs_s/sb10/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/nfs/users/nfs_s/sb10/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/software/vertres/installs/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/software/vertres/installs/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="0"
GOMOD="/nfs/users/nfs_s/sb10/src/go/github.com/VertebrateResequencing/wr/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build484982129=/tmp/go-build -gno-record-gcc-switches"

$ free -m
              total        used        free      shared  buff/cache   available
Mem:         135359       19962       78221          28       37175      114236
Swap:        223231          38      223193

What did you do?

git clone https://github.com/sb10/wr.git
cd wr
git checkout ThreadSanitizer
CGO_ENABLED=1 go test -p 1 -tags netgo -race --count 1 -gcflags=all=-d=checkptr=0 ./jobqueue -run TestJobqueueRunners -v

What did you expect to see?

Tests should pass cleanly, the same way they do without the race detector (go test -p 1 -tags netgo --count 1 ./jobqueue -run TestJobqueueRunners -v).

What did you see instead?

Variations on:

ERROR: ThreadSanitizer failed to allocate 0x28000 (163840) bytes of sync allocator (error code: 12)
FATAL: ThreadSanitizer CHECK failed: ./gotsan.cpp:7064 "((0 && "unable to mmap")) != (0)" (0x0, 0x0)

This exits the test run. It happens during a seemingly random test on each attempt.

Additional info:

Assuming this is a bad interaction between go-deadlock and the Go race detector, I have no idea how to debug it.

toothrot commented 4 years ago

/cc @aclements @rsc @randall77 @dvyukov

dvyukov commented 4 years ago

It seems that you ran out of memory on the machine. Errno 12 is ENOMEM. Perhaps you have overcommit or swap disabled, have set ulimit -v, or are running under a memcg limit.
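
For reference, a minimal Go sketch (not part of this thread) for checking the overcommit mode and swap status mentioned above; /proc/sys/vm/overcommit_memory and /proc/meminfo are standard Linux interfaces, and the same information can be read directly from a shell:

package main

import (
	"fmt"
	"io/ioutil"
	"strings"
)

func main() {
	// vm.overcommit_memory: 0 = heuristic (default), 1 = always allow,
	// 2 = strict accounting, under which large mmaps can fail with ENOMEM.
	if b, err := ioutil.ReadFile("/proc/sys/vm/overcommit_memory"); err == nil {
		fmt.Printf("vm.overcommit_memory = %s", b)
	}
	// Swap and commit-limit lines from /proc/meminfo, in case swap is off.
	if b, err := ioutil.ReadFile("/proc/meminfo"); err == nil {
		for _, line := range strings.Split(string(b), "\n") {
			if strings.HasPrefix(line, "Swap") || strings.HasPrefix(line, "CommitLimit") {
				fmt.Println(line)
			}
		}
	}
}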

sb10 commented 4 years ago

It turns out I only have this problem on a machine where a 5GB per-process memory limit is enforced.

$ ulimit -m
5000000

However, LSF reports that peak memory usage with -race is less than 2GB (average 980MB); without -race, peak memory is reported as 800MB (average 300MB).

Is this difference in memory usage expected? Could there be a very brief >5GB heap allocation with race that LSF doesn't detect in its peak memory usage report?

dvyukov commented 4 years ago

What exactly is that -m? Are you sure you are restricting and measuring the same kind of memory? RSS? Virtual? Allocated? Locked? Accounted? There are lots of them :)

sb10 commented 4 years ago

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 541143
max locked memory       (kbytes, -l) 16384
max memory size         (kbytes, -m) 5000000
open files                      (-n) 131072
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 541143
virtual memory          (kbytes, -v) 5000000
file locks                      (-x) unlimited

So -m is "max memory size". I don't know how LSF measures memory usage. But in any case, it seems that so much more memory is being used than normal that it hits this limit.
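
For what it's worth, the ulimit -m limit (RLIMIT_RSS) is generally not enforced by modern Linux kernels, so the 5000000 KB virtual memory limit (ulimit -v, RLIMIT_AS) is the more likely one being hit: a -race build consumes considerably more virtual address space than its resident memory alone would suggest. A minimal, Linux-specific Go sketch (not from the thread) that prints the limits the test process actually inherits:

package main

import (
	"fmt"
	"syscall"
)

// printLimit reports the soft (cur) and hard (max) value of one rlimit;
// a value of 18446744073709551615 (^uint64(0)) means "unlimited".
func printLimit(name string, resource int) {
	var r syscall.Rlimit
	if err := syscall.Getrlimit(resource, &r); err != nil {
		fmt.Printf("%s: %v\n", name, err)
		return
	}
	fmt.Printf("%s: cur=%d max=%d\n", name, r.Cur, r.Max)
}

func main() {
	printLimit("RLIMIT_AS (virtual address space, ulimit -v)", syscall.RLIMIT_AS)
	printLimit("RLIMIT_DATA (data segment, ulimit -d)", syscall.RLIMIT_DATA)
}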

dvyukov commented 4 years ago

Increased memory use under the race detector is very much expected; see https://golang.org/doc/articles/race_detector.html#Runtime_Overheads. So I would say increase the limits, or don't set them at all. If you set limits, at some point a normal program will crash as well; it's not possible to guarantee that any program works under any limits.

sb10 commented 4 years ago

Sure, I expect some increase in memory usage; the question is whether this much of an increase points to a possible bug. Is there a bad interaction between go-deadlock and the Go race detector that triggers some unexpected runaway memory usage? Or is the memory usage legitimate?

If there's no easy way to answer this question, I guess this issue can be closed.

dvyukov commented 4 years ago

Sure, I expect some increase in memory usage; the question is whether this much of an increase points to a possible bug.

This has some reference numbers: https://golang.org/doc/articles/race_detector.html#Runtime_Overheads. Is the increase in memory consumption way above those numbers?

Is there a bad interaction between go-deadlock and the Go race detector that triggers some unexpected runaway memory usage?

I don't know. You are filing the bug, so I assume you have answers :) Does memory usage grow without bound under the race detector, whereas it does not grow in a normal build?
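
One rough way to answer that question: the race runtime's own allocations are made outside the Go heap, so they may not show up in runtime.MemStats, but /proc/self/status gives the kernel's view of the process, including peak values. Below is a Linux-specific sketch (not from the thread) of a watcher goroutine that logs those numbers over the course of a run; in the real test suite it could be started from TestMain instead of the placeholder main shown here:

package main

import (
	"bufio"
	"log"
	"os"
	"strings"
	"time"
)

// memwatch logs VmRSS (resident), VmHWM (peak resident) and VmPeak
// (peak virtual) from /proc/self/status every interval, so growth over
// the course of a test run stays visible even if the process later dies.
func memwatch(interval time.Duration) {
	for range time.Tick(interval) {
		f, err := os.Open("/proc/self/status")
		if err != nil {
			log.Printf("memwatch: %v", err)
			return
		}
		var fields []string
		s := bufio.NewScanner(f)
		for s.Scan() {
			line := s.Text()
			if strings.HasPrefix(line, "VmRSS") ||
				strings.HasPrefix(line, "VmHWM") ||
				strings.HasPrefix(line, "VmPeak") {
				fields = append(fields, strings.Join(strings.Fields(line), " "))
			}
		}
		f.Close()
		log.Printf("memwatch: %s", strings.Join(fields, ", "))
	}
}

func main() {
	go memwatch(5 * time.Second)
	// Stand-in for the real workload under test.
	time.Sleep(30 * time.Second)
}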

benitogf commented 3 years ago

Hello, I'm seeing this issue too on the GitHub Actions runner: https://github.com/benitogf/level/runs/2772001793?check_suite_focus=true, though only on Windows.

Redirected to https://github.com/golang/go/issues/22553, since the error code is different (1455).