golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

runtime: panics and/or deadlocks on FreeBSD 12.2-RELEASE #43873

Open problame opened 3 years ago

problame commented 3 years ago

What version of Go are you using (go version)?

$ go version
go version go1.15.6 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/cs/.cache/go-build"
GOENV="/home/cs/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/cs/development/golang/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/cs/development/golang"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/cs/go1.15.6"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/cs/go1.15.6/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build804757202=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I maintain zrepl, a tool for ZFS replication written in Go. Some users have been hitting Go runtime panics and/or lock-ups on FreeBSD.

The problems started with reports in July 2020 from a user on a development version of FreeBSD (12-STABLE, somewhere between 12.1-RELEASE and 12.2-RELEASE). The problem occurred with both Go 1.13.5 and Go 1.14.x. At the time I asked for the problem to be reproduced on an official binary release of FreeBSD, but that did not happen.

After I updated my personal FreeBSD server to 12.2-RELEASE, I started to encounter issues similar to those reported in July 2020. I have not yet encountered runtime panics, but I have seen several lock-ups (for lack of a better term) of the Go runtime. The last of the links above contains a stack trace of a goroutine blocked forever in runtime.newobject (stack obtained using dlv).

Summary of my triaging since July:

It would be very helpful to get a quick explanation of what these panics mean so that I can narrow down my audit of the changes between FreeBSD 12.1-RELEASE and 12.2-RELEASE.

Also, I can offer a tmate or similar to a Go / FreeBSD developer to the system with the locked-up daemon. The lock-up usually occurs after 2-3 days on my system, sometimes sooner, but I can leave it in the locked-up state for a day or two.
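
For reference, the kind of dlv session used to grab such a stack from the hung daemon looks roughly like this (a sketch; the PID and goroutine number are placeholders, and it assumes a dlv build that works on freebsd/amd64):

dlv attach 1234
(dlv) goroutines          # list all goroutines and where they are blocked
(dlv) goroutine 42 bt     # print the stack of one goroutine, e.g. the one stuck in runtime.newobject
(dlv) quit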

Related zrepl issues:

dmgk commented 3 years ago

Just a wild guess, but it looks related to https://github.com/golang/go/commit/4d27c4c223ccb7de3876abbac79b58ad9579be1a @problame Can you try reproducing this issue with the recent tip? I can update lang/go-devel to include the above commit for easier testing.

problame commented 3 years ago

@dmgk that seems ... quite concerning, and since zrepl uses pipes like crazy, I wouldn't be surprised if this is the problem.

Is this a regression in Go or something that was always handled incorrectly? I'm asking because I am currently trying to 'bisect' Go versions by compiling zrepl with older Go versions and seeing whether the problem occurs. The version that is running right now is built with Go 1.13; I would expect a crash to happen within the next 5 days.

Should I abort that run and build with tip + https://github.com/golang/go/commit/4d27c4c223ccb7de3876abbac79b58ad9579be1a as you suggested, or should we wait and see whether it crashes with Go 1.13?

(I cross-compile from Linux so I don't need the lang/go-devel port to be updated).
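
(For reference, the cross-build is nothing special. Roughly, assuming a Go tip checkout next to the zrepl working copy, with all paths as placeholders:)

git clone https://go.googlesource.com/go ~/go-tip
(cd ~/go-tip/src && ./make.bash)   # building tip needs an existing Go toolchain for bootstrap
cd ~/src/zrepl
GOOS=freebsd GOARCH=amd64 CGO_ENABLED=0 ~/go-tip/bin/go build -o artifacts/zrepl-freebsd-amd64 .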

dmgk commented 3 years ago

Well, looking at the commit, it seems that the issue goes back to 2009 - https://github.com/golang/go/blame/master/src/runtime/sys_freebsd_amd64.s#L21, so I'd try reproducing with the tip first.

problame commented 3 years ago

Any idea why this would surface only since FreeBSD 12.2 then?

mdtancsa commented 3 years ago

Just a wild guess, but it looks related to 4d27c4c @problame Can you try reproducing this issue with the recent tip? I can update lang/go-devel to include the above commit for easier testing.

Hi, I would be happy to try locally if you can update the port with this tip/fix/patch. I was the original reporter, and it's fairly easy for me to reproduce. A couple of other details about when I ran into the issue last summer: I had added more retention to the server (45K+ snapshots) and more clients replicating, so the machine had become busier at around the same time as it moved to RELENG12 on FreeBSD.

dmgk commented 3 years ago

@mdtancsa Thanks, I just updated lang/go-devel to go1.16rc1 a few hours ago, it includes https://github.com/golang/go/commit/4d27c4c223ccb7de3876abbac79b58ad9579be1a.
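
(To pick up the updated port, refresh the ports tree and rebuild it, roughly:)

portsnap fetch update
make -C /usr/ports/lang/go-devel reinstall clean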

mdtancsa commented 3 years ago

Thanks!! I am now running the same version I have been using for a while (v0.3.0-rc1), but compiled with go-devel from ports. Thanks for the quick update.

cpuset -l 0,1,2,3,4,5 -p 79245

Usually running with more than 2 CPU cores will cause a deadlock within a day or so. It should not take too long to find out.
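
(cpuset -l pins the daemon's threads to the listed cores; to inspect or widen the binding later, something along these lines:)

cpuset -g -p 79245        # show which CPUs PID 79245 is currently allowed to run on
cpuset -l 0-11 -p 79245   # e.g. give it all cores again on a 12-core box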


mdtancsa commented 3 years ago

So far so good! I am allowing all threads to run on any CPU core. Normally I would have had a few deadlocks by now, so it's certainly MUCH improved.

PID   USERNAME  PRI NICE  SIZE  RES STATE   C  TIME  WCPU COMMAND
10778 root       21    0  739M  83M uwait   7  8:05 4.72% zrepl-jan2021{zrepl-jan2021}
10778 root       21    0  739M  83M wait   10  3:03 3.50% zrepl-jan2021{zrepl-jan2021}
10778 root       23    0  739M  83M kqread  1  1:21 3.00% zrepl-jan2021{zrepl-jan2021}
10778 root       24    0  739M  83M uwait   5  2:44 2.87% zrepl-jan2021{zrepl-jan2021}
10778 root       21    0  739M  83M uwait   5  2:39 2.75% zrepl-jan2021{zrepl-jan2021}
10778 root       21    0  739M  83M uwait  11  3:10 2.27% zrepl-jan2021{zrepl-jan2021}
10778 root       21    0  739M  83M uwait   2  2:05 1.70% zrepl-jan2021{zrepl-jan2021}
10778 root       21    0  739M  83M uwait   9  2:51 1.45% zrepl-jan2021{zrepl-jan2021}
10778 root       21    0  739M  83M uwait  11  3:02 1.27% zrepl-jan2021{zrepl-jan2021}
10778 root       20    0  739M  83M uwait   0  3:02 0.51% zrepl-jan2021{zrepl-jan2021}
10778 root       20    0  739M  83M uwait   8  2:40 0.46% zrepl-jan2021{zrepl-jan2021}
10778 root       52    0  739M  83M uwait   7  0:25 0.34% zrepl-jan2021{zrepl-jan2021}
10778 root       23    0  739M  83M wait    4  2:34 0.28% zrepl-jan2021{zrepl-jan2021}
10778 root       21    0  739M  83M uwait   6  3:05 0.10% zrepl-jan2021{zrepl-jan2021}
10778 root       22    0  739M  83M uwait   7  3:11 0.04% zrepl-jan2021{zrepl-jan2021}
10778 root       20    0  739M  83M uwait   0  2:57 0.03% zrepl-jan2021{zrepl-jan2021}
10778 root       21    0  739M  83M uwait   4  2:56 0.02% zrepl-jan2021{zrepl-jan2021}
10778 root       22    0  739M  83M uwait   3  3:00 0.00% zrepl-jan2021{zrepl-jan2021}
10778 root       27    0  739M  83M uwait   8  1:59 0.00% zrepl-jan2021{zrepl-jan2021}
10778 root       20    0  739M  83M uwait   3  1:41 0.00% zrepl-jan2021{zrepl-jan2021}
10778 root       20    0  739M  83M uwait   1  0:00 0.00% zrepl-jan2021{zrepl-jan2021}
10778 root       20    0  739M  83M uwait   9  0:00 0.00% zrepl-jan2021{zrepl-jan2021}

This is still v0.3.0-rc1. If it survives the weekend without a deadlock, I will check out a newer version. It's a bit of work since I have to update all the clients as well. @problame, is there a particular version from the repo I should use, or can I just stick with 0.3.1?

problame commented 3 years ago

@mdtancsa While chasing this bug over the last few weeks, we've found a few places in zrepl that could cause deadlocks / data races. In all of those cases it was easy to rule out that they were related to the bug I reported here. Those patches are in the branch problame/develop, but that branch also contains a bump of a bunch of dependencies, including gRPC, which should not be included in your testing.
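
(If in doubt which gRPC / protobuf versions a given checkout would build against, go list can answer that from inside the working tree, e.g.:)

go list -m google.golang.org/grpc github.com/golang/protobuf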

mdtancsa commented 3 years ago

I don't remember what patches I have been using on the version I have widely deployed, if any :( I think at this point it's probably best to bring the server and its clients up to date. Do you think the branch problame/develop is safe enough to run in production?


problame commented 3 years ago

If you are already running 0.3.1 then I wouldn't worry.

You shouldn't run the "Update to protobuf v1.25 and grpc 1.35; bump CI to go1.12" commit when doing testing for this issue, though, because a bump in grpc / protobuf might introduce a regression that we could falsely identify as the issue we are debugging here.

So the commit for you is 166e80bb.

If you are not running 0.3.1 yet, just make sure that your grid pruning rules are correct (ref https://github.com/zrepl/zrepl/issues/400 )

mdtancsa commented 3 years ago

The previous version was code from July 2020, but recompiled with lang/go-devel.

/usr/local/bin/zrepl-jan2021 version
client: zrepl version=v0.3.0-rc1 GOOS=freebsd GOARCH=amd64 Compiler=gc
server: zrepl version= GOOS=freebsd GOARCH=amd64 Compiler=gc
WARNING: client version != daemon version, restart zrepl daemon
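
(The warning just means the running daemon is still the old binary; assuming the port's rc.d script is installed, a plain restart picks up the new one:)

service zrepl restart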

OK, I did this in a clean dir:

git clone https://github.com/zrepl/zrepl

cd zrepl

git checkout 166e80bb

git switch -c 166e80bb-branch

gmake

GO111MODULE=on go build -mod=readonly  -ldflags "-X github.com/zrepl/zrepl/version.zreplVersion=v0.3.1-14-g166e80b" -o "artifacts/zrepl-freebsd-amd64"
go: downloading github.com/spf13/cobra v0.0.2
go: downloading github.com/spf13/pflag v1.0.5
go: downloading github.com/pkg/errors v0.8.1
go: downloading github.com/prometheus/client_golang v1.2.1
go: downloading golang.org/x/net v0.0.0-20190613194153-d28f0bde5980
go: downloading github.com/zrepl/yaml-config v0.0.0-20191220194647-cbb6b0cf4bdd
go: downloading github.com/kr/pretty v0.1.0
go: downloading github.com/willf/bitset v1.1.10
go: downloading github.com/fatih/color v1.7.0
go: downloading github.com/gdamore/tcell v1.2.0
go: downloading github.com/problame/go-netssh v0.0.0-20200601114649-26439f9f0dc5
go: downloading github.com/go-logfmt/logfmt v0.4.0
go: downloading github.com/mattn/go-isatty v0.0.8
go: downloading github.com/prometheus/common v0.7.0
go: downloading github.com/kr/text v0.1.0
go: downloading github.com/mattn/go-colorable v0.1.4
go: downloading google.golang.org/grpc v1.17.0
go: downloading github.com/golang/protobuf v1.3.2
go: downloading github.com/google/uuid v1.1.1
go: downloading golang.org/x/sync v0.0.0-20190423024810-112230192c58
go: downloading github.com/go-sql-driver/mysql v1.4.1-0.20190907122137-b2c03bcae3d4
go: downloading github.com/lib/pq v1.2.0
go: downloading github.com/ftrvxmtrx/fd v0.0.0-20150925145434-c6d800382fff
go: downloading github.com/theckman/goconstraint v1.11.0
go: downloading golang.org/x/sys v0.0.0-20191026070338-33540a1f6037
go: downloading github.com/sirupsen/logrus v1.4.2
go: downloading gopkg.in/alecthomas/kingpin.v2 v2.2.6
go: downloading github.com/jinzhu/copier v0.0.0-20170922082739-db4671f3a9b8
go: downloading github.com/prometheus/client_model v0.0.0-20190812154241-14fe0d1b01d4
go: downloading github.com/beorn7/perks v1.0.1
go: downloading github.com/cespare/xxhash/v2 v2.1.0
go: downloading github.com/prometheus/procfs v0.0.5
go: downloading google.golang.org/genproto v0.0.0-20180817151627-c66870c02cf8
go: downloading github.com/matttproud/golang_protobuf_extensions v1.0.1
go: downloading github.com/gdamore/encoding v1.0.0
go: downloading github.com/lucasb-eyer/go-colorful v1.0.2
go: downloading github.com/mattn/go-runewidth v0.0.4
go: downloading golang.org/x/text v0.3.0
go: downloading github.com/alecthomas/template v0.0.0-20190718012654-fb15b899a751
go: downloading github.com/alecthomas/units v0.0.0-20190717042225-c3de453c63f4

 cd artifacts/

./zrepl-freebsd-amd64 version
client: zrepl version=v0.3.1-14-g166e80b go=go1.16rc1 GOOS=freebsd GOARCH=amd64 Compiler=gc
server: zrepl version= go= GOOS=freebsd GOARCH=amd64 Compiler=gc
WARNING: client version != daemon version, restart zrepl daemon

Before I install this onto the server and push it out to a few dozen clients, is that the build I want? I'm not sure how to force certain versions of the dependencies.


problame commented 3 years ago

Dependencies are determined by go.mod, so there is no need to deal with that. The commit is correct and the build process looks correct. Please make sure you have looked into https://github.com/zrepl/zrepl/issues/400 (mentioned above).

The other slightly 'risky' commit is 0d96627, which adds additional concurrency in some places. I have not run it in production yet, so you may want to revert it for your deployment. Also, we want to remove the little piece of unsafe code for this investigation.

I pushed a branch problame/golang-freebsd-investigation (commit 538382a36bd0c71c679134a09373a98f3d3f4cc4 ) for you to deploy.
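
To double-check what actually went into a binary before pushing it out, go version -m prints the toolchain and the exact module versions that were compiled in, e.g.:

go version -m artifacts/zrepl-freebsd-amd64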

mdtancsa commented 3 years ago


Thanks! I will test. I don't think issue #400 impacts me. I have very simple pruning rules, with one prune stanza per client machine.

0{backup4}# git clone https://github.com/zrepl/zrepl -b problame/golang-freebsd-investigation
Cloning into 'zrepl'...
remote: Enumerating objects: 163, done.
remote: Counting objects: 100% (163/163), done.
remote: Compressing objects: 100% (133/133), done.
remote: Total 8064 (delta 63), reused 72 (delta 30), pack-reused 7901
Receiving objects: 100% (8064/8064), 2.13 MiB | 4.33 MiB/s, done.
Resolving deltas: 100% (5361/5361), done.
0{backup4}# cd zrepl
0{backup4}# git checkout 538382a36bd0c71c679134a09373a98f3d3f4cc4
Note: switching to '538382a36bd0c71c679134a09373a98f3d3f4cc4'.

You are in 'detached HEAD' state. You can look around, make experimental changes and commit them, and you can discard any commits you make in this state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at 538382a DO NOT MERGE: timeoutconn: drop the only use of "unsafe" in the code base
0{backup4}# rm -R /root/go
0{backup4}# gmake
GO111MODULE=on go build -mod=readonly  -ldflags "-X github.com/zrepl/zrepl/version.zreplVersion=v0.3.1-11-g538382a" -o "artifacts/zrepl-freebsd-amd64"
go: downloading github.com/spf13/pflag v1.0.5
go: downloading github.com/spf13/cobra v0.0.2
go: downloading github.com/prometheus/client_golang v1.2.1
go: downloading github.com/pkg/errors v0.8.1
go: downloading golang.org/x/net v0.0.0-20190613194153-d28f0bde5980
go: downloading github.com/zrepl/yaml-config v0.0.0-20191220194647-cbb6b0cf4bdd
go: downloading github.com/fatih/color v1.7.0
go: downloading github.com/gdamore/tcell v1.2.0
go: downloading github.com/kr/pretty v0.1.0
go: downloading github.com/problame/go-netssh v0.0.0-20200601114649-26439f9f0dc5
go: downloading github.com/go-logfmt/logfmt v0.4.0
go: downloading github.com/mattn/go-isatty v0.0.8
go: downloading github.com/willf/bitset v1.1.10
go: downloading github.com/prometheus/common v0.7.0
go: downloading github.com/mattn/go-colorable v0.1.4
go: downloading github.com/kr/text v0.1.0
go: downloading github.com/ftrvxmtrx/fd v0.0.0-20150925145434-c6d800382fff
go: downloading github.com/theckman/goconstraint v1.11.0
go: downloading golang.org/x/sys v0.0.0-20191026070338-33540a1f6037
go: downloading google.golang.org/grpc v1.17.0
go: downloading github.com/golang/protobuf v1.3.2
go: downloading github.com/google/uuid v1.1.1
go: downloading golang.org/x/sync v0.0.0-20190423024810-112230192c58
go: downloading github.com/go-sql-driver/mysql v1.4.1-0.20190907122137-b2c03bcae3d4
go: downloading github.com/lib/pq v1.2.0
go: downloading github.com/prometheus/client_model v0.0.0-20190812154241-14fe0d1b01d4
go: downloading github.com/beorn7/perks v1.0.1
go: downloading github.com/cespare/xxhash/v2 v2.1.0
go: downloading github.com/prometheus/procfs v0.0.5
go: downloading github.com/jinzhu/copier v0.0.0-20170922082739-db4671f3a9b8
go: downloading github.com/sirupsen/logrus v1.4.2
go: downloading gopkg.in/alecthomas/kingpin.v2 v2.2.6
go: downloading github.com/matttproud/golang_protobuf_extensions v1.0.1
go: downloading github.com/gdamore/encoding v1.0.0
go: downloading github.com/lucasb-eyer/go-colorful v1.0.2
go: downloading github.com/mattn/go-runewidth v0.0.4
go: downloading golang.org/x/text v0.3.0
go: downloading google.golang.org/genproto v0.0.0-20180817151627-c66870c02cf8
go: downloading github.com/alecthomas/template v0.0.0-20190718012654-fb15b899a751
go: downloading github.com/alecthomas/units v0.0.0-20190717042225-c3de453c63f4
0{backup4}# artifacts/zrepl-freebsd-amd64 version
client: zrepl version=v0.3.1-11-g538382a go=go1.16rc1 GOOS=freebsd GOARCH=amd64 Compiler=gc
server: zrepl version= go= GOOS=freebsd GOARCH=amd64 Compiler=gc
WARNING: client version != daemon version, restart zrepl daemon
0{backup4}#

mdtancsa commented 3 years ago


Hi,

I did an upgrade of the server and clients and got a crash not long after it started:

fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x473b6f]

goroutine 307891 [running]:
runtime.throw(0xca6dbe, 0x2a)
        /usr/local/go/src/runtime/panic.go:1117 +0x72 fp=0xc00016aa80 sp=0xc00016aa50 pc=0x4397d2
runtime.sigpanic()
        /usr/local/go/src/runtime/signal_unix.go:718 +0x2ef fp=0xc00016aab8 sp=0xc00016aa80 pc=0x45108f
runtime.memmove(0xc00016ac40, 0x0, 0x8)
        /usr/local/go/src/runtime/memmove_amd64.s:171 +0x12f fp=0xc00016aac0 sp=0xc00016aab8 pc=0x473b6f
runtime.recvDirect(0xb5c5e0, 0xc000042cc0, 0xc00016ac40)
        /usr/local/go/src/runtime/chan.go:352 +0x75 fp=0xc00016aaf8 sp=0xc00016aac0 pc=0x407ab5
runtime.recv(0xc0009dfc80, 0xc000042cc0, 0xc00016ac40, 0xc00016aba0, 0x3)
        /usr/local/go/src/runtime/chan.go:614 +0x10e fp=0xc00016ab30 sp=0xc00016aaf8 pc=0x40886e
runtime.chanrecv(0xc0009dfc80, 0xc00016ac40, 0x1, 0xc000160101)
        /usr/local/go/src/runtime/chan.go:526 +0x5b5 fp=0xc00016abc0 sp=0xc00016ab30 pc=0x408535
runtime.chanrecv1(0xc0009dfc80, 0xc00016ac40)
        /usr/local/go/src/runtime/chan.go:439 +0x2b fp=0xc00016abf0 sp=0xc00016abc0 pc=0x407f2b
io.(*pipe).Write(0xc00021f3e0, 0xc000cc0000, 0x68000, 0x68000, 0x18000, 0x0, 0x0)
        /usr/local/go/src/io/pipe.go:96 +0x1f7 fp=0xc00016aca0 sp=0xc00016abf0 pc=0x4d7ef7
io.(*PipeWriter).Write(0xc000010318, 0xc000ca8000, 0x80000, 0x80000, 0x2, 0xc000be0960, 0x0)
        /usr/local/go/src/io/pipe.go:163 +0x4c fp=0xc00016ace8 sp=0xc00016aca0 pc=0x4d836c
github.com/zrepl/zrepl/rpc/dataconn/stream.readStream(0xc00021f020, 0xc000682140, 0xd6eda0, 0xc000010318, 0x5, 0xc00021f320)
        /home/mdtancsa/clean/zrepl/rpc/dataconn/stream/stream.go:255 +0x2c2 fp=0xc00016af68 sp=0xc00016ace8 pc=0xa2c3a2
github.com/zrepl/zrepl/rpc/dataconn/stream.(*Conn).ReadStream.func2(0xc000682180, 0xc000010318, 0xc000000005)
        /home/mdtancsa/clean/zrepl/rpc/dataconn/stream/stream_conn.go:154 +0x85 fp=0xc00016afc8 sp=0xc00016af68 pc=0xa2f485
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1371 +0x1 fp=0xc00016afd0 sp=0xc00016afc8 pc=0x472c61
created by github.com/zrepl/zrepl/rpc/dataconn/stream.(*Conn).ReadStream
        /home/mdtancsa/clean/zrepl/rpc/dataconn/stream/stream_conn.go:152 +0x2a5

goroutine 1 [select, 45 minutes]: github.com/zrepl/zrepl/daemon.Run(0xd7f0b8, 0xc000332bd0, 0xc00007e9e0, 0x0, 0x0)         /home/mdtancsa/clean/zrepl/daemon/daemon.go:118 +0xbbe github.com/zrepl/zrepl/daemon.glob..func1(0xd7f0b8, 0xc0001b9260, 0x1191720, 0xc00007e7c0, 0x0, 0x2, 0x2, 0x2)         /home/mdtancsa/clean/zrepl/daemon/main.go:16 +0x49 github.com/zrepl/zrepl/cli.(Subcommand).run(0x1191720, 0xc000188500, 0xc00007e7c0, 0x0, 0x2)         /home/mdtancsa/clean/zrepl/cli/cli.go:105 +0xe7 github.com/spf13/cobra.(Command).execute(0xc000188500, 0xc00007e7a0, 0x2, 0x2, 0xc000188500, 0xc00007e7a0)         /root/go/pkg/mod/github.com/spf13/cobra@v0.0.2/command.go:760 +0x2c2 github.com/spf13/cobra.(Command).ExecuteC(0x1193380, 0x401700, 0x0, 0x0)         /root/go/pkg/mod/github.com/spf13/cobra@v0.0.2/command.go:846 +0x2fe github.com/spf13/cobra.(Command).Execute(...)         /root/go/pkg/mod/github.com/spf13/cobra@v0.0.2/command.go:794 github.com/zrepl/zrepl/cli.Run()         /home/mdtancsa/clean/zrepl/cli/cli.go:152 +0x2d main.main()         /home/mdtancsa/clean/zrepl/main.go:24 +0x25

goroutine 19 [select]: github.com/zrepl/zrepl/daemon/logging/trace.init.2.func1()         /home/mdtancsa/clean/zrepl/daemon/logging/trace/trace_chrometrace.go:146 +0xd9 created by github.com/zrepl/zrepl/daemon/logging/trace.init.2         /home/mdtancsa/clean/zrepl/daemon/logging/trace/trace_chrometrace.go:135 +0xb2

goroutine 8 [syscall, 45 minutes]: os/signal.signal_recv(0x472c66)         /usr/local/go/src/runtime/sigqueue.go:168 +0xa5 os/signal.loop()         /usr/local/go/src/os/signal/signal_unix.go:23 +0x25 created by os/signal.Notify.func1.1         /usr/local/go/src/os/signal/signal.go:151 +0x45

goroutine 21 [chan receive, 45 minutes]: github.com/zrepl/zrepl/daemon.Run.func1(0xc0000436e0, 0xc0001d4960)         /home/mdtancsa/clean/zrepl/daemon/daemon.go:38 +0x34 created by github.com/zrepl/zrepl/daemon.Run         /home/mdtancsa/clean/zrepl/daemon/daemon.go:37 +0x16b

goroutine 25 [select, 45 minutes]: github.com/zrepl/zrepl/daemon.(controlJob).Run(0xc0003305a0, 0xd7f0b8, 0xc000332e70)         /home/mdtancsa/clean/zrepl/daemon/control.go:172 +0x74b github.com/zrepl/zrepl/daemon.(jobs).start.func1(0xc00023b580, 0xd7f0b8, 0xc000332e70, 0xd82d18, 0xc0003305a0)         /home/mdtancsa/clean/zrepl/daemon/daemon.go:255 +0x15b created by github.com/zrepl/zrepl/daemon.(*jobs).start         /home/mdtancsa/clean/zrepl/daemon/daemon.go:251 +0x425

goroutine 26 [IO wait]: internal/poll.runtime_pollWait(0x8288f4948, 0x72, 0x0)         /usr/local/go/src/runtime/netpoll.go:222 +0x55 internal/poll.(pollDesc).wait(0xc000140018, 0x72, 0x0, 0x0, 0xc84d52)         /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45 internal/poll.(pollDesc).waitRead(...)         /usr/local/go/src/internal/poll/fd_poll_runtime.go:92 internal/poll.(FD).Accept(0xc000140000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)         /usr/local/go/src/internal/poll/fd_unix.go:401 +0x212 net.(netFD).accept(0xc000140000, 0x2a883c7200000004, 0x40be1, 0x6ad987197dbbb509)         /usr/local/go/src/net/fd_unix.go:172 +0x45 net.(TCPListener).accept(0xc0001301e0, 0x6016f767, 0xc000122d58, 0x4949e6)         /usr/local/go/src/net/tcpsock_posix.go:139 +0x32 net.(TCPListener).Accept(0xc0001301e0, 0xc000122da8, 0x18, 0xc0002a2900, 0x7ba29b)         /usr/local/go/src/net/tcpsock.go:261 +0x65 net/http.(Server).Serve(0xc000156000, 0xd7ca28, 0xc0001301e0, 0x0, 0x0)         /usr/local/go/src/net/http/server.go:2981 +0x285 net/http.Serve(...)         /usr/local/go/src/net/http/server.go:2529 github.com/zrepl/zrepl/daemon.(prometheusJob).Run(0xc00033a828, 0xd7f0b8, 0xc000332f60)         /home/mdtancsa/clean/zrepl/daemon/prometheus.go:82 +0x211 github.com/zrepl/zrepl/daemon.(jobs).start.func1(0xc00023b580, 0xd7f0b8, 0xc000332f60, 0xd82d60, 0xc00033a828)         /home/mdtancsa/clean/zrepl/daemon/daemon.go:255 +0x15b created by github.com/zrepl/zrepl/daemon.(jobs).start         /home/mdtancsa/clean/zrepl/daemon/daemon.go:251 +0x425

goroutine 34 [select, 45 minutes]: github.com/zrepl/zrepl/rpc.(Server).Serve(0xc000033040, 0xd7f010, 0xc000033080, 0xd7c6f8, 0xc0001d4d50)         /home/mdtancsa/clean/zrepl/rpc/rpc_server.go:117 +0x335 github.com/zrepl/zrepl/daemon/job.(PassiveSide).Run(0xc0002b1c20, 0xd7f0b8, 0xc000333050)         /home/mdtancsa/clean/zrepl/daemon/job/passive.go:190 +0x44a github.com/zrepl/zrepl/daemon.(jobs).start.func1(0xc00023b580, 0xd7f0b8, 0xc000333050, 0xd82df0, 0xc0002b1c20)         /home/mdtancsa/clean/zrepl/daemon/daemon.go:255 +0x15b created by github.com/zrepl/zrepl/daemon.(jobs).start         /home/mdtancsa/clean/zrepl/daemon/daemon.go:251 +0x425

goroutine 10 [select, 45 minutes]: github.com/zrepl/zrepl/daemon.(*pprofServer).controlLoop(0xc00000e960, 0xd7f0b8, 0xc000332e70)         /home/mdtancsa/clean/zrepl/daemon/pprof.go:45 +0xaf created by github.com/zrepl/zrepl/daemon.NewPProfServer         /home/mdtancsa/clean/zrepl/daemon/pprof.go:36 +0xa5

goroutine 35 [select, 45 minutes]: github.com/zrepl/zrepl/rpc.(Server).Serve(0xc000132200, 0xd7f010, 0xc000132240, 0xd7c6f8, 0xc000110500)         /home/mdtancsa/clean/zrepl/rpc/rpc_server.go:117 +0x335 github.com/zrepl/zrepl/daemon/job.(PassiveSide).Run(0xc0003320f0, 0xd7f0b8, 0xc000333140)         /home/mdtancsa/clean/zrepl/daemon/job/passive.go:190 +0x44a github.com/zrepl/zrepl/daemon.(jobs).start.func1(0xc00023b580, 0xd7f0b8, 0xc000333140, 0xd82df0, 0xc0003320f0)         /home/mdtancsa/clean/zrepl/daemon/daemon.go:255 +0x15b created by github.com/zrepl/zrepl/daemon.(jobs).start         /home/mdtancsa/clean/zrepl/daemon/daemon.go:251 +0x425

goroutine 36 [select, 45 minutes]: github.com/zrepl/zrepl/rpc.(Server).Serve(0xc00040c0c0, 0xd7f010, 0xc00040c100, 0xd7c6f8, 0xc000408440)         /home/mdtancsa/clean/zrepl/rpc/rpc_server.go:117 +0x335 github.com/zrepl/zrepl/daemon/job.(PassiveSide).Run(0xc000332570, 0xd7f0b8, 0xc000333230)         /home/mdtancsa/clean/zrepl/daemon/job/passive.go:190 +0x44a github.com/zrepl/zrepl/daemon.(jobs).start.func1(0xc00023b580, 0xd7f0b8, 0xc000333230, 0xd82df0, 0xc000332570)         /home/mdtancsa/clean/zrepl/daemon/daemon.go:255 +0x15b created by github.com/zrepl/zrepl/daemon.(jobs).start         /home/mdtancsa/clean/zrepl/daemon/daemon.go:251 +0x425

goroutine 37 [semacquire, 45 minutes]: sync.runtime_Semacquire(0xc00023b588)         /usr/local/go/src/runtime/sema.go:56 +0x45 sync.(WaitGroup).Wait(0xc00023b580)         /usr/local/go/src/sync/waitgroup.go:130 +0x65 github.com/zrepl/zrepl/daemon.(jobs).wait.func1(0xc00023b580, 0xc000236d80)         /home/mdtancsa/clean/zrepl/daemon/daemon.go:151 +0x2d created by github.com/zrepl/zrepl/daemon.(*jobs).wait         /home/mdtancsa/clean/zrepl/daemon/daemon.go:150 +0x67

goroutine 60 [chan receive, 45 minutes]: github.com/zrepl/zrepl/daemon.(prometheusJob).Run.func1(0xd7f0b8, 0xc000332f60, 0xc0001301e0)         /home/mdtancsa/clean/zrepl/daemon/prometheus.go:75 +0x48 created by github.com/zrepl/zrepl/daemon.(prometheusJob).Run         /home/mdtancsa/clean/zrepl/daemon/prometheus.go:74 +0x165

goroutine 11 [IO wait, 45 minutes]: internal/poll.runtime_pollWait(0x8288f4b18, 0x72, 0x0)         /usr/local/go/src/runtime/netpoll.go:222 +0x55 internal/poll.(pollDesc).wait(0xc0000e8918, 0x72, 0x0, 0x0, 0xc84d52)         /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45 internal/poll.(pollDesc).waitRead(...)         /usr/local/go/src/internal/poll/fd_poll_runtime.go:92 internal/poll.(FD).Accept(0xc0000e8900, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)         /usr/local/go/src/internal/poll/fd_unix.go:401 +0x212 net.(netFD).accept(0xc0000e8900, 0x30, 0x8015bf108, 0x30)         /usr/local/go/src/net/fd_unix.go:172 +0x45 net.(UnixListener).accept(0xc0001b96e0, 0xc00007be18, 0x40fcb8, 0x30)         /usr/local/go/src/net/unixsock_posix.go:162 +0x32 net.(UnixListener).Accept(0xc0001b96e0, 0xc1ab00, 0xc0001b98c0, 0xb84d20, 0x11887c0)         /usr/local/go/src/net/unixsock.go:260 +0x65 net/http.(Server).Serve(0xc0001e4000, 0xd7ca58, 0xc0001b96e0, 0x0, 0x0)         /usr/local/go/src/net/http/server.go:2981 +0x285 github.com/zrepl/zrepl/daemon.(controlJob).Run.func5(0xc00009c480, 0xc0001e4000, 0xc0001b96e0)         /home/mdtancsa/clean/zrepl/daemon/control.go:168 +0x45 created by github.com/zrepl/zrepl/daemon.(*controlJob).Run         /home/mdtancsa/clean/zrepl/daemon/control.go:167 +0x6be

goroutine 12 [chan receive, 45 minutes]: github.com/zrepl/zrepl/rpc/transportmux.Demux.func1(0xd7f010, 0xc000033080, 0xd7d058, 0xc00000ea08, 0xc0001b9c20)         /home/mdtancsa/clean/zrepl/rpc/transportmux/transportmux.go:116 +0x65 created by github.com/zrepl/zrepl/rpc/transportmux.Demux         /home/mdtancsa/clean/zrepl/rpc/transportmux/transportmux.go:115 +0x354

goroutine 69 [chan receive, 45 minutes]: github.com/zrepl/zrepl/rpc/dataconn.(Server).Serve.func1(0xc00012e100, 0xd7f010, 0xc000132280, 0xc000112b40, 0xd7c638, 0xc000110530)         /home/mdtancsa/clean/zrepl/rpc/dataconn/dataconn_server.go:80 +0x74 created by github.com/zrepl/zrepl/rpc/dataconn.(Server).Serve         /home/mdtancsa/clean/zrepl/rpc/dataconn/dataconn_server.go:78 +0x135

goroutine 63 [chan receive, 45 minutes]: github.com/zrepl/zrepl/rpc.NewServer.func1.2(0xd7f010, 0xc000033080, 0xc000102a80, 0xd873f0, 0xc00007ee40, 0xd873f0, 0xc00007eee0, 0xd873f0, 0xc00007ef80)         /home/mdtancsa/clean/zrepl/rpc/rpc_server.go:63 +0x4c created by github.com/zrepl/zrepl/rpc.NewServer.func1         /home/mdtancsa/clean/zrepl/rpc/rpc_server.go:62 +0x1e5

goroutine 13 [IO wait]: internal/poll.runtime_pollWait(0x8288f4a30, 0x72, 0x0)         /usr/local/go/src/runtime/netpoll.go:222 +0x55 internal/poll.(pollDesc).wait(0xc0000e8a98, 0x72, 0x0, 0x0, 0xc84d52)         /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45 internal/poll.(pollDesc).waitRead(...)         /usr/local/go/src/internal/poll/fd_poll_runtime.go:92 internal/poll.(FD).Accept(0xc0000e8a80, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)         /usr/local/go/src/internal/poll/fd_unix.go:401 +0x212 net.(netFD).accept(0xc0000e8a80, 0x79e, 0x0, 0x0)         /usr/local/go/src/net/fd_unix.go:172 +0x45 net.(TCPListener).accept(0xc00000e9d8, 0x79e, 0xc0003a9ac0, 0x0)         /usr/local/go/src/net/tcpsock_posix.go:139 +0x32 net.(TCPListener).AcceptTCP(0xc00000e9d8, 0xc0006c8005, 0xc0006c8005, 0xc000a60460)         /usr/local/go/src/net/tcpsock.go:248 +0x65 github.com/zrepl/zrepl/tlsconf.(*ClientAuthListener).Accept(0xc00000e9f0, 0xed7a0f43b55fe39f, 0xc0003a9c78, 0x466f1a, 0xc00006ac00, 0x40bd5, 0xed7a0f43b55fe39f)         /home/mdtancsa/clean/zrepl/tlsconf/tlsconf.go:67 +0x46 github.com/zrepl/zrepl/transport/tls.tlsAuthListener.Accept(0xc00000e9f0, 0xc0003320c0, 0xd7f010, 0xc000033080, 0x0, 0x0, 0x0)         /home/mdtancsa/clean/zrepl/transport/tls/serve_tls.go:69 +0x72 github.com/zrepl/zrepl/rpc/versionhandshake.HandshakeListener.Accept(0xd7c6f8, 0xc0001d4d50, 0x2540be400, 0xd7f010, 0xc000033080, 0x0, 0xc000033000, 0x0)         /home/mdtancsa/clean/zrepl/rpc/versionhandshake/versionhandshake_transport_wrappers.go:50 +0x49 github.com/zrepl/zrepl/rpc/transportmux.Demux.func2(0xc0001b9c20, 0xd7f010, 0xc000033080, 0xd7d058, 0xc00000ea08, 0x2540be400, 0xc0001b9bf0)         /home/mdtancsa/clean/zrepl/rpc/transportmux/transportmux.go:152 +0xce created by github.com/zrepl/zrepl/rpc/transportmux.Demux         /home/mdtancsa/clean/zrepl/rpc/transportmux/transportmux.go:137 +0x3c7

goroutine 14 [chan receive]: github.com/zrepl/zrepl/rpc/netadaptor.Listener.Accept(0xd7c638, 0xc0001d4dd0, 0xd873f0, 0xc00007eee0, 0xc000152120, 0xc000152180, 0xd85758, 0xc00099f1a0, 0x4449ee, 0xc000123d50)         /home/mdtancsa/clean/zrepl/rpc/netadaptor/authlistener_netlistener_adaptor.go:79 +0x165 google.golang.org/grpc.(Server).Serve(0xc000102a80, 0xd7c608, 0xc000112960, 0x0, 0x0)         /root/go/pkg/mod/google.golang.org/grpc@v1.17.0/server.go:556 +0x27f github.com/zrepl/zrepl/rpc/grpcclientidentity/grpchelper.NewServer.func1(0xc000000048, 0xcc5908)         /home/mdtancsa/clean/zrepl/rpc/grpcclientidentity/grpchelper/authlistener_grpc_adaptor_wrapper.go:67 +0x79 github.com/zrepl/zrepl/rpc.NewServer.func1(0xd7f010, 0xc000033080, 0xd7c638, 0xc0001d4dd0, 0xc000043aa0)         /home/mdtancsa/clean/zrepl/rpc/rpc_server.go:72 +0x1ef created by github.com/zrepl/zrepl/rpc.(Server).Serve         /home/mdtancsa/clean/zrepl/rpc/rpc_server.go:115 +0x24f

goroutine 15 [chan receive]: github.com/zrepl/zrepl/rpc/dataconn.(Server).Serve(0xc0001b9b30, 0xd7f010, 0xc0000330c0, 0xd7c638, 0xc0001d4de0)         /home/mdtancsa/clean/zrepl/rpc/dataconn/dataconn_server.go:104 +0x25f github.com/zrepl/zrepl/rpc.NewServer.func4(0xd7f010, 0xc000033080, 0xd7c638, 0xc0001d4de0, 0xc000043aa0)         /home/mdtancsa/clean/zrepl/rpc/rpc_server.go:86 +0x52 created by github.com/zrepl/zrepl/rpc.(Server).Serve         /home/mdtancsa/clean/zrepl/rpc/rpc_server.go:116 +0x2a5

goroutine 16 [chan receive, 45 minutes]: github.com/zrepl/zrepl/rpc/dataconn.(Server).Serve.func1(0xc00003bc10, 0xd7f010, 0xc0000330c0, 0xc0001b9b30, 0xd7c638, 0xc0001d4de0)         /home/mdtancsa/clean/zrepl/rpc/dataconn/dataconn_server.go:80 +0x74 created by github.com/zrepl/zrepl/rpc/dataconn.(Server).Serve         /home/mdtancsa/clean/zrepl/rpc/dataconn/dataconn_server.go:78 +0x135

goroutine 82 [select]: github.com/zrepl/zrepl/rpc/transportmux.(demuxListener).Accept(0xc0001d4de0, 0xd7f010, 0xc0000330c0, 0xc000a60480, 0x0, 0x0)         /home/mdtancsa/clean/zrepl/rpc/transportmux/transportmux.go:51 +0xd5 github.com/zrepl/zrepl/rpc/dataconn.(Server).Serve.func2(0xc00003bc10, 0xc00009c540, 0xd7c638, 0xc0001d4de0, 0xd7f010, 0xc0000330c0, 0xc0001b9b30)         /home/mdtancsa/clean/zrepl/rpc/dataconn/dataconn_server.go:92 +0xac created by github.com/zrepl/zrepl/rpc/dataconn.(*Server).Serve         /home/mdtancsa/clean/zrepl/rpc/dataconn/dataconn_server.go:88 +0x1d9

goroutine 64 [select]: github.com/zrepl/zrepl/rpc/transportmux.(*demuxListener).Accept(0xc0001d4dd0, 0xd7f048, 0xc00003a100, 0x0, 0x2, 0x1)         /home/mdtancsa/clean/zrepl/rpc/transportmux/transportmux.go:51 +0xd5 github.com/zrepl/zrepl/rpc/netadaptor.Listener.handleAccept(0xd7c638, 0xc0001d4dd0, 0xd873f0, 0xc00007eee0, 0xc000152120, 0xc000152180)         /home/mdtancsa/clean/zrepl/rpc/netadaptor/authlistener_netlistener_adaptor.go:94 +0x9f created by github.com/zrepl/zrepl/rpc/netadaptor.New         /home/mdtancsa/clean/zrepl/rpc/netadaptor/authlistener_netlistener_adaptor.go:64 +0x153

goroutine 114 [chan receive, 45 minutes]: github.com/zrepl/zrepl/rpc/transportmux.Demux.func1(0xd7f010, 0xc000132240, 0xd7d058, 0xc000130420, 0xc000112c30)         /home/mdtancsa/clean/zrepl/rpc/transportmux/transportmux.go:116 +0x65 created by github.com/zrepl/zrepl/rpc/transportmux.Demux         /home/mdtancsa/clean/zrepl/rpc/transportmux/transportmux.go:115 +0x354

goroutine 115 [IO wait]: internal/poll.runtime_pollWait(0x8288f4860, 0x72, 0x0)         /usr/local/go/src/runtime/netpoll.go:222 +0x55 internal/poll.(pollDesc).wait(0xc000140098, 0x72, 0x0, 0x0, 0xc84d52)         /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45 internal/poll.(pollDesc).waitRead(...)         /usr/local/go/src/internal/poll/fd_poll_runtime.go:92 internal/poll.(FD).Accept(0xc000140080, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)         /usr/local/go/src/internal/poll/fd_unix.go:401 +0x212 net.(netFD).accept(0xc000140080, 0x79e, 0x0, 0x0)         /usr/local/go/src/net/fd_unix.go:172 +0x45 net.(TCPListener).accept(0xc0001303f0, 0x79e, 0xc0010dbac0, 0x0)         /usr/local/go/src/net/tcpsock_posix.go:139 +0x32 net.(TCPListener).AcceptTCP(0xc0001303f0, 0xc000818005, 0xc000818005, 0xc000536380)         /usr/local/go/src/net/tcpsock.go:248 +0x65 github.com/zrepl/zrepl/tlsconf.(*ClientAuthListener).Accept(0xc000130408, 0x567279de41a0c53f, 0xc0010dbc78, 0x466f1a, 0xc0002ac400, 0x40be9, 0x567279de41a0c53f)         /home/mdtancsa/clean/zrepl/tlsconf/tlsconf.go:67 +0x46 github.com/zrepl/zrepl/transport/tls.tlsAuthListener.Accept(0xc000130408, 0xc000332540, 0xd7f010, 0xc000132240, 0x0, 0x0, 0x0)         /home/mdtancsa/clean/zrepl/transport/tls/serve_tls.go:69 +0x72 github.com/zrepl/zrepl/rpc/versionhandshake.HandshakeListener.Accept(0xd7c6f8, 0xc000110500, 0x2540be400, 0xd7f010, 0xc000132240, 0x0, 0xc000132200, 0x0)         /home/mdtancsa/clean/zrepl/rpc/versionhandshake/versionhandshake_transport_wrappers.go:50 +0x49 github.com/zrepl/zrepl/rpc/transportmux.Demux.func2(0xc000112c30, 0xd7f010, 0xc000132240, 0xd7d058, 0xc000130420, 0x2540be400, 0xc000112c00)         /home/mdtancsa/clean/zrepl/rpc/transportmux/transportmux.go:152 +0xce created by github.com/zrepl/zrepl/rpc/transportmux.Demux         /home/mdtancsa/clean/zrepl/rpc/transportmux/transportmux.go:137 +0x3c7

goroutine 116 [chan receive]: github.com/zrepl/zrepl/rpc/netadaptor.Listener.Accept(0xd7c638, 0xc000110520, 0xd873f0, 0xc000126280, 0xc0001522a0, 0xc000152300, 0xd85758, 0xc0003820c0, 0x4449ee, 0xc000125d50)         /home/mdtancsa/clean/zrepl/rpc/netadaptor/authlistener_netlistener_adaptor.go:79 +0x165 google.golang.org/grpc.(Server).Serve(0xc000103680, 0xd7c608, 0xc000112de0, 0x0, 0x0)         /root/go/pkg/mod/google.golang.org/grpc@v1.17.0/server.go:556 +0x27f github.com/zrepl/zrepl/rpc/grpcclientidentity/grpchelper.NewServer.func1(0xc000000048, 0xcc5908)         /home/mdtancsa/clean/zrepl/rpc/grpcclientidentity/grpchelper/authlistener_grpc_adaptor_wrapper.go:67 +0x79 github.com/zrepl/zrepl/rpc.NewServer.func1(0xd7f010, 0xc000132240, 0xd7c638, 0xc000110520, 0xc000114840)         /home/mdtancsa/clean/zrepl/rpc/rpc_server.go:72 +0x1ef created by github.com/zrepl/zrepl/rpc.(Server).Serve         /home/mdtancsa/clean/zrepl/rpc/rpc_server.go:115 +0x24f

goroutine 117 [chan receive]: github.com/zrepl/zrepl/rpc/dataconn.(Server).Serve(0xc000112b40, 0xd7f010, 0xc000132280, 0xd7c638, 0xc000110530)         /home/mdtancsa/clean/zrepl/rpc/dataconn/dataconn_server.go:104 +0x25f github.com/zrepl/zrepl/rpc.NewServer.func4(0xd7f010, 0xc000132240, 0xd7c638, 0xc000110530, 0xc000114840)         /home/mdtancsa/clean/zrepl/rpc/rpc_server.go:86 +0x52 created by github.com/zrepl/zrepl/rpc.(Server).Serve         /home/mdtancsa/clean/zrepl/rpc/rpc_server.go:116 +0x2a5

goroutine 118 [chan receive, 45 minutes]: github.com/zrepl/zrepl/rpc.NewServer.func1.2(0xd7f010, 0xc000132240, 0xc000103680, 0xd873f0, 0xc0001261e0, 0xd873f0, 0xc000126280, 0xd873f0, 0xc000126320)         /home/mdtancsa/clean/zrepl/rpc/rpc_server.go:63 +0x4c created by github.com/zrepl/zrepl/rpc.NewServer.func1         /home/mdtancsa/clean/zrepl/rpc/rpc_server.go:62 +0x1e5

goroutine 119 [select]: github.com/zrepl/zrepl/rpc/transportmux.(*demuxListener).Accept(0xc000110520, 0xd7f048, 0xc00003a100, 0x0, 0x2, 0x1)         /home/mdtancsa/clean/zrepl/rpc/transportmux/transportmux.go:51 +0xd5 github.com/zrepl/zrepl/rpc/netadaptor.Listener.handleAccept(0xd7c638, 0xc000110520, 0xd873f0, 0xc000126280, 0xc0001522a0, 0xc000152300)         /home/mdtancsa/clean/zrepl/rpc/netadaptor/authlistener_netlistener_adaptor.go:94 +0x9f created by github.com/zrepl/zrepl/rpc/netadaptor.New         /home/mdtancsa/clean/zrepl/rpc/netadaptor/authlistener_netlistener_adaptor.go:64 +0x153

goroutine 70 [select]: github.com/zrepl/zrepl/rpc/transportmux.(demuxListener).Accept(0xc000110530, 0xd7f010, 0xc000132280, 0xc0005363a0, 0x0, 0x0)         /home/mdtancsa/clean/zrepl/rpc/transportmux/transportmux.go:51 +0xd5 github.com/zrepl/zrepl/rpc/dataconn.(Server).Serve.func2(0xc00012e100, 0xc0003b2060, 0xd7c638, 0xc000110530, 0xd7f010, 0xc000132280, 0xc000112b40)         /home/mdtancsa/clean/zrepl/rpc/dataconn/dataconn_server.go:92 +0xac created by github.com/zrepl/zrepl/rpc/dataconn.(*Server).Serve         /home/mdtancsa/clean/zrepl/rpc/dataconn/dataconn_server.go:88 +0x1d9

goroutine 101 [chan receive, 45 minutes]: github.com/zrepl/zrepl/rpc/transportmux.Demux.func1(0xd7f010, 0xc00040c100, 0xd7d058, 0xc00040a0d8, 0xc000402ab0)         /home/mdtancsa/clean/zrepl/rpc/transportmux/transportmux.go:116 +0x65 created by github.com/zrepl/zrepl/rpc/transportmux.Demux         /home/mdtancsa/clean/zrepl/rpc/transportmux/transportmux.go:115 +0x354

goroutine 102 [IO wait, 2 minutes]: internal/poll.runtime_pollWait(0x8288f4778, 0x72, 0x0)         /usr/local/go/src/runtime/netpoll.go:222 +0x55 internal/poll.(pollDesc).wait(0xc00041c098, 0x72, 0x0, 0x0, 0xc84d52)         /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45 internal/poll.(pollDesc).waitRead(...)         /usr/local/go/src/internal/poll/fd_poll_runtime.go:92 internal/poll.(FD).Accept(0xc00041c080, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)         /usr/local/go/src/internal/poll/fd_unix.go:401 +0x212 net.(netFD).accept(0xc00041c080, 0x793, 0x0, 0x0)         /usr/local/go/src/net/fd_unix.go:172 +0x45 net.(TCPListener).accept(0xc00040a0a8, 0x793, 0xc000549ac0, 0x0)         /usr/local/go/src/net/tcpsock_posix.go:139 +0x32 net.(TCPListener).AcceptTCP(0xc00040a0a8, 0xc0006ae805, 0xc0006ae805, 0xc00051d2c0)         /usr/local/go/src/net/tcpsock.go:248 +0x65 github.com/zrepl/zrepl/tlsconf.(*ClientAuthListener).Accept(0xc00040a0c0, 0x47c345ac63751d35, 0xc000549c78, 0x466f1a, 0xc00006ac00, 0x40b67, 0x47c345ac63751d35)         /home/mdtancsa/clean/zrepl/tlsconf/tlsconf.go:67 +0x46 github.com/zrepl/zrepl/transport/tls.tlsAuthListener.Accept(0xc00040a0c0, 0xc0003329c0, 0xd7f010, 0xc00040c100, 0x0, 0x0, 0x0)         /home/mdtancsa/clean/zrepl/transport/tls/serve_tls.go:69 +0x72 github.com/zrepl/zrepl/rpc/versionhandshake.HandshakeListener.Accept(0xd7c6f8, 0xc000408440, 0x2540be400, 0xd7f010, 0xc00040c100, 0x0, 0xc00040c100, 0x0)         /home/mdtancsa/clean/zrepl/rpc/versionhandshake/versionhandshake_transport_wrappers.go:50 +0x49 github.com/zrepl/zrepl/rpc/transportmux.Demux.func2(0xc000402ab0, 0xd7f010, 0xc00040c100, 0xd7d058, 0xc00040a0d8, 0x2540be400, 0xc000402a80)         /home/mdtancsa/clean/zrepl/rpc/transportmux/transportmux.go:152 +0xce created by github.com/zrepl/zrepl/rpc/transportmux.Demux         /home/mdtancsa/clean/zrepl/rpc/transportmux/transportmux.go:137 +0x3c7

goroutine 103 [chan receive, 2 minutes]: github.com/zrepl/zrepl/rpc/netadaptor.Listener.Accept(0xd7c638, 0xc000408460, 0xd873f0, 0xc0004004a0, 0xc0004361e0, 0xc000436240, 0xd85758, 0xc00057a1e0, 0x4449ee, 0xc0001f0d50)         /home/mdtancsa/clean/zrepl/rpc/netadaptor/authlistener_netlistener_adaptor.go:79 +0x165 google.golang.org/grpc.(Server).Serve(0xc000432a80, 0xd7c608, 0xc000402c90, 0x0, 0x0)         /root/go/pkg/mod/google.golang.org/grpc@v1.17.0/server.go:556 +0x27f github.com/zrepl/zrepl/rpc/grpcclientidentity/grpchelper.NewServer.func1(0xc000000048, 0xcc5908)         /home/mdtancsa/clean/zrepl/rpc/grpcclientidentity/grpchelper/authlistener_grpc_adaptor_wrapper.go:67 +0x79 github.com/zrepl/zrepl/rpc.NewServer.func1(0xd7f010, 0xc00040c100, 0xd7c638, 0xc000408460, 0xc000406180)         /home/mdtancsa/clean/zrepl/rpc/rpc_server.go:72 +0x1ef created by github.com/zrepl/zrepl/rpc.(Server).Serve         /home/mdtancsa/clean/zrepl/rpc/rpc_server.go:115 +0x24f

goroutine 104 [chan receive, 2 minutes]: github.com/zrepl/zrepl/rpc/dataconn.(Server).Serve(0xc0004029c0, 0xd7f010, 0xc00040c140, 0xd7c638, 0xc000408470)         /home/mdtancsa/clean/zrepl/rpc/dataconn/dataconn_server.go:104 +0x25f github.com/zrepl/zrepl/rpc.NewServer.func4(0xd7f010, 0xc00040c100, 0xd7c638, 0xc000408470, 0xc000406180)         /home/mdtancsa/clean/zrepl/rpc/rpc_server.go:86 +0x52 created by github.com/zrepl/zrepl/rpc.(Server).Serve         /home/mdtancsa/clean/zrepl/rpc/rpc_server.go:116 +0x2a5

goroutine 105 [chan receive, 45 minutes]: github.com/zrepl/zrepl/rpc/dataconn.(Server).Serve.func1(0xc000412150, 0xd7f010, 0xc00040c140, 0xc0004029c0, 0xd7c638, 0xc000408470)         /home/mdtancsa/clean/zrepl/rpc/dataconn/dataconn_server.go:80 +0x74 created by github.com/zrepl/zrepl/rpc/dataconn.(Server).Serve         /home/mdtancsa/clean/zrepl/rpc/dataconn/dataconn_server.go:78 +0x135

goroutine 106 [select, 2 minutes]: github.com/zrepl/zrepl/rpc/transportmux.(demuxListener).Accept(0xc000408470, 0xd7f010, 0xc00040c140, 0xc00051d2e0, 0x0, 0x0)         /home/mdtancsa/clean/zrepl/rpc/transportmux/transportmux.go:51 +0xd5 github.com/zrepl/zrepl/rpc/dataconn.(Server).Serve.func2(0xc000412150, 0xc000436060, 0xd7c638, 0xc000408470, 0xd7f010, 0xc00040c140, 0xc0004029c0)         /home/mdtancsa/clean/zrepl/rpc/dataconn/dataconn_server.go:92 +0xac created by github.com/zrepl/zrepl/rpc/dataconn.(*Server).Serve         /home/mdtancsa/clean/zrepl/rpc/dataconn/dataconn_server.go:88 +0x1d9

goroutine 107 [chan receive, 45 minutes]: github.com/zrepl/zrepl/rpc.NewServer.func1.2(0xd7f010, 0xc00040c100, 0xc000432a80, 0xd873f0, 0xc000400400, 0xd873f0, 0xc0004004a0, 0xd873f0, 0xc000400540)         /home/mdtancsa/clean/zrepl/rpc/rpc_server.go:63 +0x4c created by github.com/zrepl/zrepl/rpc.NewServer.func1         /home/mdtancsa/clean/zrepl/rpc/rpc_server.go:62 +0x1e5

goroutine 108 [select, 2 minutes]: github.com/zrepl/zrepl/rpc/transportmux.(*demuxListener).Accept(0xc000408460, 0xd7f048, 0xc00003a100, 0x0, 0x2, 0x1)         /home/mdtancsa/clean/zrepl/rpc/transportmux/transportmux.go:51 +0xd5 github.com/zrepl/zrepl/rpc/netadaptor.Listener.handleAccept(0xd7c638, 0xc000408460, 0xd873f0, 0xc0004004a0, 0xc0004361e0, 0xc000436240)         /home/mdtancsa/clean/zrepl/rpc/netadaptor/authlistener_netlistener_adaptor.go:94 +0x9f created by github.com/zrepl/zrepl/rpc/netadaptor.New         /home/mdtancsa/clean/zrepl/rpc/netadaptor/authlistener_netlistener_adaptor.go:64 +0x153

goroutine 309084 [select]: github.com/zrepl/zrepl/rpc/dataconn/heartbeatconn.(*Conn).sendHeartbeats(0xc000133340)         /home/mdtancsa/clean/zrepl/rpc/dataconn/heartbeatconn/heartbeatconn.go:84 +0x195 created by github.com/zrepl/zrepl/rpc/dataconn/heartbeatconn.Wrap         /home/mdtancsa/clean/zrepl/rpc/dataconn/heartbeatconn/heartbeatconn.go:63 +0x1c7

goroutine 308007 [IO wait]: internal/poll.runtime_pollWait(0x8288f3d80, 0x72, 0xffffffffffffffff)         /usr/local/go/src/runtime/netpoll.go:222 +0x55 internal/poll.(pollDesc).wait(0xc00046c018, 0x72, 0x700, 0x7a3, 0xffffffffffffffff)         /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45 internal/poll.(pollDesc).waitRead(...)         /usr/local/go/src/internal/poll/fd_poll_runtime.go:92 internal/poll.(FD).Read(0xc00046c000, 0xc0006c8000, 0x7a3, 0x7a3, 0x0, 0x0, 0x0)         /usr/local/go/src/internal/poll/fd_unix.go:166 +0x1d5 net.(netFD).Read(0xc00046c000, 0xc0006c8000, 0x7a3, 0x7a3, 0x79e, 0xc0006c8000, 0x5)         /usr/local/go/src/net/fd_posix.go:55 +0x4f net.(conn).Read(0xc00012a000, 0xc0006c8000, 0x7a3, 0x7a3, 0x0, 0x0, 0x0)         /usr/local/go/src/net/net.go:183 +0x91 crypto/tls.(atLeastReader).Read(0xc000d2c450, 0xc0006c8000, 0x7a3, 0x7a3, 0x79e, 0xc000100c00, 0x0)         /usr/local/go/src/crypto/tls/conn.go:776 +0x63 bytes.(Buffer).ReadFrom(0xc0001365f8, 0xd6dd80, 0xc000d2c450, 0x40d345, 0xba63a0, 0xc58560)         /usr/local/go/src/bytes/buffer.go:204 +0xbe crypto/tls.(Conn).readFromUntil(0xc000136380, 0x82940aaa8, 0xc00012a000, 0x5, 0xc00012a000, 0x1e)         /usr/local/go/src/crypto/tls/conn.go:798 +0xf3 crypto/tls.(Conn).readRecordOrCCS(0xc000136380, 0x0, 0x0, 0x0)         /usr/local/go/src/crypto/tls/conn.go:605 +0x115 crypto/tls.(Conn).readRecord(...)         /usr/local/go/src/crypto/tls/conn.go:573 crypto/tls.(Conn).Read(0xc000136380, 0xc000624000, 0x8000, 0x8000, 0x0, 0x0, 0x0)         /usr/local/go/src/crypto/tls/conn.go:1276 +0x165 bufio.(Reader).Read(0xc0006229c0, 0xc0003ce038, 0x9, 0x9, 0x108, 0x7ffffffff020, 0xc000167d58)         /usr/local/go/src/bufio/bufio.go:227 +0x222 io.ReadAtLeast(0xd6dbe0, 0xc0006229c0, 0xc0003ce038, 0x9, 0x9, 0x9, 0xf1303bd66db6, 0xc008ac2a2c, 0x8ac2a2c008ce5d2)         /usr/local/go/src/io/io.go:328 +0x87 io.ReadFull(...)         /usr/local/go/src/io/io.go:347 golang.org/x/net/http2.readFrameHeader(0xc0003ce038, 0x9, 0x9, 0xd6dbe0, 0xc0006229c0, 0x0, 0x0, 0xc0008f5080, 0x1)         /root/go/pkg/mod/golang.org/x/net@v0.0.0-20190613194153-d28f0bde5980/http2/frame.go:237 +0x89 golang.org/x/net/http2.(Framer).ReadFrame(0xc0003ce000, 0xc00015a240, 0xc00015a240, 0x0, 0x0)         /root/go/pkg/mod/golang.org/x/net@v0.0.0-20190613194153-d28f0bde5980/http2/frame.go:492 +0xa5 google.golang.org/grpc/internal/transport.(http2Server).HandleStreams(0xc0008f5080, 0xc001284240, 0xcc5c30)         /root/go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/http2_server.go:429 +0x85 google.golang.org/grpc.(Server).serveStreams(0xc000103680, 0xd86578, 0xc0008f5080)         /root/go/pkg/mod/google.golang.org/grpc@v1.17.0/server.go:681 +0xe2 google.golang.org/grpc.(Server).handleRawConn.func1(0xc000103680, 0xd86578, 0xc0008f5080)         /root/go/pkg/mod/google.golang.org/grpc@v1.17.0/server.go:643 +0x45 created by google.golang.org/grpc.(*Server).handleRawConn         /root/go/pkg/mod/google.golang.org/grpc@v1.17.0/server.go:642 +0x60d

goroutine 309186 [chan receive]: github.com/zrepl/zrepl/rpc/dataconn/stream.readStream(0xc00092f2c0, 0xc000133340, 0xd6eda0, 0xc002a447a8, 0x5, 0xc00092f440)         /home/mdtancsa/clean/zrepl/rpc/dataconn/stream/stream.go:245 +0xa5 github.com/zrepl/zrepl/rpc/dataconn/stream.(Conn).ReadStream.func2(0xc000133380, 0xc002a447a8, 0xc000000005)         /home/mdtancsa/clean/zrepl/rpc/dataconn/stream/stream_conn.go:154 +0x85 created by github.com/zrepl/zrepl/rpc/dataconn/stream.(Conn).ReadStream         /home/mdtancsa/clean/zrepl/rpc/dataconn/stream/stream_conn.go:152 +0x2a5

goroutine 308006 [select]: google.golang.org/grpc/internal/transport.(*http2Server).keepalive(0xc0008f5080)         /root/go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/http2_server.go:921 +0x1f0 created by google.golang.org/grpc/internal/transport.newHTTP2Server         /root/go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/http2_server.go:282 +0x1085

goroutine 298390 [IO wait]: internal/poll.runtime_pollWait(0x82a140d10, 0x72, 0xffffffffffffffff)         /usr/local/go/src/runtime/netpoll.go:222 +0x55 internal/poll.(pollDesc).wait(0xc000214398, 0x72, 0x700, 0x7a3, 0xffffffffffffffff)         /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45 internal/poll.(pollDesc).waitRead(...)         /usr/local/go/src/internal/poll/fd_poll_runtime.go:92 internal/poll.(FD).Read(0xc000214380, 0xc0006c8800, 0x7a3, 0x7a3, 0x0, 0x0, 0x0)         /usr/local/go/src/internal/poll/fd_unix.go:166 +0x1d5 net.(netFD).Read(0xc000214380, 0xc0006c8800, 0x7a3, 0x7a3, 0x79e, 0xc0006c8800, 0x5)         /usr/local/go/src/net/fd_posix.go:55 +0x4f net.(conn).Read(0xc002a440e0, 0xc0006c8800, 0x7a3, 0x7a3, 0x0, 0x0, 0x0)         /usr/local/go/src/net/net.go:183 +0x91 crypto/tls.(atLeastReader).Read(0xc000a688a0, 0xc0006c8800, 0x7a3, 0x7a3, 0x79e, 0xc000780000, 0x0)         /usr/local/go/src/crypto/tls/conn.go:776 +0x63 bytes.(Buffer).ReadFrom(0xc000136978, 0xd6dd80, 0xc000a688a0, 0x40d345, 0xba63a0, 0xc58560)         /usr/local/go/src/bytes/buffer.go:204 +0xbe crypto/tls.(Conn).readFromUntil(0xc000136700, 0x82940aaa8, 0xc002a440e0, 0x5, 0xc002a440e0, 0x11)         /usr/local/go/src/crypto/tls/conn.go:798 +0xf3 crypto/tls.(Conn).readRecordOrCCS(0xc000136700, 0x0, 0x0, 0x0)         /usr/local/go/src/crypto/tls/conn.go:605 +0x115 crypto/tls.(Conn).readRecord(...)         /usr/local/go/src/crypto/tls/conn.go:573 crypto/tls.(Conn).Read(0xc000136700, 0xc0005c8000, 0x8000, 0x8000, 0x0, 0x0, 0x0)         /usr/local/go/src/crypto/tls/conn.go:1276 +0x165 bufio.(Reader).Read(0xc000662840, 0xc0001561f8, 0x9, 0x9, 0x0, 0x7ffffffff020, 0xc0001f4d58)         /usr/local/go/src/bufio/bufio.go:227 +0x222 io.ReadAtLeast(0xd6dbe0, 0xc000662840, 0xc0001561f8, 0x9, 0x9, 0x9, 0xf12f3bde891a, 0xc032ba378f, 0x32ba378f008ce668)         /usr/local/go/src/io/io.go:328 +0x87 io.ReadFull(...)         /usr/local/go/src/io/io.go:347 golang.org/x/net/http2.readFrameHeader(0xc0001561f8, 0x9, 0x9, 0xd6dbe0, 0xc000662840, 0x0, 0x0, 0xc000418000, 0x29f44ca1115)         /root/go/pkg/mod/golang.org/x/net@v0.0.0-20190613194153-d28f0bde5980/http2/frame.go:237 +0x89 golang.org/x/net/http2.(Framer).ReadFrame(0xc0001561c0, 0xc00003c618, 0xc00003c618, 0x0, 0x0)         /root/go/pkg/mod/golang.org/x/net@v0.0.0-20190613194153-d28f0bde5980/http2/frame.go:492 +0xa5 google.golang.org/grpc/internal/transport.(http2Server).HandleStreams(0xc000418000, 0xc000607260, 0xcc5c30)         /root/go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/http2_server.go:429 +0x85 google.golang.org/grpc.(Server).serveStreams(0xc000102a80, 0xd86578, 0xc000418000)         /root/go/pkg/mod/google.golang.org/grpc@v1.17.0/server.go:681 +0xe2 google.golang.org/grpc.(Server).handleRawConn.func1(0xc000102a80, 0xd86578, 0xc000418000)         /root/go/pkg/mod/google.golang.org/grpc@v1.17.0/server.go:643 +0x45 created by google.golang.org/grpc.(*Server).handleRawConn         /root/go/pkg/mod/google.golang.org/grpc@v1.17.0/server.go:642 +0x60d

goroutine 307988 [syscall]: syscall.Syscall6(0x214, 0x0, 0x10db, 0x0, 0x18, 0x0, 0x0, 0xc000209e80, 0xd6fbc0, 0x11cda40)         /usr/local/go/src/syscall/asm_unix_amd64.s:39 +0x5 os.(Process).blockUntilWaitable(0xc0003aa0c0, 0xd, 0xbffddbb6ec673f17, 0x29caed06a9d)         /usr/local/go/src/os/wait_wait6.go:30 +0x5e os.(Process).wait(0xc0003aa0c0, 0xc0004a0680, 0xc8a712, 0xd)         /usr/local/go/src/os/exec_unix.go:22 +0x39 os.(Process).Wait(...)         /usr/local/go/src/os/exec.go:129 os/exec.(Cmd).Wait(0xc0000ca160, 0xc000e16fb8, 0xc0009d1530)         /usr/local/go/src/os/exec/exec.go:507 +0x65 github.com/zrepl/zrepl/zfs/zfscmd.(*Cmd).Wait(0xc00047a000, 0xc00074f790, 0xc000e16fb8)         /home/mdtancsa/clean/zrepl/zfs/zfscmd/zfscmd.go:112 +0x3c github.com/zrepl/zrepl/zfs.ZFSRecv.func4(0xc0004a70e0, 0xc00047a000, 0xc0006861d0, 0xd7f010, 0xc00088a040, 0xc00028d4a0)         /home/mdtancsa/clean/zrepl/zfs/zfs.go:1127 +0x74 created by github.com/zrepl/zrepl/zfs.ZFSRecv         /home/mdtancsa/clean/zrepl/zfs/zfs.go:1125 +0x9c5

goroutine 307837 [select]: github.com/zrepl/zrepl/rpc/dataconn/heartbeatconn.(*Conn).sendHeartbeats(0xc000682140)         /home/mdtancsa/clean/zrepl/rpc/dataconn/heartbeatconn/heartbeatconn.go:84 +0x195 created by github.com/zrepl/zrepl/rpc/dataconn/heartbeatconn.Wrap         /home/mdtancsa/clean/zrepl/rpc/dataconn/heartbeatconn/heartbeatconn.go:63 +0x1c7

goroutine 306912 [IO wait]:
internal/poll.runtime_pollWait(0x8288f45a8, 0x72, 0xffffffffffffffff)    /usr/local/go/src/runtime/netpoll.go:222 +0x55
internal/poll.(*pollDesc).wait(0xc000622858, 0x72, 0x8001, 0x8000, 0xffffffffffffffff)    /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)    /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc000622840, 0xc00069a000, 0x8000, 0x8000, 0x0, 0x0, 0x0)    /usr/local/go/src/internal/poll/fd_unix.go:166 +0x1d5
os.(*File).read(...)    /usr/local/go/src/os/file_posix.go:31
os.(*File).Read(0xc0002ec198, 0xc00069a000, 0x8000, 0x8000, 0xc0007cee80, 0x46ebc5, 0xc000e17528)    /usr/local/go/src/os/file.go:117 +0x77
bytes.(*Buffer).ReadFrom(0xc00028d4a0, 0xd6f300, 0xc0002ec198, 0x8294155d8, 0xc00028d4a0, 0xc0007cef01)    /usr/local/go/src/bytes/buffer.go:204 +0xbe
io.copyBuffer(0xd6dc40, 0xc00028d4a0, 0xd6f300, 0xc0002ec198, 0x0, 0x0, 0x0, 0x407175, 0xc000e175c0, 0xc0007cefb0)    /usr/local/go/src/io/io.go:409 +0x357
io.Copy(...)    /usr/local/go/src/io/io.go:382
os/exec.(*Cmd).writerDescriptor.func1(0xc000e175c0, 0xc0007cefb0)    /usr/local/go/src/os/exec/exec.go:311 +0x65
os/exec.(*Cmd).Start.func1(0xc0000ca160, 0xc0004a0400)    /usr/local/go/src/os/exec/exec.go:441 +0x27
created by os/exec.(*Cmd).Start    /usr/local/go/src/os/exec/exec.go:440 +0x630

goroutine 307838 [IO wait]:
internal/poll.runtime_pollWait(0x82a140970, 0x72, 0xffffffffffffffff)    /usr/local/go/src/runtime/netpoll.go:222 +0x55
internal/poll.(*pollDesc).wait(0xc0000e8098, 0x72, 0x7d00, 0x7dca, 0xffffffffffffffff)    /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)    /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc0000e8080, 0xc0006141f0, 0x7dca, 0x7dca, 0x0, 0x0, 0x0)    /usr/local/go/src/internal/poll/fd_unix.go:166 +0x1d5
net.(*netFD).Read(0xc0000e8080, 0xc0006141f0, 0x7dca, 0x7dca, 0x5a8, 0x0, 0x0)    /usr/local/go/src/net/fd_posix.go:55 +0x4f
net.(*conn).Read(0xc0000101c0, 0xc0006141f0, 0x7dca, 0x7dca, 0x0, 0x0, 0x0)    /usr/local/go/src/net/net.go:183 +0x91
crypto/tls.(*atLeastReader).Read(0xc00090b530, 0xc0006141f0, 0x7dca, 0x7dca, 0x5a8, 0x0, 0x0)    /usr/local/go/src/crypto/tls/conn.go:776 +0x63
bytes.(*Buffer).ReadFrom(0xc0002285f8, 0xd6dd80, 0xc00090b530, 0x40d345, 0xba63a0, 0xc58560)    /usr/local/go/src/bytes/buffer.go:204 +0xbe
crypto/tls.(*Conn).readFromUntil(0xc000228380, 0x82940aaa8, 0xc0000101c0, 0x4016, 0xc0000101c0, 0x0)    /usr/local/go/src/crypto/tls/conn.go:798 +0xf3
crypto/tls.(*Conn).readRecordOrCCS(0xc000228380, 0x0, 0x0, 0x2a382ce7420)    /usr/local/go/src/crypto/tls/conn.go:650 +0x771
crypto/tls.(*Conn).readRecord(...)    /usr/local/go/src/crypto/tls/conn.go:573
crypto/tls.(*Conn).Read(0xc000228380, 0xc00175e000, 0x7c000, 0x7c000, 0x0, 0x0, 0x0)    /usr/local/go/src/crypto/tls/conn.go:1276 +0x165
github.com/zrepl/zrepl/rpc/dataconn/timeoutconn.(*Conn).Read(0xc000682100, 0xc00175e000, 0x7c000, 0x7c000, 0x4000, 0x0, 0x0)    /home/mdtancsa/clean/zrepl/rpc/dataconn/timeoutconn/timeoutconn.go:118 +0xb1
github.com/zrepl/zrepl/rpc/dataconn/timeoutconn.(*Conn).readvFallback(0xc000682100, 0xc000591b80, 0x2, 0x2, 0x0, 0x0, 0x0)    /home/mdtancsa/clean/zrepl/rpc/dataconn/timeoutconn/timeoutconn.go:202 +0xeb
github.com/zrepl/zrepl/rpc/dataconn/timeoutconn.(*Conn).readv(...)    /home/mdtancsa/clean/zrepl/rpc/dataconn/timeoutconn/timeoutconn_readv_unsupported.go:8
github.com/zrepl/zrepl/rpc/dataconn/timeoutconn.(*Conn).ReadvFull(0xc000682100, 0xc000591b80, 0x2, 0x2, 0x80000, 0x80000, 0xc000be0960)    /home/mdtancsa/clean/zrepl/rpc/dataconn/timeoutconn/timeoutconn.go:188 +0x49
github.com/zrepl/zrepl/rpc/dataconn/frameconn.(*Conn).readFrame(0xc0004681e0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)    /home/mdtancsa/clean/zrepl/rpc/dataconn/frameconn/frameconn.go:172 +0x236
github.com/zrepl/zrepl/rpc/dataconn/frameconn.(*Conn).ReadFrame(0xc0004681e0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)    /home/mdtancsa/clean/zrepl/rpc/dataconn/frameconn/frameconn.go:108 +0xff
github.com/zrepl/zrepl/rpc/dataconn/heartbeatconn.(*Conn).readFrameFiltered(0xc000682140, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)    /home/mdtancsa/clean/zrepl/rpc/dataconn/heartbeatconn/heartbeatconn.go:115 +0x97
github.com/zrepl/zrepl/rpc/dataconn/heartbeatconn.(*Conn).ReadFrame(...)    /home/mdtancsa/clean/zrepl/rpc/dataconn/heartbeatconn/heartbeatconn.go:110
github.com/zrepl/zrepl/rpc/dataconn/stream.readFrames(0xc00021f020, 0xc0009df740, 0xc000682140)    /home/mdtancsa/clean/zrepl/rpc/dataconn/stream/stream.go:227 +0xe5
github.com/zrepl/zrepl/rpc/dataconn/stream.(*Conn).readFrames(0xc000682180)    /home/mdtancsa/clean/zrepl/rpc/dataconn/stream/stream_conn.go:64 +0x45
created by github.com/zrepl/zrepl/rpc/dataconn/stream.Wrap    /home/mdtancsa/clean/zrepl/rpc/dataconn/stream/stream_conn.go:51 +0x10c

goroutine 308005 [select]:
google.golang.org/grpc/internal/transport.(*controlBuffer).get(0xc00023a140, 0x1, 0x0, 0x0, 0x0, 0x0)    /root/go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/controlbuf.go:317 +0xf4
google.golang.org/grpc/internal/transport.(*loopyWriter).run(0xc000ece7e0, 0x0, 0x0)    /root/go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/controlbuf.go:435 +0x1dd
google.golang.org/grpc/internal/transport.newHTTP2Server.func2(0xc0008f5080)    /root/go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/http2_server.go:276 +0xd7
created by google.golang.org/grpc/internal/transport.newHTTP2Server    /root/go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/http2_server.go:273 +0x105c

goroutine 308739 [runnable]:
os/exec.(*Cmd).Start.func1(0xc000c6af20, 0xc001153440)    /usr/local/go/src/os/exec/exec.go:440
created by os/exec.(*Cmd).Start    /usr/local/go/src/os/exec/exec.go:440 +0x630

goroutine 298389 [select]:
google.golang.org/grpc/internal/transport.(*http2Server).keepalive(0xc000418000)    /root/go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/http2_server.go:921 +0x1f0
created by google.golang.org/grpc/internal/transport.newHTTP2Server    /root/go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/http2_server.go:282 +0x1085

goroutine 308740 [runnable]:
os/exec.(*Cmd).Start.func2(0xc000c6af20)    /usr/local/go/src/os/exec/exec.go:448
created by os/exec.(*Cmd).Start    /usr/local/go/src/os/exec/exec.go:448 +0x6cb

goroutine 309082 [chan receive]:
os/exec.(*Cmd).Wait(0xc000c6af20, 0x0, 0x0)    /usr/local/go/src/os/exec/exec.go:515 +0x145
os/exec.(*Cmd).Run(0xc000c6af20, 0xc000b6c870, 0x119ee20)    /usr/local/go/src/os/exec/exec.go:341 +0x5f
os/exec.(*Cmd).CombinedOutput(0xc000c6af20, 0x0, 0x0, 0x9521e5, 0xc33ee0, 0xc00046de00)    /usr/local/go/src/os/exec/exec.go:567 +0x91
github.com/zrepl/zrepl/zfs/zfscmd.(*Cmd).CombinedOutput(0xc00046de00, 0xc0003d27e0, 0xc8127c, 0x3, 0xc000441668, 0x3)    /home/mdtancsa/clean/zrepl/zfs/zfscmd/zfscmd.go:39 +0x69
github.com/zrepl/zrepl/zfs.ZFSRecvClearResumeToken(0xd7f0b8, 0xc0003d27e0, 0xc000e8bf00, 0x38, 0x1, 0xc000e8bf00)    /home/mdtancsa/clean/zrepl/zfs/zfs.go:1243 +0x125
github.com/zrepl/zrepl/endpoint.(*Receiver).Receive(0xc0003807e0, 0xd7f0b8, 0xc0003d27e0, 0xc00009b4a0, 0xd761d0, 0xc00090afa8, 0x0, 0x0, 0x0)    /home/mdtancsa/clean/zrepl/endpoint/endpoint.go:739 +0x2197
github.com/zrepl/zrepl/rpc/dataconn.(*Server).serveConnRequest(0xc000112b40, 0xd7f0b8, 0xc0001c7d70, 0xc000d40c98, 0x8, 0xc000133380)    /home/mdtancsa/clean/zrepl/rpc/dataconn/dataconn_server.go:186 +0xc93
github.com/zrepl/zrepl/rpc/dataconn.(*Server).serveConn.func2(0xd7f0b8, 0xc0001c7d70)    /home/mdtancsa/clean/zrepl/rpc/dataconn/dataconn_server.go:150 +0x59
github.com/zrepl/zrepl/daemon/job.(*PassiveSide).Run.func1(0xd7f0b8, 0xc0001c7bf0, 0xd773f0, 0xc000bc7800, 0xc0001c7b60)    /home/mdtancsa/clean/zrepl/daemon/job/passive.go:178 +0x2c6
github.com/zrepl/zrepl/rpc.NewServer.func3(0xd7f0b8, 0xc0003d2480, 0xd77418, 0xc000bc77e0, 0xc0001c7b60)    /home/mdtancsa/clean/zrepl/rpc/rpc_server.go:82 +0xd5
github.com/zrepl/zrepl/rpc/dataconn.(*Server).serveConn(0xc000112b40, 0xc0005363a0)    /home/mdtancsa/clean/zrepl/rpc/dataconn/dataconn_server.go:149 +0x3bf
github.com/zrepl/zrepl/rpc/dataconn.(*Server).Serve.func3(0xc00012e100, 0xc000112b40, 0xc0005363a0)    /home/mdtancsa/clean/zrepl/rpc/dataconn/dataconn_server.go:108 +0x5d
created by github.com/zrepl/zrepl/rpc/dataconn.(*Server).Serve    /home/mdtancsa/clean/zrepl/rpc/dataconn/dataconn_server.go:106 +0x247

goroutine 307987 [chan send]:
io.(*pipe).Read(0xc00021f3e0, 0xc0006a8000, 0x8000, 0x8000, 0x8000, 0x0, 0x0)    /usr/local/go/src/io/pipe.go:60 +0x139
io.(*PipeReader).Read(0xc000010300, 0xc0006a8000, 0x8000, 0x8000, 0x0, 0x0, 0x0)    /usr/local/go/src/io/pipe.go:134 +0x4c
github.com/zrepl/zrepl/util/chainedio.(*ChainedReadCloser).Read(0xc0004a0080, 0xc0006a8000, 0x8000, 0x8000, 0x8000, 0x0, 0x0)    /home/mdtancsa/clean/zrepl/util/chainedio/chainedio_readcloser.go:22 +0xa5
io.copyBuffer(0xd70580, 0xc0002080f0, 0x829545770, 0xc0004a0080, 0xc0006a8000, 0x8000, 0x8000, 0xba64a0, 0xc6cb00, 0x8295457b0)    /usr/local/go/src/io/io.go:423 +0x12c
io.Copy(...)    /usr/local/go/src/io/io.go:382
os.genericReadFrom(0xc0002ec018, 0x829545770, 0xc0004a0080, 0xc6cb00, 0xaac101, 0x8295457b0)    /usr/local/go/src/os/file.go:160 +0x99
os.(*File).ReadFrom(0xc0002ec018, 0x829545770, 0xc0004a0080, 0x8295457b0, 0xc0002ec018, 0xc0003c1701)    /usr/local/go/src/os/file.go:154 +0x6e
io.copyBuffer(0xd6f320, 0xc0002ec018, 0x829545770, 0xc0004a0080, 0x0, 0x0, 0x0, 0x472c61, 0xc000622960, 0xd6fbc0)    /usr/local/go/src/io/io.go:409 +0x357
io.Copy(...)    /usr/local/go/src/io/io.go:382
github.com/zrepl/zrepl/zfs.ZFSRecv.func3(0xc0002ec018, 0xd761f8, 0xc0004a0080, 0xc000153c80)    /home/mdtancsa/clean/zrepl/zfs/zfs.go:1120 +0x8c
created by github.com/zrepl/zrepl/zfs.ZFSRecv    /home/mdtancsa/clean/zrepl/zfs/zfs.go:1119 +0x945

goroutine 306911 [IO wait]:
internal/poll.runtime_pollWait(0x82a141198, 0x72, 0xffffffffffffffff)    /usr/local/go/src/runtime/netpoll.go:222 +0x55
internal/poll.(*pollDesc).wait(0xc0006225b8, 0x72, 0x401, 0x400, 0xffffffffffffffff)    /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)    /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc0006225a0, 0xc0002a8000, 0x400, 0x400, 0x0, 0x0, 0x0)    /usr/local/go/src/internal/poll/fd_unix.go:166 +0x1d5
os.(*File).read(...)    /usr/local/go/src/os/file_posix.go:31
os.(*File).Read(0xc0002ec138, 0xc0002a8000, 0x400, 0x400, 0xaaaa37, 0xc00020c4a0, 0xc0004a0360)    /usr/local/go/src/os/file.go:117 +0x77
bytes.(*Buffer).ReadFrom(0xc00028d470, 0xd6f300, 0xc0002ec138, 0x8294155d8, 0xc00028d470, 0xbffddbb6ec2a3d01)    /usr/local/go/src/bytes/buffer.go:204 +0xbe
io.copyBuffer(0xd6dc40, 0xc00028d470, 0xd6f300, 0xc0002ec138, 0x0, 0x0, 0x0, 0xc0006467a8, 0xc0006467a0, 0x0)    /usr/local/go/src/io/io.go:409 +0x357
io.Copy(...)    /usr/local/go/src/io/io.go:382
os/exec.(*Cmd).writerDescriptor.func1(0x11cda40, 0x0)    /usr/local/go/src/os/exec/exec.go:311 +0x65
os/exec.(*Cmd).Start.func1(0xc0000ca160, 0xc0004a0380)    /usr/local/go/src/os/exec/exec.go:441 +0x27
created by os/exec.(*Cmd).Start    /usr/local/go/src/os/exec/exec.go:440 +0x630

goroutine 298388 [select]:
google.golang.org/grpc/internal/transport.(*controlBuffer).get(0xc000a98540, 0x1, 0x0, 0x0, 0x0, 0x0)    /root/go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/controlbuf.go:317 +0xf4
google.golang.org/grpc/internal/transport.(*loopyWriter).run(0xc000ecf260, 0x0, 0x0)    /root/go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/controlbuf.go:435 +0x1dd
google.golang.org/grpc/internal/transport.newHTTP2Server.func2(0xc000418000)    /root/go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/http2_server.go:276 +0xd7
created by google.golang.org/grpc/internal/transport.newHTTP2Server    /root/go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/http2_server.go:273 +0x105c

goroutine 309085 [IO wait]:
internal/poll.runtime_pollWait(0x82a1410b0, 0x72, 0xffffffffffffffff)    /usr/local/go/src/runtime/netpoll.go:222 +0x55
internal/poll.(*pollDesc).wait(0xc0000e9198, 0x72, 0x2000, 0x2039, 0xffffffffffffffff)    /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)    /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc0000e9180, 0xc0008b1b68, 0x2039, 0x2039, 0x0, 0x0, 0x0)    /usr/local/go/src/internal/poll/fd_unix.go:166 +0x1d5
net.(*netFD).Read(0xc0000e9180, 0xc0008b1b68, 0x2039, 0x2039, 0x5a8, 0x0, 0x0)    /usr/local/go/src/net/fd_posix.go:55 +0x4f
net.(*conn).Read(0xc002a44768, 0xc0008b1b68, 0x2039, 0x2039, 0x0, 0x0, 0x0)    /usr/local/go/src/net/net.go:183 +0x91
crypto/tls.(*atLeastReader).Read(0xc00090b458, 0xc0008b1b68, 0x2039, 0x2039, 0x5a8, 0x0, 0x0)    /usr/local/go/src/crypto/tls/conn.go:776 +0x63
bytes.(*Buffer).ReadFrom(0xc00055ecf8, 0xd6dd80, 0xc00090b458, 0x40d345, 0xba63a0, 0xc58560)    /usr/local/go/src/bytes/buffer.go:204 +0xbe
crypto/tls.(*Conn).readFromUntil(0xc00055ea80, 0x82940aaa8, 0xc002a44768, 0x4016, 0xc002a44768, 0x0)    /usr/local/go/src/crypto/tls/conn.go:798 +0xf3
crypto/tls.(*Conn).readRecordOrCCS(0xc00055ea80, 0x0, 0x0, 0x2a3828a9ea3)    /usr/local/go/src/crypto/tls/conn.go:650 +0x771
crypto/tls.(*Conn).readRecord(...)    /usr/local/go/src/crypto/tls/conn.go:573
crypto/tls.(*Conn).Read(0xc00055ea80, 0xc0014ff51c, 0x5aae4, 0x5aae4, 0x0, 0x0, 0x0)    /usr/local/go/src/crypto/tls/conn.go:1276 +0x165
github.com/zrepl/zrepl/rpc/dataconn/timeoutconn.(*Conn).Read(0xc000133300, 0xc0014ff51c, 0x5aae4, 0x5aae4, 0x4000, 0x0, 0x0)    /home/mdtancsa/clean/zrepl/rpc/dataconn/timeoutconn/timeoutconn.go:118 +0xb1
github.com/zrepl/zrepl/rpc/dataconn/timeoutconn.(*Conn).readvFallback(0xc000133300, 0xc0000cdb80, 0x2, 0x2, 0x0, 0x0, 0x0)    /home/mdtancsa/clean/zrepl/rpc/dataconn/timeoutconn/timeoutconn.go:202 +0xeb
github.com/zrepl/zrepl/rpc/dataconn/timeoutconn.(*Conn).readv(...)    /home/mdtancsa/clean/zrepl/rpc/dataconn/timeoutconn/timeoutconn_readv_unsupported.go:8
github.com/zrepl/zrepl/rpc/dataconn/timeoutconn.(*Conn).ReadvFull(0xc000133300, 0xc0000cdb80, 0x2, 0x2, 0x80000, 0x80000, 0xc0003d24b0)    /home/mdtancsa/clean/zrepl/rpc/dataconn/timeoutconn/timeoutconn.go:188 +0x49
github.com/zrepl/zrepl/rpc/dataconn/frameconn.(*Conn).readFrame(0xc00009b450, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)    /home/mdtancsa/clean/zrepl/rpc/dataconn/frameconn/frameconn.go:172 +0x236
github.com/zrepl/zrepl/rpc/dataconn/frameconn.(*Conn).ReadFrame(0xc00009b450, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)    /home/mdtancsa/clean/zrepl/rpc/dataconn/frameconn/frameconn.go:108 +0xff
github.com/zrepl/zrepl/rpc/dataconn/heartbeatconn.(*Conn).readFrameFiltered(0xc000133340, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)    /home/mdtancsa/clean/zrepl/rpc/dataconn/heartbeatconn/heartbeatconn.go:115 +0x97
github.com/zrepl/zrepl/rpc/dataconn/heartbeatconn.(*Conn).ReadFrame(...)    /home/mdtancsa/clean/zrepl/rpc/dataconn/heartbeatconn/heartbeatconn.go:110
github.com/zrepl/zrepl/rpc/dataconn/stream.readFrames(0xc00092f2c0, 0xc000c3a900, 0xc000133340)    /home/mdtancsa/clean/zrepl/rpc/dataconn/stream/stream.go:227 +0xe5
github.com/zrepl/zrepl/rpc/dataconn/stream.(*Conn).readFrames(0xc000133380)    /home/mdtancsa/clean/zrepl/rpc/dataconn/stream/stream_conn.go:64 +0x45
created by github.com/zrepl/zrepl/rpc/dataconn/stream.Wrap    /home/mdtancsa/clean/zrepl/rpc/dataconn/stream/stream_conn.go:51 +0x10c

goroutine 306913 [select]:
os/exec.(*Cmd).Start.func2(0xc0000ca160)    /usr/local/go/src/os/exec/exec.go:449 +0x91
created by os/exec.(*Cmd).Start    /usr/local/go/src/os/exec/exec.go:448 +0x6cb

goroutine 307835 [chan receive]:
github.com/zrepl/zrepl/zfs.ZFSRecv(0xd7f0b8, 0xc00088a040, 0xc000e0e320, 0x49, 0xc0004439c0, 0xd761f8, 0xc0004a0080, 0x100100, 0x0, 0x0)    /home/mdtancsa/clean/zrepl/zfs/zfs.go:1144 +0x9ee
github.com/zrepl/zrepl/endpoint.(*Receiver).Receive(0xc000333a40, 0xd7f0b8, 0xc000be0f90, 0xc000468280, 0xd761d0, 0xc0001305d0, 0x0, 0x0, 0x0)    /home/mdtancsa/clean/zrepl/endpoint/endpoint.go:773 +0xcf7
github.com/zrepl/zrepl/rpc/dataconn.(*Server).serveConnRequest(0xc0001b9b30, 0xd7f0b8, 0xc000be0db0, 0xc0004f89c8, 0x8, 0xc000682180)    /home/mdtancsa/clean/zrepl/rpc/dataconn/dataconn_server.go:186 +0xc93
github.com/zrepl/zrepl/rpc/dataconn.(*Server).serveConn.func2(0xd7f0b8, 0xc000be0db0)    /home/mdtancsa/clean/zrepl/rpc/dataconn/dataconn_server.go:150 +0x59
github.com/zrepl/zrepl/daemon/job.(*PassiveSide).Run.func1(0xd7f0b8, 0xc000be0a50, 0xd773f0, 0xc000a60500, 0xc000be09c0)    /home/mdtancsa/clean/zrepl/daemon/job/passive.go:178 +0x2c6
github.com/zrepl/zrepl/rpc.NewServer.func3(0xd7f0b8, 0xc000be0930, 0xd77418, 0xc000a604e0, 0xc000be09c0)    /home/mdtancsa/clean/zrepl/rpc/rpc_server.go:82 +0xd5
github.com/zrepl/zrepl/rpc/dataconn.(*Server).serveConn(0xc0001b9b30, 0xc000a60480)    /home/mdtancsa/clean/zrepl/rpc/dataconn/dataconn_server.go:149 +0x3bf
github.com/zrepl/zrepl/rpc/dataconn.(*Server).Serve.func3(0xc00003bc10, 0xc0001b9b30, 0xc000a60480)    /home/mdtancsa/clean/zrepl/rpc/dataconn/dataconn_server.go:108 +0x5d
created by github.com/zrepl/zrepl/rpc/dataconn.(*Server).Serve    /home/mdtancsa/clean/zrepl/rpc/dataconn/dataconn_server.go:106 +0x247


problame commented 3 years ago

@dmgk I'm pretty sure this is not related to zrepl. @mdtancsa is running a build with no unsafe code, and AFAIK we don't have dependencies that use unsafe. The SIGSEGV is most likely a problem in the Go runtime. The crash is the same as the one you posted for 0.3-rc1 on the corresponding zrepl issue; my analysis there holds for this stack trace as well.

@mdtancsa could you avoid pasting these large stack traces verbatim here? I think it hurts readability. Either write the reply through the web interface instead of email so that Markdown works and you can use

<details>
yourstacktrace surrounded by three backticks
</details>

or put them in an attachment / uploaded file (if that works)

mdtancsa commented 3 years ago

Sorry about the formatting; after I sent it, I realized I should have pasted it as an attachment. With respect to the server / compiler / runtime issue, this is FreeBSD RELENG_12, r369150. Nothing unusual about the box. Memory is all ECC, and I don't see any errors registered either:

0{backup4}# ipmitool sel list | tail -2
 4da | 01/28/2021 | 11:48:06 | Watchdog2 #0xca | Timer interrupt () | Asserted
 4db | 01/28/2021 | 11:48:40 | Watchdog2 #0xca | Power cycle () | Asserted
0{backup4}#

The current binary was compiled on this machine from lang/go-devel as updated by @dmgk:

0{backup4}# ldd zrepl-jan30
zrepl-jan30:
        libthr.so.3 => /lib/libthr.so.3 (0x801184000)
        libc.so.7 => /lib/libc.so.7 (0x8011b1000)
0{backup4}# ls -l /lib/libc
libc.so.7         libcam.so.7       libcasper.so.1    libcrypt.so.5     libcrypto.so.111  libctf.so.2       libcxxrt.so.1
0{backup4}# ls -l /lib/libc.so.7
-r--r--r--  1 root  wheel  schg 1955920 Jan 27 10:27 /lib/libc.so.7
0{backup4}# ls -l /lib/libthr.so.3
-r--r--r--  1 root  wheel  schg 131544 Jan 27 10:27 /lib/libthr.so.3
0{backup4}#

    ---Mike


problame commented 3 years ago

So one difference between our two deployments is that I cross-compile from Linux, so it's a fully static binary. @dmgk, how does this affect the Go commit that we are testing?

dmgk commented 3 years ago

@problame My guess is that cross compilation shouldn't affect anything, but I'm not very familiar with Go internals.

mdtancsa commented 3 years ago

Another crash, but still no deadlocks at least. Should I try compiling the server process with the regular release Go version, without the patch, in a day or two?

crash2.txt

mdtancsa commented 3 years ago

@dmgk another crash. Any thoughts? This was built with make GO_EXTRA_BUILDFLAGS=-race. Anything else I can add to the build process to help narrow down the issue?

crash-feb3.txt

problame commented 3 years ago

There was no output from the race detector? (Would have been prior to the crash)

I would like to see what happens if you use a completely static binary. Build with CGO_ENABLED=0 and verify that the binary is static using ldd.

Please post go version, the zrepl commit sha, and maybe the build command + output with every crash; I'm starting to lose track of the configurations we already went through.

mdtancsa commented 3 years ago

No output from the race detector on the crash. I also had a deadlock last night, also with no output from the race detector; the process just hung (procstat of the process attached). Info about the binary is in the attached text file.

I will try running with a static binary.

0{backup4}# gmake GO_EXTRA_BUILDFLAGS=-race CGO_ENABLED=0
GO111MODULE=on go build -mod=readonly -race -ldflags "-X github.com/zrepl/zrepl/version.zreplVersion=v0.3.1-11-g538382a" -o "artifacts/zrepl-freebsd-amd64"
go build: -race requires cgo; enable cgo by setting CGO_ENABLED=1
gmake: *** [Makefile:226: zrepl-bin] Error 2
2{backup4}# gmake CGO_ENABLED=0
GO111MODULE=on go build -mod=readonly -ldflags "-X github.com/zrepl/zrepl/version.zreplVersion=v0.3.1-11-g538382a" -o "artifacts/zrepl-freebsd-amd64"
0{backup4}# cd artifacts/
0{backup4}# ldd zrepl-freebsd-amd64
ldd: zrepl-freebsd-amd64: not a dynamic ELF executable
1{backup4}#

    ---Mike


0{backup4}# ldd zrepl.feb3
zrepl.feb3:
        libthr.so.3 => /lib/libthr.so.3 (0x801658000)
        libc.so.7 => /lib/libc.so.7 (0x801685000)
0{backup4}# ./zrepl.feb3 version
client: zrepl version=v0.3.1-11-g538382a go=go1.16rc1 GOOS=freebsd GOARCH=amd64 Compiler=gc

  PID    TID COMM       TDNAME KSTACK
91938 100978 zrepl.feb3 -      mi_switch+0xd4 sleepq_catch_signals+0x403 sleepq_wait_sig+0xf _sleep+0x1de umtxq_sleep+0x132 do_wait+0x42f umtx_op_wait_uint_private+0x54 sysumtx_op+0x7a amd64_syscall+0x387 fast_syscall_common+0xf8
91938 101057 zrepl.feb3 -      mi_switch+0xd4 sleepq_catch_signals+0x403 sleepq_timedwait_sig+0x14 _sleep+0x1b3 kern_clock_nanosleep+0x1d2 sys_nanosleep+0x3b amd64_syscall+0x387 fast_syscall_common+0xf8
91938 102261 zrepl.feb3 -      mi_switch+0xd4 sleepq_catch_signals+0x403 sleepq_wait_sig+0xf _sleep+0x1de umtxq_sleep+0x132 do_wait+0x42f umtx_op_wait_uint_private+0x54 sysumtx_op+0x7a amd64_syscall+0x387 fast_syscall_common+0xf8
91938 102264 zrepl.feb3 -      mi_switch+0xd4 sleepq_catch_signals+0x403 sleepq_wait_sig+0xf _sleep+0x1de umtxq_sleep+0x132 do_wait+0x42f umtx_op_wait_uint_private+0x54 sysumtx_op+0x7a amd64_syscall+0x387 fast_syscall_common+0xf8
91938 102339 zrepl.feb3 -      mi_switch+0xd4 sleepq_catch_signals+0x403 sleepq_wait_sig+0xf _sleep+0x1de umtxq_sleep+0x132 do_wait+0x42f umtx_op_wait_uint_private+0x54 sysumtx_op+0x7a amd64_syscall+0x387 fast_syscall_common+0xf8
91938 102375 zrepl.feb3 -      mi_switch+0xd4 sleepq_catch_signals+0x403 sleepq_wait_sig+0xf _sleep+0x1de umtxq_sleep+0x132 do_wait+0x42f umtx_op_wait_uint_private+0x54 sysumtx_op+0x7a amd64_syscall+0x387 fast_syscall_common+0xf8
91938 102760 zrepl.feb3 -      mi_switch+0xd4 sleepq_catch_signals+0x403 sleepq_wait_sig+0xf _sleep+0x1de kern_wait6+0x58b sys_wait6+0x9f amd64_syscall+0x387 fast_syscall_common+0xf8
91938 102807 zrepl.feb3 -      mi_switch+0xd4 sleepq_catch_signals+0x403 sleepq_wait_sig+0xf _sleep+0x1de umtxq_sleep+0x132 do_wait+0x42f umtx_op_wait_uint_private+0x54 sysumtx_op+0x7a amd64_syscall+0x387 fast_syscall_common+0xf8
91938 102823 zrepl.feb3 -      mi_switch+0xd4 sleepq_catch_signals+0x403 sleepq_wait_sig+0xf _sleep+0x1de umtxq_sleep+0x132 do_wait+0x42f umtx_op_wait_uint_private+0x54 sysumtx_op+0x7a amd64_syscall+0x387 fast_syscall_common+0xf8
91938 102910 zrepl.feb3 -      mi_switch+0xd4 sleepq_catch_signals+0x403 sleepq_wait_sig+0xf _sleep+0x1de umtxq_sleep+0x132 do_wait+0x42f umtx_op_wait_uint_private+0x54 sysumtx_op+0x7a amd64_syscall+0x387 fast_syscall_common+0xf8
91938 102954 zrepl.feb3 -      mi_switch+0xd4 sleepq_catch_signals+0x403 sleepq_wait_sig+0xf _sleep+0x1de umtxq_sleep+0x132 do_wait+0x42f umtx_op_wait_uint_private+0x54 sysumtx_op+0x7a amd64_syscall+0x387 fast_syscall_common+0xf8
91938 102971 zrepl.feb3 -      mi_switch+0xd4 sleepq_catch_signals+0x403 sleepq_wait_sig+0xf _sleep+0x1de umtxq_sleep+0x132 do_wait+0x42f umtx_op_wait_uint_private+0x54 sysumtx_op+0x7a amd64_syscall+0x387 fast_syscall_common+0xf8
91938 102973 zrepl.feb3 -      mi_switch+0xd4 sleepq_catch_signals+0x403 sleepq_timedwait_sig+0x14 _sleep+0x1b3 umtxq_sleep+0x132 do_wait+0x42f umtx_op_wait_uint_private+0x54 sysumtx_op+0x7a amd64_syscall+0x387 fast_syscall_common+0xf8
91938 102979 zrepl.feb3 -      mi_switch+0xd4 sleepq_catch_signals+0x403 sleepq_wait_sig+0xf _sleep+0x1de umtxq_sleep+0x132 do_wait+0x42f umtx_op_wait_uint_private+0x54 sysumtx_op+0x7a amd64_syscall+0x387 fast_syscall_common+0xf8
91938 102984 zrepl.feb3 -      mi_switch+0xd4 sleepq_catch_signals+0x403 sleepq_wait_sig+0xf _sleep+0x1de umtxq_sleep+0x132 do_wait+0x42f umtx_op_wait_uint_private+0x54 sysumtx_op+0x7a amd64_syscall+0x387 fast_syscall_common+0xf8
91938 102991 zrepl.feb3 -      mi_switch+0xd4 sleepq_catch_signals+0x403 sleepq_wait_sig+0xf _sleep+0x1de umtxq_sleep+0x132 do_wait+0x42f umtx_op_wait_uint_private+0x54 sysumtx_op+0x7a amd64_syscall+0x387 fast_syscall_common+0xf8
91938 102994 zrepl.feb3 -      mi_switch+0xd4 sleepq_catch_signals+0x403 sleepq_wait_sig+0xf _sleep+0x1de umtxq_sleep+0x132 do_wait+0x42f umtx_op_wait_uint_private+0x54 sysumtx_op+0x7a amd64_syscall+0x387 fast_syscall_common+0xf8
91938 103001 zrepl.feb3 -      mi_switch+0xd4 sleepq_catch_signals+0x403 sleepq_wait_sig+0xf _sleep+0x1de umtxq_sleep+0x132 do_wait+0x42f umtx_op_wait_uint_private+0x54 sysumtx_op+0x7a amd64_syscall+0x387 fast_syscall_common+0xf8
91938 103004 zrepl.feb3 -      mi_switch+0xd4 sleepq_catch_signals+0x403 sleepq_wait_sig+0xf _sleep+0x1de umtxq_sleep+0x132 do_wait+0x42f umtx_op_wait_uint_private+0x54 sysumtx_op+0x7a amd64_syscall+0x387 fast_syscall_common+0xf8
91938 103022 zrepl.feb3 -      mi_switch+0xd4 sleepq_catch_signals+0x403 sleepq_wait_sig+0xf _sleep+0x1de umtxq_sleep+0x132 do_wait+0x42f umtx_op_wait_uint_private+0x54 sysumtx_op+0x7a amd64_syscall+0x387 fast_syscall_common+0xf8
91938 103281 zrepl.feb3 -      mi_switch+0xd4 sleepq_catch_signals+0x403 sleepq_wait_sig+0xf _sleep+0x1de umtxq_sleep+0x132 do_wait+0x42f umtx_op_wait_uint_private+0x54 sysumtx_op+0x7a amd64_syscall+0x387 fast_syscall_common+0xf8

problame commented 3 years ago

So I don't follow: which binary is dynamically linked and which statically linked? The ldd outputs you posted are contradictory.

mdtancsa commented 3 years ago

So I don't follow: which binary is dynamically linked and which statically linked? The ldd outputs you posted are contradictory.

The procstat output was from yesterday's binary that was dynamically linked with the race detector code. I recompiled this morning and am running with a statically linked binary now. However, I cannot have both a statically linked binary and the race detector code compiled in at the same time. I started the statically linked binary just a few hours ago.

problame commented 3 years ago

All right, thanks for clearing this up. I'm "looking forward" to the results ;)

problame commented 3 years ago

I just got another lock-up and found some interesting state in an instance of io.PipeWriter

(Go 1.16.1, zrepl 0.3.1)

> runtime.gopark() /home/cs/development/external/golang/go/src/runtime/proc.go:337 (PC: 0x43a805)
Warning: debugging optimized function
Frame 5: /home/cs/zrepl/zrepl/rpc/dataconn/stream/stream_conn.go:109 (PC: a28d76)
(dlv) p w
*io.PipeWriter {
        p: *io.pipe {
                wrMu: (*sync.Mutex)(0xc00027ad80),
                wrCh: (unreadable unreadable length: bad address),
                rdCh: chan int {
                        qcount: 7163367111982736752,
                        dataqsiz: 824262406179,
                        buf: *[824262406179]int [(unreadable bad address),(unreadable bad address),(unreadable bad address),(unreadable bad address),...+824262406175 more],
                        elemsize: 29539,
                        closed: 115,
                        elemtype: *runtime._type nil,
                        sendx: 4587366580439587226,
                        recvx: 5715683054218867322,
                        recvq: waitq<int> {
                                first: *(*sudog<int>)(0x544f),
                                last: *sudog<int> nil,},
                        sendq: waitq<int> {
                                first: *(*sudog<int>)(0x3fb999999999999a),
                                last: *(*sudog<int>)(0x73752f746f6f727a),},
                        lock: runtime.mutex {
                                lockRankStruct: runtime.lockRankStruct {},
                                key: 111520694939506,},},
                once: (*sync.Once)(0xc00027ad98),
                done: chan struct {} {},
                rerr: (*io.onceError)(0xc00027adb0),
                werr: (*io.onceError)(0xc00027adc8),},}

The (unreadable unreadable length: bad address) is really suspicious. @Go developers what does that mean?

problame commented 3 years ago

Also: the pipe's `rdCh.closed` count is 115. AFAICT it should be 0 or 1. All of this smells like a GC bug / use-after-free.
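For context, a minimal sketch of the kind of wiring this state belongs to. This is my own illustration under the assumption of the usual os/exec + io.Pipe pattern (the command and data are placeholders, not zrepl's actual code): one goroutine feeds a stream into the write end of an io.Pipe while os/exec copies the read end into the child's stdin. If the pipe's internal channel state really were corrupted as shown above, both sides would block forever, which matches the goroutines stuck in io.(*pipe).Read and os.(*File).ReadFrom earlier in this thread.

```go
// Minimal sketch, not zrepl's actual code: the usual os/exec + io.Pipe wiring.
// One goroutine writes the incoming stream into pw; os/exec internally copies
// pr into the child's stdin.
package main

import (
	"io"
	"log"
	"os/exec"
	"strings"
)

func main() {
	pr, pw := io.Pipe()

	cmd := exec.Command("cat") // placeholder for the real "zfs recv ..." child
	cmd.Stdin = pr             // exec spawns a goroutine that copies pr into the child's stdin

	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}

	go func() {
		// placeholder for the goroutine that feeds the replication stream
		_, err := io.Copy(pw, strings.NewReader("replication stream\n"))
		pw.CloseWithError(err) // err == nil closes the pipe with EOF
	}()

	if err := cmd.Wait(); err != nil {
		log.Fatal(err)
	}
}
```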

mdtancsa commented 3 years ago

It's too early to tell, but the binary I compiled statically is lasting longer without a crash or deadlock (24 hrs so far). Normally I would have had one or two by now.

version=v0.3.1-11-g538382a go=go1.16rc1 GOOS=freebsd GOARCH=amd64 Compiler=gc

Not sure if this is salient or not on FreeBSD, but the article referenced here (https://news.ycombinator.com/item?id=25997506) talks about Go on OpenBSD historically bypassing the libc APIs. Does Go on FreeBSD do this too? By compiling the app statically rather than dynamically, is it now bypassing one layer of API? I hesitate to bring it up in case it's way off base.

The other detail, however, that radically improves the stability of the app is to pin it to one or two CPUs via cpuset. It will last many weeks without a crash / deadlock if I do that.

    ---Mike


problame commented 3 years ago

Does golang on FreeBSD do this too ? If by compiling the app statically vs dynamically, is the app now bypassing one layer of API ? I hesitate to bring it up in case its way off base.

AFAIK the static binaries do the syscall directly, not sure though.

The other detail however that radically improves stability of the app is to pin it to one or two CPUs via cpuset. It will last many weeks without a crash / deadlock if I do that.

Yes. With a single CPU, a race condition in the runtime could only manifest through preemption (either through Go's scheduler or through the OS scheduler). Please don't do the cpuset limit if you can afford the annoyance of the frequent crashes; we shouldn't be content with this as a solution or even a workaround.
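(For completeness, a rough in-process analogue of that cpuset experiment, purely a suggestion and not something we have run, would be to cap the scheduler at a single P via GOMAXPROCS; a sketch:)

```go
// Hypothetical experiment: cap the Go scheduler at a single P, which is a
// rough in-process analogue of pinning the daemon to one CPU with cpuset.
// Runtime races could then only show up via preemption.
package main

import (
	"fmt"
	"runtime"
)

func main() {
	prev := runtime.GOMAXPROCS(1) // could equally be set via the GOMAXPROCS env var
	fmt.Printf("GOMAXPROCS lowered from %d to 1\n", prev)

	// ... the daemon's normal startup would follow here ...
}
```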

mdtancsa commented 3 years ago

Correct. I am not interfering at all with its CPU selection / limitation. The process and threads can use all cores, real and HT. The CPU is a Xeon(R) E-2236 @ 3.40GHz; FreeBSD/SMP: 1 package(s) x 6 core(s) x 2 hardware threads.

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
49168 root         24    0   707M    73M uwait    6   1:39   2.24% zrepl.feb4{zrepl.feb4}
49168 root         20    0   707M    73M uwait    5  10:32   1.89% zrepl.feb4{zrepl.feb4}
49168 root         21    0   707M    73M uwait   10   3:29   1.28% zrepl.feb4{zrepl.feb4}
49168 root         20    0   707M    73M uwait    1   3:28   0.79% zrepl.feb4{zrepl.feb4}
49168 root         20    0   707M    73M uwait    1   4:07   0.76% zrepl.feb4{zrepl.feb4}
49168 root         20    0   707M    73M uwait    8   4:00   0.72% zrepl.feb4{zrepl.feb4}
49168 root         21    0   707M    73M uwait    0   3:16   0.71% zrepl.feb4{zrepl.feb4}
49168 root         20    0   707M    73M kqread   5   3:52   0.71% zrepl.feb4{zrepl.feb4}
49168 root         20    0   707M    73M uwait    2   3:48   0.67% zrepl.feb4{zrepl.feb4}
49168 root         20    0   707M    73M uwait    3   3:42   0.61% zrepl.feb4{zrepl.feb4}
49168 root         20    0   707M    73M uwait    1   3:55   0.30% zrepl.feb4{zrepl.feb4}
49168 root         20    0   707M    73M uwait   11   3:45   0.23% zrepl.feb4{zrepl.feb4}
49168 root         52    0   707M    73M uwait    9   0:22   0.16% zrepl.feb4{zrepl.feb4}
49168 root         20    0   707M    73M uwait    9   3:33   0.12% zrepl.feb4{zrepl.feb4}
49168 root         20    0   707M    73M uwait    9   3:37   0.11% zrepl.feb4{zrepl.feb4}
49168 root         20    0   707M    73M uwait    4   3:55   0.08% zrepl.feb4{zrepl.feb4}
49168 root         21    0   707M    73M uwait   10   3:54   0.00% zrepl.feb4{zrepl.feb4}
49168 root         20    0   707M    73M uwait    9   3:41   0.00% zrepl.feb4{zrepl.feb4}
49168 root         21    0   707M    73M uwait    4   2:32   0.00% zrepl.feb4{zrepl.feb4}
49168 root         20    0   707M    73M uwait   11   0:00   0.00% zrepl.feb4{zrepl.feb4}
49168 root         21    0   707M    73M uwait    8   0:00   0.00% zrepl.feb4{zrepl.feb4}

mdtancsa commented 3 years ago

Another crash on the server process.

./zrepl.feb4 version

client: zrepl version=v0.3.1-11-g538382a go=go1.16rc1 GOOS=freebsd GOARCH=amd64 Compiler=gc

statically built

ldd zrepl.feb4

ldd: zrepl.feb4: not a dynamic ELF executable

crash-feb5.txt

mdtancsa commented 3 years ago

Another crash. Same binary as before.

crashfeb6.txt

problame commented 3 years ago

I have an idea:

  • #43920

  • #43941

  • #43942

=> Let's see if it's related to some compiler optimization.

make GO_EXTRA_BUILDFLAGS=-gcflags=-N

mdtancsa commented 3 years ago

I have an idea:

  • 43920

  • 43941

  • 43942

=> Let's see if it's related to some compiler optimization.

make GO_EXTRA_BUILDFLAGS=-gcflags=-N

Running with this now

gmake CGO_ENABLED=0 GO_EXTRA_BUILDFLAGS=-gcflags=-N

GO111MODULE=on go build -mod=readonly -gcflags=-N -ldflags "-X github.com/zrepl/zrepl/version.zreplVersion=v0.3.1-11-g538382a" -o "artifacts/zrepl-freebsd-amd64"

cd artifacts/

ldd zrepl-freebsd-amd64

ldd: zrepl-freebsd-amd64: not a dynamic ELF executable

mdtancsa commented 3 years ago

I have an idea:

  • 43920

  • 43941

  • 43942

=> Let's see if it's related to some compiler optimization.

make GO_EXTRA_BUILDFLAGS=-gcflags=-N

Well, I don't know if it fixes the problem, but it certainly has let the process run a lot longer than it has to date!

ps -auxwwp 32356 -o lstart=

USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
root 32356 3.0 0.1 724264 81244 2 S+ Sat11 127:23.30 /usr/local/bin/z Sat Feb 6 11:37:11 2021

Normally I would have seen a crash or deadlock by now. The process is running naturally across all cores, real and HT, without issue for almost 2 days:

PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
32356 root 20 0 707M 79M uwait 1 18:54 0.78% zrepl.feb6{zrepl.feb6}
32356 root 21 0 707M 79M uwait 10 6:50 0.20% zrepl.feb6{zrepl.feb6}
32356 root 21 0 707M 79M uwait 8 6:44 0.20% zrepl.feb6{zrepl.feb6}
32356 root 21 0 707M 79M uwait 7 6:39 0.20% zrepl.feb6{zrepl.feb6}
32356 root 21 0 707M 79M uwait 11 1:02 0.20% zrepl.feb6{zrepl.feb6}
32356 root 20 0 707M 79M uwait 6 7:11 0.10% zrepl.feb6{zrepl.feb6}
32356 root 20 0 707M 79M uwait 7 6:44 0.10% zrepl.feb6{zrepl.feb6}
32356 root 21 0 707M 79M uwait 5 6:41 0.10% zrepl.feb6{zrepl.feb6}
32356 root 20 0 707M 79M uwait 3 7:02 0.00% zrepl.feb6{zrepl.feb6}
32356 root 20 0 707M 79M kqread 5 7:01 0.00% zrepl.feb6{zrepl.feb6}
32356 root 21 0 707M 79M uwait 3 7:00 0.00% zrepl.feb6{zrepl.feb6}
32356 root 21 0 707M 79M uwait 0 6:47 0.00% zrepl.feb6{zrepl.feb6}
32356 root 20 0 707M 79M uwait 9 6:44 0.00% zrepl.feb6{zrepl.feb6}
32356 root 20 0 707M 79M uwait 9 6:39 0.00% zrepl.feb6{zrepl.feb6}
32356 root 20 0 707M 79M uwait 1 6:29 0.00% zrepl.feb6{zrepl.feb6}
32356 root 20 0 707M 79M uwait 4 6:25 0.00% zrepl.feb6{zrepl.feb6}
32356 root 21 0 707M 79M uwait 8 6:25 0.00% zrepl.feb6{zrepl.feb6}
32356 root 22 0 707M 79M uwait 7 5:26 0.00% zrepl.feb6{zrepl.feb6}
32356 root 43 0 707M 79M uwait 2 0:44 0.00% zrepl.feb6{zrepl.feb6}
32356 root 20 0 707M 79M uwait 6 0:00 0.00% zrepl.feb6{zrepl.feb6}
32356 root 21 0 707M 79M uwait 10 0:00 0.00% zrepl.feb6{zrepl.feb6}

mdtancsa commented 3 years ago

Well, a hang and a crash. Tuesday is a very busy day, CPU- and disk-wise, on this server, as I offload a copy to disks that go physically offsite. Still, it seemed to last a LOT longer than it normally would. The deadlock happened first, after several days, and then the crash a few hours later, after the process was restarted.

crash-feb9.txt

deadlock-feb9.txt

problame commented 3 years ago

:/

The procstat kstack output is not very helpful. It would be very helpful if you could attach using dlv and get a dump of all the goroutines (grs -t). Instructions on how to obtain dlv: https://github.com/go-delve/delve/blob/master/Documentation/installation/freebsd/install.md

problame commented 3 years ago

Also: If I understand correctly these were two separate occasions. Did they happen the same day? Same machine? Different machine?

problame commented 3 years ago

crash-feb9.txt

That's a runtime panic:

gmake CGO_ENABLED=0 GO_EXTRA_BUILDFLAGS=-gcflags=-N

wirep: p->m=824638963712(10) p->status=1
fatal error: wirep: invalid p state
wirep: p->m=0(0) p->status=3
fatal error: wirep: invalid p state

mdtancsa commented 3 years ago

Also: If I understand correctly these were two separate occasions. Did they happen the same day? Same machine? Different machine?

Same machine, same binary, but two separate occurrences. The hang happened early this AM, my script killed off the process and restarted it, and then a few hours later the runtime crash.
Is there a way to automate the dlv attach if there is a hang? I am guessing not. OK, I will re-work it so the next time I get a hang, I will send myself an alert and attach to the process. I check to see if it's working via its stats interface. Shell script attached: check-script.txt
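For illustration, something like the following is the shape of watcher I have in mind (a sketch only; the URL and PID are placeholders, not the real check-script, and a SIGQUIT dump is not a substitute for attaching dlv, since a wedged runtime may not manage to print anything): poll the stats endpoint and, on timeout, send SIGQUIT so the Go runtime prints every goroutine stack before exiting.

```go
// Hypothetical watcher sketch: poll a stats endpoint and, if it stops
// answering, send SIGQUIT to the daemon. An unhandled SIGQUIT makes the Go
// runtime dump all goroutine stacks and exit.
package main

import (
	"log"
	"net/http"
	"syscall"
	"time"
)

func main() {
	const statusURL = "http://127.0.0.1:9091/metrics" // placeholder stats endpoint
	const daemonPID = 12345                           // placeholder PID of the zrepl daemon

	client := &http.Client{Timeout: 10 * time.Second}
	for {
		resp, err := client.Get(statusURL)
		if err == nil {
			resp.Body.Close()
			time.Sleep(time.Minute)
			continue
		}
		log.Printf("status check failed (%v); requesting goroutine dump via SIGQUIT", err)
		if err := syscall.Kill(daemonPID, syscall.SIGQUIT); err != nil {
			log.Fatal(err)
		}
		return
	}
}
```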

problame commented 3 years ago

Is there a way to automate the dlv attach if there is a hang ?

mdtancsa commented 3 years ago

:/

The procstat kstack output is not very helpful. It would be very helfpul if you could attach using dlv and get a dump of all the goroutines (grs -t). Instructions on how to obtain dlv: https://github.com/go-delve/delve/blob/master/Documentation/installation/freebsd/install.md

OK, some progress (I think). My watcher script that restarts the daemon after a crash had a bug I introduced, so it didn't restart after yet another runtime crash. The good news is that since building the binary with gmake CGO_ENABLED=0 GO_EXTRA_BUILDFLAGS=-gcflags=-N, the crashes and hangs are fewer. (feb13-crash.txt)

As a number of clients needed to catch up, I was watching the load on all the threads. There really is not much CPU contention going on; it seems to be all contention for disk, and many of the threads are at zero CPU. But the more clients I woke up, the busier the threads got. As I manually logged in to my servers and did a zrepl signal wakeup to make more and more of the server threads busy, I finally got to the point where the server process deadlocked. Not sure if it was a fluke or not; I will try to reproduce it again later after all the clients catch up. The capture dlv-feb13.txt has that grs -t output. It sort of makes sense in that if the server is under extra load, like it is on Tuesdays, a crash or hang seems more likely.

dlv-feb13.txt

feb13-crash.txt

Here is the process right now. I watch it via top -p -H:

PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
81446 root 23 0 771M 100M uwait 5 0:12 3.46% zrepl.feb6{zrepl.feb6}
81446 root 21 0 771M 100M nanslp 11 0:25 2.63% zrepl.feb6{zrepl.feb6}
81446 root 21 0 771M 100M uwait 10 0:11 2.60% zrepl.feb6{zrepl.feb6}
81446 root 22 0 771M 100M wait 2 0:11 1.96% zrepl.feb6{zrepl.feb6}
81446 root 20 0 771M 100M kqread 9 0:04 1.75% zrepl.feb6{zrepl.feb6}
81446 root 22 0 771M 100M wait 5 0:13 1.51% zrepl.feb6{zrepl.feb6}
81446 root 22 0 771M 100M uwait 2 0:07 0.67% zrepl.feb6{zrepl.feb6}
81446 root 20 0 771M 100M uwait 5 0:10 0.61% zrepl.feb6{zrepl.feb6}
81446 root 52 0 771M 100M uwait 4 0:03 0.04% zrepl.feb6{zrepl.feb6}
81446 root 23 0 771M 100M wait 9 0:12 0.02% zrepl.feb6{zrepl.feb6}
81446 root 22 0 771M 100M wait 10 0:00 0.02% zrepl.feb6{zrepl.feb6}
81446 root 20 0 771M 100M wait 1 0:12 0.02% zrepl.feb6{zrepl.feb6}
81446 root 20 0 771M 100M wait 8 0:06 0.01% zrepl.feb6{zrepl.feb6}
81446 root 22 0 771M 100M uwait 3 0:14 0.00% zrepl.feb6{zrepl.feb6}
81446 root 20 0 771M 100M uwait 0 0:13 0.00% zrepl.feb6{zrepl.feb6}
81446 root 20 0 771M 100M uwait 4 0:13 0.00% zrepl.feb6{zrepl.feb6}
81446 root 21 0 771M 100M uwait 4 0:12 0.00% zrepl.feb6{zrepl.feb6}
81446 root 23 0 771M 100M uwait 1 0:12 0.00% zrepl.feb6{zrepl.feb6}
81446 root 23 0 771M 100M uwait 5 0:12 0.00% zrepl.feb6{zrepl.feb6}
81446 root 21 0 771M 100M uwait 1 0:11 0.00% zrepl.feb6{zrepl.feb6}
81446 root 22 0 771M 100M uwait 2 0:11 0.00% zrepl.feb6{zrepl.feb6}
81446 root 20 0 771M 100M uwait 5 0:10 0.00% zrepl.feb6{zrepl.feb6}
81446 root 22 0 771M 100M uwait 4 0:05 0.00% zrepl.feb6{zrepl.feb6}
81446 root 20 0 771M 100M uwait 4 0:00 0.00% zrepl.feb6{zrepl.feb6}
81446 root 20 0 771M 100M uwait 9 0:00 0.00% zrepl.feb6{zrepl.feb6}

Normally it looks like the above, with a whole bunch of threads idle. But as I generated more load by manually waking up the clients, none were idle and I got the deadlock.

mdtancsa commented 3 years ago

Deadlock again this AM. grs -t output attached.

dlv-hang-feb14.txt

egonelbre commented 3 years ago

I noticed that the only FreeBSD-specific thing in the stacks was sys_umtx_op, and there was a fix to it in https://go-review.googlesource.com/c/go/+/276892 / https://github.com/golang/go/issues/43106. Maybe there are multiple causes for the failures?

Were these failures with go1.16rc1? Maybe Go 1.16rc1 without optimizations is a good thing to try?

problame commented 3 years ago

@egonelbre yes, we use 1.16rc1, after ruling out that it's a regression in Go.

(See https://github.com/golang/go/issues/43873 on the Go issue tracker as well.)

mdtancsa commented 3 years ago

I noticed that the only freebsd specific thing in the stacks was sys_umtx_op and there was a fix to it in https://go-review.googlesource.com/c/go/+/276892 / #43106. Maybe there are multiple causes for the failures?

Were these failures with go1.16rc1? Maybe Go 1.16rc1 without optimizations is a good thing to try?

@egonelbre this was with Go from the ports tree, lang/go-devel, which is 1.16rc1 as updated by @dmgk. I built the binary with gmake CGO_ENABLED=0 GO_EXTRA_BUILDFLAGS=-gcflags=-N. Are there any other flags I should try building with?

mdtancsa commented 3 years ago

@problame, another deadlock (feb16.log). Unless you need more examples, I am going to pin the process to two CPUs, which makes it more stable and longer-lasting.

problame commented 3 years ago

All right. I'm pretty pessimistic about this getting fixed any time soon, since FreeBSD is not a tier 1 platform for Go. I referenced the issue from a bunch of other suspicious FreeBSD-related issues. I encounter the issue on my FreeBSD 12.2 system about every other day now, so there's at least a relatively timely feedback loop should anyone have a proposal on how to make progress here :/

egonelbre commented 3 years ago

I think the major hurdle in making the problem actionable is making it easier to reproduce, i.e. trying to create something that fails within a minute of running the program.
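A sketch of what such a reproducer might look like, assuming the trigger really is the pipe- and subprocess-heavy workload (the command, goroutine count, and iteration count are arbitrary guesses, not a known-good reproducer):

```go
// Hypothetical stress reproducer: many goroutines repeatedly spawn short-lived
// children with piped stdout, mimicking a pipe/subprocess-heavy workload. On an
// affected FreeBSD box the hope is that this panics or deadlocks far sooner
// than the real daemon does.
package main

import (
	"io"
	"log"
	"os/exec"
	"sync"
)

func main() {
	const workers = 16      // arbitrary
	const iterations = 5000 // arbitrary

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < iterations; j++ {
				cmd := exec.Command("/bin/echo", "hello") // arbitrary cheap command
				out, err := cmd.StdoutPipe()
				if err != nil {
					log.Fatal(err)
				}
				if err := cmd.Start(); err != nil {
					log.Fatal(err)
				}
				if _, err := io.Copy(io.Discard, out); err != nil {
					log.Fatal(err)
				}
				if err := cmd.Wait(); err != nil {
					log.Fatal(err)
				}
			}
		}()
	}
	wg.Wait()
	log.Println("completed without crashing")
}
```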