thaJeztah closed this 3 years ago
@cpuguy83 @AkihiroSuda ptal
OK, looks like this is failing;
=== RUN TestRawReadWrite
raw_test.go:54:
Error Trace: raw_test.go:54
Error: An error is expected but got nil.
Test: TestRawReadWrite
Hmm... looks like there's some flakiness; re-pushed, and now it passes; could it be related to Go 1.14 / Go 1.15?
Ran Tibor's script from https://gist.github.com/tiborvass/eb0a4054679a43aaca22690a7c4452ed on this repository, in case it's useful;
@cpuguy83 @AkihiroSuda ptal
@AkihiroSuda any thoughts on the discussion above? https://github.com/containerd/fifo/pull/32#discussion_r578754229 (i.e., would fixes be needed elsewhere as well?)
@thaJeztah Did this make it into a release? I'm still seeing these issues occasionally on v20.10.12, with containerd 1.4.12 7b11cfaabd73bb80907dd23182b9347b4245eb5d.
I can't seem to find a way to resolve the issue without restarting the entire server, which is very costly.
Jan 27 11:05:55 cli[2633740]: time="2022-01-27T11:05:55.879409823+01:00" level=info msg="Starting up"
Jan 27 11:05:55 cli[2633740]: time="2022-01-27T11:05:55.879804137+01:00" level=info msg="User namespaces: ID ranges will be mapped to subuid/subgid ranges of: 11513"
Jan 27 11:05:56 cli[2633740]: time="2022-01-27T11:05:56.373259438+01:00" level=info msg="User namespaces: ID ranges will be mapped to subuid/subgid ranges of: 11513"
Jan 27 11:05:56 cli[2633740]: time="2022-01-27T11:05:56.375457580+01:00" level=info msg="parsed scheme: \"unix\"" module=grpc
Jan 27 11:05:56 cli[2633740]: time="2022-01-27T11:05:56.375505639+01:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Jan 27 11:05:56 cli[2633740]: time="2022-01-27T11:05:56.375552415+01:00" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock <nil> 0 <nil>}] <nil> <nil>}" module=grpc
Jan 27 11:05:56 cli[2633740]: time="2022-01-27T11:05:56.375578423+01:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Jan 27 11:05:56 cli[2633740]: time="2022-01-27T11:05:56.379642047+01:00" level=info msg="parsed scheme: \"unix\"" module=grpc
Jan 27 11:05:56 cli[2633740]: time="2022-01-27T11:05:56.379733149+01:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Jan 27 11:05:56 cli[2633740]: time="2022-01-27T11:05:56.379806034+01:00" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock <nil> 0 <nil>}] <nil> <nil>}" module=grpc
Jan 27 11:05:56 cli[2633740]: time="2022-01-27T11:05:56.379840347+01:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Jan 27 11:05:56 cli[2633740]: time="2022-01-27T11:05:56.760278807+01:00" level=warning msg="Your kernel does not support CPU realtime scheduler"
Jan 27 11:05:56 cli[2633740]: time="2022-01-27T11:05:56.760346178+01:00" level=warning msg="Your kernel does not support cgroup blkio weight"
Jan 27 11:05:56 cli[2633740]: time="2022-01-27T11:05:56.760361162+01:00" level=warning msg="Your kernel does not support cgroup blkio weight_device"
Jan 27 11:05:56 cli[2633740]: time="2022-01-27T11:05:56.760735829+01:00" level=info msg="Loading containers: start."
Jan 27 11:05:56 cli[2633740]: panic: runtime error: invalid memory address or nil pointer dereference
Jan 27 11:05:56 cli[2633740]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x563e12aa9824]
Jan 27 11:05:56 cli[2633740]: goroutine 158 [running]:
Jan 27 11:05:56 cli[2633740]: github.com/docker/docker/vendor/github.com/containerd/fifo.(*fifo).Close(0x0, 0x0, 0x0)
Jan 27 11:05:56 cli[2633740]: /go/src/github.com/docker/docker/vendor/github.com/containerd/fifo/fifo.go:208 +0x44
Jan 27 11:05:56 cli[2633740]: github.com/docker/docker/vendor/github.com/containerd/containerd/cio.(*cio).Close(0xc000e71230, 0xc000601528, 0xc0012bca58)
Jan 27 11:05:56 cli[2633740]: /go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/cio/io.go:203 +0x90
Jan 27 11:05:56 cli[2633740]: github.com/docker/docker/libcontainerd/remote.(*client).Restore.func1(0xc00029d230, 0xc00072e148)
Jan 27 11:05:56 cli[2633740]: /go/src/github.com/docker/docker/libcontainerd/remote/client.go:86 +0x5a
Jan 27 11:05:56 cli[2633740]: github.com/docker/docker/libcontainerd/remote.(*client).Restore(0xc000dc8380, 0x563e14df4128, 0xc000052028, 0xc0002e8040, 0x40, 0xc00029d220, 0xc001302d00, 0xffffffffffffffff, 0x0, 0x0, ...)
Jan 27 11:05:56 cli[2633740]: /go/src/github.com/docker/docker/libcontainerd/remote/client.go:107 +0xa07
Jan 27 11:05:56 cli[2633740]: github.com/docker/docker/daemon.(*Daemon).restore.func3(0xc000164300, 0xc0007281e0, 0xc00000c1e0, 0xc0001642b8, 0xc000b71080, 0xc000b71050, 0xc000b71020, 0xc00019a280)
Jan 27 11:05:56 cli[2633740]: /go/src/github.com/docker/docker/daemon/daemon.go:351 +0x46d
Jan 27 11:05:56 cli[2633740]: created by github.com/docker/docker/daemon.(*Daemon).restore
Jan 27 11:05:56 cli[2633740]: /go/src/github.com/docker/docker/daemon/daemon.go:319 +0x4cf
No, this was not in a release yet; https://github.com/containerd/fifo/compare/v1.0.0...39bc37d3b045af5a25ca2c62586d40b2cd1eb96c
relates to https://github.com/docker/for-linux/issues/1186
I'm not sure if this is the right approach, and synchronisation should probably be added elsewhere to fix the underlying issue.
Trying to prevent a panic that was seen on container restore in the Docker daemon:
If the fifo is nil, there's nothing to be done in Close(), so returning early in that situation.
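For illustration only, the early return described above could look roughly like the sketch below. It is not the actual containerd/fifo code: the `pipe` type, its `closed` field, and `errAlreadyClosed` are hypothetical names made up for this example. It relies only on the fact that Go allows a method with a pointer receiver to be called on a nil pointer, so the method can check for nil before touching any fields.

```go
package main

import (
	"errors"
	"fmt"
)

// pipe is a hypothetical stand-in for a fifo-like handle; it is NOT the
// containerd/fifo type.
type pipe struct {
	closed bool
}

// errAlreadyClosed is a hypothetical error used only in this sketch.
var errAlreadyClosed = errors.New("pipe: already closed")

// Close is safe to call on a nil *pipe: with no handle there is nothing to
// release, so it returns early instead of dereferencing the receiver.
func (p *pipe) Close() error {
	if p == nil {
		return nil
	}
	if p.closed {
		return errAlreadyClosed
	}
	p.closed = true
	return nil
}

func main() {
	var missing *pipe            // e.g. a handle that was never opened
	fmt.Println(missing.Close()) // prints "<nil>" rather than panicking

	p := &pipe{}
	fmt.Println(p.Close()) // prints "<nil>"
	fmt.Println(p.Close()) // prints "pipe: already closed"
}
```

A guard like this only avoids the nil dereference in Close(); it does not explain why the caller ended up holding a nil handle in the first place, which is why synchronisation elsewhere may still be needed.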