containerd / fifo

fifo pkg for Go
https://containerd.io
Apache License 2.0
87 stars 30 forks source link

fifo.Close(): prevent possible panic if fifo is nil #32

Closed thaJeztah closed 3 years ago

thaJeztah commented 3 years ago

relates to https://github.com/docker/for-linux/issues/1186

I'm not sure if this is the right approach, and synchronisation should probably be added elsewhere to fix the underlying issue.

Trying to prevent a panic that was seen on container restore in th docker daemon:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x5586c892a7a4]

goroutine 420 [running]:
github.com/docker/docker/vendor/github.com/containerd/fifo.(*fifo).Close(0x0, 0x0, 0x0)
        /go/src/github.com/docker/docker/vendor/github.com/containerd/fifo/fifo.go:208 +0x44
github.com/docker/docker/vendor/github.com/containerd/containerd/cio.(*cio).Close(0xc000d06f60, 0x5586cb5654d0, 0xc000d8e9e8)
        /go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/cio/io.go:203 +0x90
github.com/docker/docker/libcontainerd/remote.(*client).Restore.func1(0xc0008bf820, 0xc0008a2040)
        /go/src/github.com/docker/docker/libcontainerd/remote/client.go:86 +0x5a
github.com/docker/docker/libcontainerd/remote.(*client).Restore(0xc00098e5b0, 0x5586cb61c7c0, 0xc000052088, 0xc0011b6500, 0x40, 0xc0008bf810, 0x5586cb05cf00, 0xffffffffffffffff, 0x0, 0x0, ...)
        /go/src/github.com/docker/docker/libcontainerd/remote/client.go:107 +0x923
github.com/docker/docker/daemon.(*Daemon).restore.func3(0xc00079d9e0, 0xc000a38230, 0xc00000c1e0, 0xc00079d9a8, 0xc000d84f00, 0xc000d84ed0, 0xc000d84ea0, 0xc00128a280)
        /go/src/github.com/docker/docker/daemon/daemon.go:351 +0x48a
created by github.com/docker/docker/daemon.(*Daemon).restore
        /go/src/github.com/docker/docker/daemon/daemon.go:319 +0x4b3

If the fifo is nil, there's nothing to be done in Close(), so returning early in that situation.

thaJeztah commented 3 years ago

@cpuguy83 @AkihiroSuda ptal

thaJeztah commented 3 years ago

OK, looks like this is failing;

=== RUN   TestRawReadWrite
    raw_test.go:54: 
            Error Trace:    raw_test.go:54
            Error:          An error is expected but got nil.
            Test:           TestRawReadWrite
thaJeztah commented 3 years ago

Hmm... looks like there's some flakiness; re-pushed, and now it passes; could it be related to Go 1.14/ Go 1.15?

thaJeztah commented 3 years ago

Ran Tibor's script from https://gist.github.com/tiborvass/eb0a4054679a43aaca22690a7c4452ed on this repository, in case it's useful;

``` ./check-syscalls.sh Acct SYS_ACCT AddKey SYS_ADD_KEY Adjtimex SYS_ADJTIMEX Chdir SYS_CHDIR Chroot SYS_CHROOT ClockGetres SYS_CLOCK_GETRES ClockGettime SYS_CLOCK_GETTIME ClockNanosleep SYS_CLOCK_NANOSLEEP Close SYS_CLOSE CopyFileRange SYS_COPY_FILE_RANGE DeleteModule SYS_DELETE_MODULE Dup SYS_DUP Dup2 SYS_DUP2 Dup3 SYS_DUP3 EpollWait SYS_EPOLL_WAIT Eventfd SYS_EVENTFD2 Exit SYS_EXIT_GROUP Faccessat2 SYS_FACCESSAT2 Fadvise SYS_FADVISE64 Fallocate SYS_FALLOCATE FanotifyInit SYS_FANOTIFY_INIT Fchdir SYS_FCHDIR Fchmod SYS_FCHMOD Fchown SYS_FCHOWN Fchownat SYS_FCHOWNAT Fdatasync SYS_FDATASYNC Fgetxattr SYS_FGETXATTR FinitModule SYS_FINIT_MODULE Flistxattr SYS_FLISTXATTR Flock SYS_FLOCK Fremovexattr SYS_FREMOVEXATTR Fsetxattr SYS_FSETXATTR Fstat SYS_FSTAT Fstatat SYS_NEWFSTATAT Fstatfs SYS_FSTATFS Fsync SYS_FSYNC Ftruncate SYS_FTRUNCATE Getcwd SYS_GETCWD Getdents SYS_GETDENTS64 Getpriority SYS_GETPRIORITY Getrandom SYS_GETRANDOM Getxattr SYS_GETXATTR InitModule SYS_INIT_MODULE InotifyAddWatch SYS_INOTIFY_ADD_WATCH Ioperm SYS_IOPERM Iopl SYS_IOPL KeyctlBuffer SYS_KEYCTL KeyctlInt SYS_KEYCTL Klogctl SYS_SYSLOG Lchown SYS_LCHOWN Lgetxattr SYS_LGETXATTR Linkat SYS_LINKAT Listen SYS_LISTEN Listxattr SYS_LISTXATTR Llistxattr SYS_LLISTXATTR Lremovexattr SYS_LREMOVEXATTR Lsetxattr SYS_LSETXATTR Madvise SYS_MADVISE MemfdCreate SYS_MEMFD_CREATE Mkdirat SYS_MKDIRAT Mknodat SYS_MKNODAT Mlock SYS_MLOCK Mlockall SYS_MLOCKALL Mprotect SYS_MPROTECT Msync SYS_MSYNC Munlock SYS_MUNLOCK Munlockall SYS_MUNLOCKALL Nanosleep SYS_NANOSLEEP Pause SYS_PAUSE PerfEventOpen SYS_PERF_EVENT_OPEN PivotRoot SYS_PIVOT_ROOT Prctl SYS_PRCTL Pread SYS_PREAD64 ProcessVMReadv SYS_PROCESS_VM_READV ProcessVMWritev SYS_PROCESS_VM_WRITEV Pselect SYS_PSELECT6 Pwrite SYS_PWRITE64 Readlinkat SYS_READLINKAT Removexattr SYS_REMOVEXATTR Renameat SYS_RENAMEAT Renameat2 SYS_RENAMEAT2 RequestKey SYS_REQUEST_KEY Seek SYS_LSEEK Select SYS_SELECT Setdomainname SYS_SETDOMAINNAME Setfsgid SYS_SETFSGID Setfsuid SYS_SETFSUID Sethostname SYS_SETHOSTNAME Setns SYS_SETNS Setpriority SYS_SETPRIORITY Setxattr SYS_SETXATTR Shutdown SYS_SHUTDOWN Splice SYS_SPLICE Statfs SYS_STATFS Statx SYS_STATX Symlinkat SYS_SYMLINKAT Sync SYS_SYNC SyncFileRange SYS_SYNC_FILE_RANGE Syncfs SYS_SYNCFS Tee SYS_TEE Truncate SYS_TRUNCATE Unlinkat SYS_UNLINKAT Unmount SYS_UMOUNT2 Unshare SYS_UNSHARE Ustat SYS_USTAT Utime SYS_UTIME ```
thaJeztah commented 3 years ago

@cpuguy83 @AkihiroSuda ptal

thaJeztah commented 3 years ago

@AkihiroSuda any thoughts on the discussion above? https://github.com/containerd/fifo/pull/32#discussion_r578754229 (i.e.; would fixes be needed elsewhere (as well?)

Rid commented 2 years ago

@thaJeztah Did this make it into a release? I'm still seeing these issues occasionally on v20.10.12, with containerd 1.4.12 7b11cfaabd73bb80907dd23182b9347b4245eb5d.

I can't seem to find a way to resolve the issue without restarting the entire server, which is very costly.

Jan 27 11:05:55  cli[2633740]: time="2022-01-27T11:05:55.879409823+01:00" level=info msg="Starting up"
Jan 27 11:05:55  cli[2633740]: time="2022-01-27T11:05:55.879804137+01:00" level=info msg="User namespaces: ID ranges will be mapped to subuid/subgid ranges of: 11513"
Jan 27 11:05:56  cli[2633740]: time="2022-01-27T11:05:56.373259438+01:00" level=info msg="User namespaces: ID ranges will be mapped to subuid/subgid ranges of: 11513"
Jan 27 11:05:56  cli[2633740]: time="2022-01-27T11:05:56.375457580+01:00" level=info msg="parsed scheme: \"unix\"" module=grpc
Jan 27 11:05:56  cli[2633740]: time="2022-01-27T11:05:56.375505639+01:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Jan 27 11:05:56  cli[2633740]: time="2022-01-27T11:05:56.375552415+01:00" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock  <nil> 0 <nil>}] <nil> <nil>}" module=grpc
Jan 27 11:05:56  cli[2633740]: time="2022-01-27T11:05:56.375578423+01:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Jan 27 11:05:56  cli[2633740]: time="2022-01-27T11:05:56.379642047+01:00" level=info msg="parsed scheme: \"unix\"" module=grpc
Jan 27 11:05:56  cli[2633740]: time="2022-01-27T11:05:56.379733149+01:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Jan 27 11:05:56  cli[2633740]: time="2022-01-27T11:05:56.379806034+01:00" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock  <nil> 0 <nil>}] <nil> <nil>}" module=grpc
Jan 27 11:05:56  cli[2633740]: time="2022-01-27T11:05:56.379840347+01:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Jan 27 11:05:56  cli[2633740]: time="2022-01-27T11:05:56.760278807+01:00" level=warning msg="Your kernel does not support CPU realtime scheduler"
Jan 27 11:05:56  cli[2633740]: time="2022-01-27T11:05:56.760346178+01:00" level=warning msg="Your kernel does not support cgroup blkio weight"
Jan 27 11:05:56  cli[2633740]: time="2022-01-27T11:05:56.760361162+01:00" level=warning msg="Your kernel does not support cgroup blkio weight_device"
Jan 27 11:05:56  cli[2633740]: time="2022-01-27T11:05:56.760735829+01:00" level=info msg="Loading containers: start."
Jan 27 11:05:56  cli[2633740]: panic: runtime error: invalid memory address or nil pointer dereference
Jan 27 11:05:56  cli[2633740]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x563e12aa9824]
Jan 27 11:05:56  cli[2633740]: goroutine 158 [running]:
Jan 27 11:05:56  cli[2633740]: github.com/docker/docker/vendor/github.com/containerd/fifo.(*fifo).Close(0x0, 0x0, 0x0)
Jan 27 11:05:56  cli[2633740]:         /go/src/github.com/docker/docker/vendor/github.com/containerd/fifo/fifo.go:208 +0x44
Jan 27 11:05:56  cli[2633740]: github.com/docker/docker/vendor/github.com/containerd/containerd/cio.(*cio).Close(0xc000e71230, 0xc000601528, 0xc0012bca58)
Jan 27 11:05:56  cli[2633740]:         /go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/cio/io.go:203 +0x90
Jan 27 11:05:56  cli[2633740]: github.com/docker/docker/libcontainerd/remote.(*client).Restore.func1(0xc00029d230, 0xc00072e148)
Jan 27 11:05:56  cli[2633740]:         /go/src/github.com/docker/docker/libcontainerd/remote/client.go:86 +0x5a
Jan 27 11:05:56  cli[2633740]: github.com/docker/docker/libcontainerd/remote.(*client).Restore(0xc000dc8380, 0x563e14df4128, 0xc000052028, 0xc0002e8040, 0x40, 0xc00029d220, 0xc001302d00, 0xffffffffffffffff, 0x0, 0x0, ...)
Jan 27 11:05:56  cli[2633740]:         /go/src/github.com/docker/docker/libcontainerd/remote/client.go:107 +0xa07
Jan 27 11:05:56  cli[2633740]: github.com/docker/docker/daemon.(*Daemon).restore.func3(0xc000164300, 0xc0007281e0, 0xc00000c1e0, 0xc0001642b8, 0xc000b71080, 0xc000b71050, 0xc000b71020, 0xc00019a280)
Jan 27 11:05:56  cli[2633740]:         /go/src/github.com/docker/docker/daemon/daemon.go:351 +0x46d
Jan 27 11:05:56  cli[2633740]: created by github.com/docker/docker/daemon.(*Daemon).restore
Jan 27 11:05:56  cli[2633740]:         /go/src/github.com/docker/docker/daemon/daemon.go:319 +0x4cf
thaJeztah commented 2 years ago

Not this was not in a release yet; https://github.com/containerd/fifo/compare/v1.0.0...39bc37d3b045af5a25ca2c62586d40b2cd1eb96c