containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

podman deadlocks when used on both sides of a pipe #9212

Closed aojea closed 3 years ago

aojea commented 3 years ago

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

podman deadlocks when used on both sides of a pipe

Steps to reproduce the issue:

OK, this is interesting: there is a deadlock somewhere in podman. It hangs if I run:

sudo KIND_EXPERIMENTAL_PROVIDER=podman sh -c 'podman save localhost/agnhost:1 | ./kind load image-archive -'
using podman due to KIND_EXPERIMENTAL_PROVIDER
enabling experimental podman provider
Copying blob 721384ec99e5 [>-------------------------------------] 162.5KiB / 4.1MiB
  86292 pts/5    S+     0:00  |   |   |   |   \_ sudo KIND_EXPERIMENTAL_PROVIDER=podman sh -c podman save localhost/agnhost:1 | ./kind load image-archive -
  86293 pts/5    S+     0:00  |   |   |   |       \_ sh -c podman save localhost/agnhost:1 | ./kind load image-archive -
  86294 pts/5    Sl+    0:00  |   |   |   |           \_ podman save localhost/agnhost:1
  86301 pts/5    Sl+    0:00  |   |   |   |               \_ podman ps -a --filter label=io.x-k8s.kind.cluster=kind --format {{.Names}}

But it also doesn't let me run new podman commands :thinking:. Is it possible that podman doesn't allow concurrent calls, or at least concurrent calls that involve storage? @mrunalp @mheon

This is the strace of the hanging command (by the way, it only happens if I use sudo):

read(9, "", 1528)                       = 0
epoll_ctl(4, EPOLL_CTL_DEL, 9, 0xc00040f28c) = 0
close(9)                                = 0
setrlimit(RLIMIT_NPROC, {rlim_cur=4096*1024, rlim_max=4096*1024}) = 0
newfstatat(AT_FDCWD, "/usr/share/containers/libpod.conf", 0xc000481078, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/etc/containers/libpod.conf", 0xc000481148, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/share/containers/containers.conf", {st_mode=S_IFREG|0644, st_size=14893, ...}, 0) = 0
newfstatat(AT_FDCWD, "/etc/containers/containers.conf", 0xc0004812e8, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/containers/containers.conf", O_RDONLY|O_CLOEXEC) = 9
epoll_ctl(4, EPOLL_CTL_ADD, 9, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=2932330344, u64=140435478007656}}) = -1 EPERM (Operation not permitted)
epoll_ctl(4, EPOLL_CTL_DEL, 9, 0xc00040f244) = -1 EPERM (Operation not permitted)
fstat(9, {st_mode=S_IFREG|0644, st_size=14893, ...}) = 0
read(9, "# The containers configuration f"..., 15405) = 14893
read(9, "", 512)                        = 0
close(9)                                = 0
geteuid()                               = 0
newfstatat(AT_FDCWD, "/etc/containers/storage.conf", {st_mode=S_IFREG|0644, st_size=7803, ...}, 0) = 0
newfstatat(AT_FDCWD, "/etc/containers/storage.conf", {st_mode=S_IFREG|0644, st_size=7803, ...}, 0) = 0
futex(0xc00005c848, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigprocmask(SIG_SETMASK, ~[HUP INT QUIT ILL TRAP ABRT BUS FPE SEGV TERM STKFLT CHLD PROF SYS RTMIN RT_1], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[HUP INT QUIT ILL TRAP ABRT BUS FPE SEGV TERM STKFLT CHLD PROF SYS RTMIN RT_1], NULL, 8) = 0
futex(0xc00005cbc8, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigprocmask(SIG_SETMASK, ~[HUP INT QUIT ILL TRAP ABRT BUS FPE SEGV TERM STKFLT CHLD PROF SYS RTMIN RT_1], NULL, 8) = 0
futex(0xc00005cbc8, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x55d9268a0508, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x55d9268a0508, FUTEX_WAIT_PRIVATE, 0, NULL

Describe the results you received:

The commands hang. One side of the pipe is a podman save, and the other side runs podman ps -a --filter ... to find the containers to load the image into:

\_ sh -c podman save localhost/agnhost:1 | ./kind load image-archive -
  86294 pts/5    Sl+    0:00  |   |   |   |           \_ podman save localhost/agnhost:1
  86301 pts/5    Sl+    0:00  |   |   |   |               \_ podman ps -a --filter label=io.x-k8s.kind.cluster=kind --format {{.Names}}

Describe the results you expected:

podman should not deadlock, and it should be usable on the CLI with pipes.

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

Output of podman info --debug:

Package info (e.g. output of rpm -q podman or apt list podman):

Installed from https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_18.04/ /

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):

xref: https://github.com/kubernetes-sigs/kind/pull/2041

vrothberg commented 3 years ago

Thanks for reaching out. I can reproduce on master:

podman (master) $ sudo ./bin/podman save alpine | sudo ./bin/podman ps
CONTAINER ID  IMAGE   COMMAND  CREATED  STATUS  PORTS   NAMES
Copying blob ace0eda3e3be [>-------------------------------------] 163.0KiB / 5.6MiB

vrothberg commented 3 years ago

It works with podman load on the right side, though.

vrothberg commented 3 years ago

I can also reproduce it as rootless.

vrothberg commented 3 years ago

Okay, I think this cannot work. podman save alpine | expects the pipe to be read, but podman ps does not read it. Hence we're stuck on Copying blob ....

I think it's actually a livelock: save opened a stream for copying the layer, but the stream isn't consumed. The storage lock is released only once the layer has been copied.

@rhatdan @giuseppe PTAL for a second pair of eyes.
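The lock/pipe interaction described above can be modeled with a toy Python program. This is a minimal sketch under stated assumptions, not Podman code:

- `STORAGE_LOCK` (a `threading.Lock`) stands in for the storage lock.
- `save` plays the role of podman save streaming a layer into a pipe while holding that lock.
- `lock_before_reading` plays the role of podman ps, which needs the lock before it would ever read stdin.

The names and the 1 MiB payload are hypothetical. The payload only has to exceed the kernel pipe buffer (64 KiB by default on Linux) so that the writer blocks.

```python
import os
import threading

# Toy stand-ins, not Podman code.
STORAGE_LOCK = threading.Lock()  # plays the role of the storage lock
PAYLOAD = b"x" * (1 << 20)       # 1 MiB, more than a default Linux pipe buffer


def save(write_fd: int, locked: threading.Event) -> None:
    """Like `podman save`: hold the lock for as long as the stream is being written."""
    with STORAGE_LOCK:
        locked.set()
        try:
            offset = 0
            while offset < len(PAYLOAD):
                # Blocks once the pipe buffer is full and nobody reads.
                offset += os.write(write_fd, PAYLOAD[offset:])
        except BrokenPipeError:
            # Python ignores SIGPIPE by default, so writing to a pipe
            # whose read end is closed raises BrokenPipeError instead.
            pass
        finally:
            os.close(write_fd)


def lock_before_reading(timeout: float = 1.0) -> bool:
    """Like `podman ps` on the right side: wants the lock before reading stdin."""
    read_fd, write_fd = os.pipe()
    locked = threading.Event()
    writer = threading.Thread(target=save, args=(write_fd, locked))
    writer.start()
    locked.wait()  # make sure the writer holds the lock first
    # The writer is blocked in os.write while holding the lock, so this times out.
    got_lock = STORAGE_LOCK.acquire(timeout=timeout)
    if got_lock:
        STORAGE_LOCK.release()
    os.close(read_fd)  # stands in for killing the hung right-hand command
    writer.join()      # the writer gets EPIPE, gives up, and releases the lock
    return got_lock


if __name__ == "__main__":
    print("right side got the lock:", lock_before_reading())
```

Run as a script, this prints `right side got the lock: False`. The writer is unblocked only when the read end is closed, which is the toy equivalent of interrupting the pipeline.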

giuseppe commented 3 years ago

Isn't it equivalent to $ podman save alpine | sleep infinity? I'm not sure we can do much about it.

vrothberg commented 3 years ago

I concur. I am closing the issue but we can continue the conversation.

@aojea, where should the saved image go?

rhatdan commented 3 years ago

Since we have a workaround and we really cannot fix this, I agree it should be closed.

aojea commented 3 years ago

@vrothberg, the image is piped into the containerd storage using ctr --import -

vrothberg commented 3 years ago

@aojea, does that work?

aojea commented 3 years ago

I didn't try it directly; this is used by KIND to preload images into the cluster: https://github.com/kubernetes-sigs/kind/pull/2041#issuecomment-768993462

So basically, one side writes the saved image into the pipe. On the other side of the pipe, KIND iterates through all the containers and copies the stream from stdin into ctr --import.

vrothberg commented 3 years ago

Yes, that won't work. The data must be read immediately, because save holds the storage lock while it writes.
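Following that reasoning, the right-hand side has to finish reading before it runs anything that needs the storage lock. One possible restructuring of such a consumer is to spool stdin to a temporary file first and only then run the lock-taking lookup. This is a hypothetical sketch, not what KIND actually does.

The lock and writer are toy stand-ins:

- a `threading.Lock` for the storage lock,
- a thread writing 1 MiB for podman save.

`spool_then_lock` and `demo` are illustrative names.

```python
import os
import shutil
import tempfile
import threading
from typing import Tuple

# Toy stand-ins, not Podman or KIND code.
STORAGE_LOCK = threading.Lock()  # plays the role of the storage lock
PAYLOAD = b"x" * (1 << 20)       # 1 MiB, more than a default Linux pipe buffer


def save(write_fd: int, locked: threading.Event) -> None:
    """Like `podman save`: hold the lock until the whole stream is written."""
    with STORAGE_LOCK:
        locked.set()
        try:
            offset = 0
            while offset < len(PAYLOAD):
                offset += os.write(write_fd, PAYLOAD[offset:])
        finally:
            os.close(write_fd)  # reader sees EOF; lock is released right after


def spool_then_lock(read_fd: int, timeout: float = 1.0) -> Tuple[int, bool]:
    """Drain the whole stream into a temp file first, then take the lock."""
    with os.fdopen(read_fd, "rb") as src, tempfile.TemporaryFile() as spool:
        shutil.copyfileobj(src, spool)  # reads to EOF, so the writer can finish
        size = spool.tell()
        got_lock = STORAGE_LOCK.acquire(timeout=timeout)
        if got_lock:
            # Hypothetical: list the target containers here, then seek(0) on
            # `spool` and replay it into each import command.
            STORAGE_LOCK.release()
    return size, got_lock


def demo(timeout: float = 1.0) -> Tuple[int, bool]:
    read_fd, write_fd = os.pipe()
    locked = threading.Event()
    writer = threading.Thread(target=save, args=(write_fd, locked))
    writer.start()
    locked.wait()  # make sure the writer holds the lock before reading starts
    result = spool_then_lock(read_fd, timeout)
    writer.join()
    return result


if __name__ == "__main__":
    size, got_lock = demo()
    print(f"spooled {size} bytes, got the lock: {got_lock}")
```

Because the reader never waits on the lock while data is still in flight, the writer always reaches EOF and releases the lock. Buffering to a file also means a single stdin stream can be replayed into several import commands.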