canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

lxc exec hangs with full stderr pipe #12536

Open p-ouellette opened 10 months ago

p-ouellette commented 10 months ago

Issue description

The following command hangs after executing for 46 minutes and 19 seconds.

lxc exec ub22test1 -- bash -c 'while true; do date >&2; sleep 1; done'

It only hangs when I run the command from a Jenkins job. The date command is blocked writing to its stdout (fd 1), which, because of the >&2 redirection, is the same pipe as its stderr (fd 2):

root@ub22test1:~# strace -p 1698183
strace: Process 1698183 attached
write(1, "Sun Nov 19 23:13:38 CST 2023\n", 29^Cstrace: Process 1698183 detached
 <detached ...>

root@ub22test1:~# ls -l /proc/1698183/fd
total 0
lr-x------ 1 root root 64 Nov 20 01:43 0 -> 'pipe:[236259458]'
l-wx------ 1 root root 64 Nov 20 01:43 1 -> 'pipe:[236259460]'
l-wx------ 1 root root 64 Nov 20 01:43 2 -> 'pipe:[236259460]'
root@ub22test1:~# lsof | grep 236259460
bash      1687587                          root    2w     FIFO               0,13       0t0  236259460 pipe
date      1698183                          root    1w     FIFO               0,13       0t0  236259460 pipe
date      1698183                          root    2w     FIFO               0,13       0t0  236259460 pipe
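The lsof | grep lookup can also be done with nothing but /proc, which helps inside containers where lsof is not installed. The following is a minimal sketch, assuming Linux; pipe_holders is a hypothetical helper name (not an LXD or lsof command), and it only sees processes whose /proc/<pid>/fd the caller is allowed to read, so it would be run as root inside the container to match the report.

```shell
#!/usr/bin/env bash
# Minimal sketch (Linux only): list every process and fd that refers to the
# same pipe as /proc/<pid>/fd/<fd>, similar to `lsof | grep <inode>`.
# `pipe_holders` is a hypothetical helper, not part of LXD or lsof.
pipe_holders() {
  local pid=$1 fd=$2 target link f p
  target=$(readlink "/proc/$pid/fd/$fd") || return 1
  case $target in
    pipe:*) ;;
    *) echo "fd $fd of PID $pid is not a pipe: $target" >&2; return 1 ;;
  esac
  for f in /proc/[0-9]*/fd/*; do
    link=$(readlink "$f" 2>/dev/null) || continue   # process may have exited
    [ "$link" = "$target" ] || continue
    p=${f#/proc/}; p=${p%%/*}                      # extract the PID part
    printf '%s\t%s\tfd %s\n' "$p" "$(cat "/proc/$p/comm" 2>/dev/null)" "${f##*/}"
  done
}

# Self-contained demo: two sleeps connected by an anonymous pipe.
sleep 30 | sleep 30 &
reader=$!          # PID of the last command in the background pipeline
sleep 0.2          # give both children time to set up their fds and exec
holders=$(pipe_holders "$reader" 0)
printf '%s\n' "$holders"
# Clean up both sleeps using the PIDs that were just found.
kill $(cut -f1 <<<"$holders") 2>/dev/null
```

For the hung process in this report, the equivalent call would be pipe_holders 1698183 1, which should list the same bash and date entries that lsof showed.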

If I drain the pipe, the bash process resumes.

root@ub22test1:~# cat /proc/1698183/fd/1 >out
^C
root@ub22test1:~# ls -l out
-rw-r--r-- 1 root root 65482 Nov 20 01:53 out

There is also no hang if I remove the stderr redirection or use ssh instead of lxc exec, as in the following commands:

lxc exec ub22test1 -- bash -c 'while true; do date; sleep 1; done'
ssh <ip> "bash -c 'while true; do date >&2; sleep 1; done'"

I'm not certain this is an LXD bug, but I can only reproduce it with lxc exec. The bash command does not hang when run outside of a container in Jenkins.

EDIT: it seems to be reproducible outside of Jenkins if I force non-interactive mode:

lxc exec -T ub22test2 -- bash -c 'while true; do date >&2; sleep 1; done'

The pipe eventually fills up and blocks the process.
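The symptom can be imitated without LXD: a writer whose stdout is drained but whose stderr goes to a pipe that nobody reads. The following is a minimal sketch, assuming Linux and GNU coreutils (timeout, mkfifo); the FIFO path and the fixed log line are hypothetical stand-ins, and it does not model what LXD does internally, only the "undrained stderr pipe blocks the writer" behaviour.

```shell
#!/usr/bin/env bash
# Minimal sketch (assumes Linux and GNU coreutils): stdout is drained,
# stderr goes to a pipe that is never read. Not LXD code.
set -u

dir=$(mktemp -d)
fifo="$dir/stderr.fifo"   # hypothetical stand-in for the undrained stderr pipe
mkfifo "$fifo"

# Open the FIFO read-write on fd 3 (allowed on Linux). This gives it a reader
# that never reads, so opening it for writing does not block.
exec 3<>"$fifo"

# Same shape as the loop in the issue, minus `sleep 1` and using the printf
# builtin, so the pipe buffer fills in milliseconds.
# stdout -> /dev/null (always drained); stderr -> the undrained FIFO.
timeout 2 bash -c 'while true; do printf "%s\n" "Sun Nov 19 23:13:38 CST 2023" >&2; done' \
  >/dev/null 2>"$fifo"
status=$?   # 124 means timeout had to kill a writer that was still blocked

# Count what the kernel buffered. cat drains the pipe, then waits for more
# data because fd 3 still holds a write end open, so timeout stops it.
buffered=$(timeout 1 cat <&3 | wc -c)

exec 3>&-
rm -rf "$dir"

echo "writer exit status: $status"
echo "bytes stuck in the pipe: $buffered"
```

On a typical x86_64 system with 4 KiB pages, the buffered count should land just under 64 KiB, the default Linux pipe capacity, which is in the same range as the 65482 bytes recovered by draining the pipe in the report. Reading from fd 3 at any point would let the writer continue, just as cat /proc/1698183/fd/1 >out unblocked bash.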

tomponline commented 10 months ago

I think this sounds like expected behaviour. When using -T (which forces non-interactive mode) or in a non-interactive environment (like Jenkins CI), LXD will write the stdout and stderr output of the remote process to the associated channels on the local host where lxc is being run. If you are piping that output into a pipe that isn't being consumed, then it will eventually fill up the buffer and stop.

This is so that slow consumers don't miss output.

p-ouellette commented 10 months ago

> If you are piping that output into a pipe that isn't being consumed then it will eventually fill up the buffer and stop.

But I don't think I am piping the output. This command hangs, and I'm not piping its output:

lxc exec -T ub22test2 -- bash -c 'while true; do date >&2; sleep 1; done'

The pipe that fills up is not one I created; it is the pipe LXD itself connects to the process's stderr.