containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

podman hanging on a lot of subcommands without logs #9228

Closed b-ncMN closed 3 years ago

b-ncMN commented 3 years ago

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description: Podman appears to hang on my system when I run any of these commands:

  podman auto-update
  podman build
  podman events
  podman exec
  podman images
  podman import
  podman info
  podman inspect
  podman load
  podman login
  podman logout
  podman mount
  podman pause
  podman port
  podman ps
  podman pull
  podman rmi
  podman run
  podman start
  podman stats
  podman top
  podman unpause
  podman unshare
  podman version
  podman wait

Steps to reproduce the issue: (Haven't tried to reproduce this anywhere else)

  1. Install openSUSE Leap 15.2
  2. Run zypper in toolbox
  3. Run toolbox -u or one of the commands listed above
  4. observe

Describe the results you received: All of the subcommands listed above hang without printing anything to the journal.

Describe the results you expected: I expected everything to work correctly. I first hit this bug while manually running "podman pull registry.opensuse.org/opensuse/toolbox:latest", and then noticed that podman wasn't working properly at all.

Additional information you deem important (e.g. issue happens only occasionally): It happens every time. I have tried restarting, but that does not fix the issue. I have checked that virtualization is enabled in my BIOS (it is, and I can also run virtual machines with virsh successfully).

Output of podman version: podman version hangs, but here is the version reported by zypper info podman:

Information for package podman:
-------------------------------
Repository     : Main Update Repository
Name           : podman
Version        : 2.1.1-lp152.4.6.1
Arch           : x86_64
Vendor         : openSUSE
Installed Size : 93.6 MiB
Installed      : Yes
Status         : up-to-date
Source package : podman-2.1.1-lp152.4.6.1.src
Summary        : Daemon-less container engine for managing containers, pods and images
Description    :
    Podman is a container engine for managing pods, containers, and container
    images.
    It is a standalone tool and it directly manipulates containers without the need
    of a container engine daemon.
    Podman is able to interact with container images create in buildah, cri-o, and
    skopeo, as they all share the same datastore backend.

Output of podman info --debug: (hangs...)

Package info (e.g. output of rpm -q podman or apt list podman):

podman-2.1.1-lp152.4.6.1.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?

No

Additional environment details (AWS, VirtualBox, physical, etc.):

Physical (my laptop)
OS/Distro: Linux/openSUSE Leap 15.2
Kernel: Linux inftop 5.3.18-lp152.60-default #1 SMP Tue Jan 12 23:10:31 UTC 2021 (9898712) x86_64 x86_64 x86_64 GNU/Linux

b-ncMN commented 3 years ago

After playing around with podman a bit, I noticed that running those commands as root seems to work. I am now wondering whether this is related to user permissions.

Luap99 commented 3 years ago

Do you have the newuidmap and newgidmap binaries installed on your system? This sounds like https://github.com/containers/podman/issues/7890

b-ncMN commented 3 years ago

Yes, I have them installed.

I followed the description in the issue you mentioned and ran an strace: https://susepaste.org/35177576. That issue mentions podman looping on a futex, which looks like it happens here too. After a few lines, strace shows it getting stuck at

wait4(2732, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, {ru_utime={tv_sec=0, tv_usec=0}, ru_stime={tv_sec=0, tv_usec=619}, ...}) = 2732

(line 2405 in the paste, the next lines are from me pressing ^C)

b-ncMN commented 3 years ago

I have tried to reproduce this on my desktop (which runs the same distro), and it works fine there.

rhatdan commented 3 years ago

Does this hang?

$ podman unshare cat /proc/self/uid_map

b-ncMN commented 3 years ago

Hi, sorry for the late answer,

yes it does

rhatdan commented 3 years ago

Well I have no idea what is going on. Have you tried to reboot this system, or have you tried this with a different account?

rhatdan commented 3 years ago

Is there anything special about your homedir? For example, is it on NFS?

b-ncMN commented 3 years ago

Yes, I have tried rebooting (multiple times, even). I haven't tried using another user; I will report back on that.

I have a regular btrfs root partition with nothing special about my homedir

rhatdan commented 3 years ago

Perhaps this has something to do with btrfs.

@giuseppe Any ideas?

b-ncMN commented 3 years ago

It does run under another user.

vrothberg commented 3 years ago

Does buildah pull or skopeo copy docker://alpine containers-storage:alpine work?

rhatdan commented 3 years ago

Something in your homedir setup is causing this to fail.

b-ncMN commented 3 years ago

I haven't done anything particular in my home directory that could be causing this to fail. In fact, I didn't even have podman installed before I hit this bug; it was pulled in when I installed toolbox.

Is there a set of files / configs that are related to podman I could check ?

b-ncMN commented 3 years ago

I do not have the buildah command available under either of my users (test and infrandomness), nor do I have the skopeo command, yet pulling (and all the other subcommands) works under the test user.

rhatdan commented 3 years ago

Could you rm -rf ~/.config/containers ~/.local/share/containers

And see if it still hangs.

rhatdan commented 3 years ago

Also, could you show the output of printenv in the user account that does not work? Perhaps some setting in the environment is causing this to hang.

b-ncMN commented 3 years ago

I removed ~/.config/containers and ~/.local/share/containers, but it didn't help.

Here's my env: https://susepaste.org/86410544

rhatdan commented 3 years ago

Does podman --log-level=debug info give you any information?

@giuseppe @mheon PTAL

mheon commented 3 years ago

Vague theory: remove anything with libpod in the name in /dev/shm in case there's a locking issue.

b-ncMN commented 3 years ago

podman --log-level=debug info

https://susepaste.org/32558978

mheon commented 3 years ago

DEBU[0000] error from newgidmap: newgidmap: gid range [1-65537) -> [100000-165536) not allowed

Interesting. Does your user have a valid entry in /etc/subgid?
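For readers following along, here is a minimal Python sketch of the kind of check this question is getting at: whether the host GID range named in the error falls inside a range delegated to the user in /etc/subgid. It is only an illustration, not the actual newgidmap logic, which performs further checks. The user name alice, the sample file contents, and the helper names parse_subid and range_is_delegated are all hypothetical.

```python
from typing import List, NamedTuple


class SubIdEntry(NamedTuple):
    owner: str   # user name (or numeric ID) the range is delegated to
    start: int   # first host ID of the delegated range
    count: int   # number of IDs in the range


def parse_subid(text: str) -> List[SubIdEntry]:
    """Parse owner:start:count lines in the /etc/subuid and /etc/subgid format."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        owner, start, count = line.split(":")
        entries.append(SubIdEntry(owner, int(start), int(count)))
    return entries


def range_is_delegated(entries: List[SubIdEntry], owner: str,
                       start: int, count: int) -> bool:
    """True if [start, start + count) lies inside one range owned by `owner`."""
    end = start + count
    return any(
        e.owner == owner and e.start <= start and end <= e.start + e.count
        for e in entries
    )


# Hypothetical file contents for a hypothetical user "alice".
sample = "alice:100000:65536\n"
entries = parse_subid(sample)

# The error mentions the host range [100000-165536), i.e. start 100000, count 65536.
print(range_is_delegated(entries, "alice", 100000, 65536))
print(range_is_delegated(entries, "bob", 100000, 65536))
```

With an entry like the sample one, the requested range fits exactly, because 100000 + 65536 = 165536. A user with no matching entry gets no delegated range at all.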

b-ncMN commented 3 years ago

cat /etc/subgid

infrandomness:100000:65536
test:100000:65536

I wonder why I have the same range as test, yet it isn't working for me. It wasn't working even before I created that user account on my system.

rhatdan commented 3 years ago

Is newgidmap setuid, or does it have file capabilities (check with getcap)?

$ getcap /usr/bin/newgidmap
/usr/bin/newgidmap cap_setgid=ep
$ ls -l /usr/bin/newgidmap
-rwxr-xr-x. 1 root root 29848 Nov 16 04:17 /usr/bin/newgidmap

@saschagrunert @vrothberg Any ideas?

vrothberg commented 3 years ago

DEBU[0000] error from newgidmap: newgidmap: gid range [1-65537) -> [100000-165536) not allowed

The error message points to gid. @InfRandomness can you also share cat /etc/subgid?

b-ncMN commented 3 years ago

sudo getcap /usr/bin/newgidmap prints nothing

ls -l /usr/bin/newgidmap: (screenshot)

cat /etc/subgid

infrandomness:100000:65536
test:100000:65536

The fact that the path in the screenshot is red most likely has a special meaning.

rhatdan commented 3 years ago

Yes, it means it is setuid (or has file caps); in this case it is setuid. Everything looks fine, but I have no idea why this is blowing up. Perhaps some setting in SUSE blocks the use of the uid range.

Could you change the range to see whether duplicate ranges in /etc/subuid are being rejected?
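As a rough illustration of what "duplicate ranges" means here, the following Python sketch reports pairs of entries whose ranges overlap. It uses the two /etc/subgid lines posted earlier in this thread as input. The helper names parse_subid and overlapping_pairs are made up for this example, and whether the shadow tools actually reject overlapping ranges is exactly the open question, so this only detects the condition.

```python
from itertools import combinations
from typing import List, Tuple

Entry = Tuple[str, int, int]  # (owner, start, count)


def parse_subid(text: str) -> List[Entry]:
    """Parse owner:start:count lines in the /etc/subuid and /etc/subgid format."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        owner, start, count = line.split(":")
        entries.append((owner, int(start), int(count)))
    return entries


def overlapping_pairs(entries: List[Entry]) -> List[Tuple[str, str]]:
    """Return owner pairs whose half-open ranges [start, start + count) intersect."""
    pairs = []
    for (owner_a, start_a, count_a), (owner_b, start_b, count_b) in combinations(entries, 2):
        if start_a < start_b + count_b and start_b < start_a + count_a:
            pairs.append((owner_a, owner_b))
    return pairs


# The /etc/subgid contents posted earlier in this thread.
subgid = "infrandomness:100000:65536\ntest:100000:65536\n"
print(overlapping_pairs(parse_subid(subgid)))
```

Running the same function over each of /etc/subuid and /etc/subgid separately after editing them shows whether any duplicates remain.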

b-ncMN commented 3 years ago

Here are my new ranges :

cat /etc/subgid /etc/subuid

infrandomness:165536:65536
test:100000:65536
infrandomness:165536:65536
test:165536:65536

I logged out and back in after changing those, and the issue is still happening.

saschagrunert commented 3 years ago

Uh, I did not test it on Leap for quite a while. I think we have to debug it within a VM if it's reproducible.

b-ncMN commented 3 years ago

I can test it out in a VM a bit later on

b-ncMN commented 3 years ago

Unfortunately, I wasn't able to reproduce this in a VM.

b-ncMN commented 3 years ago

I think I'm just gonna reinstall my system and see how it goes after

b-ncMN commented 3 years ago

I think this issue can be closed now, since it seems to come from something in my home folder and not from the binary itself.

vrothberg commented 3 years ago

Thanks for the report and working with us, @InfRandomness !

b-ncMN commented 3 years ago

I am coming back to you guys because, since a recent podman update, I have more information about the situation:

"podman pull opensuse/tumbleweed" no longer hangs and now prints: "Error: cannot setup namespace using newgidmap: exit status 1"

I have also tried "buildah --debug unshare"; here's what I get:

*DEBU running [buildah-in-a-user-namespace --debug unshare] with environment [LIBVA_DRIVER_NAME=iHD LS_COLORS=no=00:fi=00:di=01;34:ln=00;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=41;33;01:ex=00;32:*.cmd=00;32:*.exe=01;32:*.com=01;32:*.bat=01;32:*.btm=01;32:*.dll=01;32:*.tar=00;31:*.tbz=00;31:*.tgz=00;31:*.rpm=00;31:*.deb=00;31:*.arj=00;31:*.taz=00;31:*.lzh=00;31:*.lzma=00;31:*.zip=00;31:*.zoo=00;31:*.z=00;31:*.Z=00;31:*.gz=00;31:*.bz2=00;31:*.tb2=00;31:*.tz2=00;31:*.tbz2=00;31:*.xz=00;31:*.avi=01;35:*.bmp=01;35:*.dl=01;35:*.fli=01;35:*.gif=01;35:*.gl=01;35:*.jpg=01;35:*.jpeg=01;35:*.mkv=01;35:*.mng=01;35:*.mov=01;35:*.mp4=01;35:*.mpg=01;35:*.pcx=01;35:*.pbm=01;35:*.pgm=01;35:*.png=01;35:*.ppm=01;35:*.svg=01;35:*.tga=01;35:*.tif=01;35:*.webm=01;35:*.webp=01;35:*.wmv=01;35:*.xbm=01;35:*.xcf=01;35:*.xpm=01;35:*.aiff=00;32:*.ape=00;32:*.au=00;32:*.flac=00;32:*.m4a=00;32:*.mid=00;32:*.mp3=00;32:*.mpc=00;32:*.ogg=00;32:*.voc=00;32:*.wav=00;32:*.wma=00;32:*.wv=00;32: HOSTTYPE=x86_64 XDG_CONFIG_HOME=/home/infrandomness/.config XAUTHLOCALHOSTNAME=inftop LESSCLOSE=lessclose.sh %s %s XKEYSYMDB=/usr/X11R6/lib/X11/XKeysymDB XDG_MENU_PREFIX=gnome- LANG=en_US.UTF-8 WINDOWMANAGER=gnome LESS=-M -I -R MANAGERPID=2790 DISPLAY=:0 JAVA_ROOT=/usr/lib64/jvm/java HOSTNAME=inftop INVOCATION_ID=1ec56de293f945168901ffca7e653025 ALACRITTY_LOG=/tmp/Alacritty-3479.log CONFIG_SITE=/usr/share/site/x86_64-unknown-linux-gnu CSHEDIT=emacs GTK2_MODULES=unity-gtk-module GPG_TTY=/dev/pts/0 AUDIODRIVER=pulseaudio LESS_ADVANCED_PREPROCESSOR=no COLORTERM=truecolor USERNAME=infrandomness JAVA_HOME=/usr/lib64/jvm/java ALSA_CONFIG_PATH=/etc/alsa-pulse.conf MACHTYPE=x86_64-suse-linux GIO_LAUNCHED_DESKTOP_FILE_PID=3479 GTK3_MODULES=unity-gtk-module SSH_AUTH_SOCK=/run/user/1000/keyring/ssh QEMU_AUDIO_DRV=pa MINICOM=-c on QT_SYSTEM_DIR=/usr/share/desktop-data OSTYPE=linux USER=infrandomness PAGER=less DESKTOP_SESSION=default MORE=-sl PWD=/home/infrandomness SSH_ASKPASS=/usr/lib/ssh/ssh-askpass 
HOME=/home/infrandomness JOURNAL_STREAM=9:45294 SSH_AGENT_PID=3039 HOST=inftop XNLSPATH=/usr/share/X11/nls XDG_SESSION_TYPE=x11 SDK_HOME=/usr/lib64/jvm/java XDG_DATA_DIRS=/home/infrandomness/.local/share/flatpak/exports/share:/var/lib/flatpak/exports/share:/usr/local/share:/usr/share JDK_HOME=/usr/lib64/jvm/java XDG_SESSION_DESKTOP=default PROFILEREAD=true GJS_DEBUG_OUTPUT=stderr GTK_MODULES=canberra-gtk-module FROM_HEADER= MAIL=/var/spool/mail/infrandomness UBUNTU_MENUPROXY=1 WINDOWPATH=2 LESSKEY=/etc/lesskey.bin TERM=xterm-256color SHELL=/bin/bash QT_IM_MODULE=xim XMODIFIERS=@im=local LS_OPTIONS=-N --color=tty -T 0 XCURSOR_THEME=DMZ XDG_CURRENT_DESKTOP=GNOME GIO_LAUNCHED_DESKTOP_FILE=/home/infrandomness/.local/share/applications/Alacritty.desktop PYTHONSTARTUP=/etc/pythonstart SHLVL=1 G_FILENAME_ENCODING=@locale,UTF-8,ISO-8859-15,CP1252 MANPATH=/usr/local/man:/usr/local/share/man:/usr/share/man WINDOWID=37748738 XSESSION_IS_UP=yes GDMSESSION=default LOGNAME=infrandomness DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus XDG_RUNTIME_DIR=/run/user/1000 XAUTHORITY=/run/user/1000/gdm/Xauthority JRE_HOME=/usr/lib64/jvm/java XDG_CONFIG_DIRS=/etc/xdg PATH=/home/infrandomness/.cargo/bin:/home/infrandomness/bin:/usr/local/bin:/usr/bin:/bin:usr/local/bin:usr/local/bin JAVA_BINDIR=/usr/lib64/jvm/java/bin SDL_AUDIODRIVER=pulse QT_IM_SWITCHER=imsw-multi G_BROKEN_FILENAMES=1 HISTSIZE=1000 GJS_DEBUG_TOPICS=JS ERROR;JS LOG SESSION_MANAGER=local/inftop:@/tmp/.ICE-unix/3087,unix/inftop:/tmp/.ICE-unix/3087 CPU=x86_64 CVS_RSH=ssh LESSOPEN=lessopen.sh %s GTK_IM_MODULE=cedilla _=/usr/bin/buildah TMPDIR=/var/tmp _CONTAINERS_USERNS_CONFIGURED=1 BUILDAH_ISOLATION=rootless], UID map [{ContainerID:0 HostID:1000 Size:1} {ContainerID:1 HostID:100000 Size:65536}], and GID map [{ContainerID:0 HostID:100 Size:1} {ContainerID:1 HostID:100000 Size:65536}]
WARN error running newgidmap: exit status 1: newgidmap: gid range [1-65537) -> [100000-165536) not allowed
WARN falling back to single mapping

The last two warnings look interesting to me :thinking:
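To make the maps in that debug line easier to read, here is a small Python sketch that translates an ID inside the user namespace to the host ID it would correspond to. The tuples are copied from the UID map and GID map printed above. The function name to_host_id is made up for this example, and the single-entry map at the end is only my assumption about what "falling back to single mapping" leaves in place.

```python
from typing import List, Optional, Tuple

IdMap = Tuple[int, int, int]  # (container_id, host_id, size)


def to_host_id(container_id: int, id_map: List[IdMap]) -> Optional[int]:
    """Translate a container ID to a host ID, or None if it is unmapped."""
    for container_start, host_start, size in id_map:
        if container_start <= container_id < container_start + size:
            return host_start + (container_id - container_start)
    return None


# Values taken from the buildah debug output above.
UID_MAP = [(0, 1000, 1), (1, 100000, 65536)]
GID_MAP = [(0, 100, 1), (1, 100000, 65536)]

print(to_host_id(0, UID_MAP))      # 1000: root in the namespace is my own user
print(to_host_id(1, UID_MAP))      # 100000: first subordinate UID
print(to_host_id(65536, UID_MAP))  # 165535: last subordinate UID
print(to_host_id(0, GID_MAP))      # 100

# Assumption: with only a single mapping, every ID other than 0 is unmapped.
SINGLE_UID_MAP = [(0, 1000, 1)]
print(to_host_id(1, SINGLE_UID_MAP))  # None
```

The second entry of each map is the range that the warning prints as [1-65537) -> [100000-165536).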

rhatdan commented 3 years ago

As the logged-in user, run:

$ cat /proc/self/uid_map
         0          0 4294967295
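For anyone comparing their own output, each line of /proc/self/uid_map has three fields: the first ID inside the namespace, the first ID outside it, and the length of the range. The line 0 0 4294967295 is the identity mapping seen in the initial user namespace. A minimal Python sketch for parsing this format follows; the function names parse_id_map and is_identity_map are made up for this example.

```python
from typing import List, Tuple

Row = Tuple[int, int, int]  # (inside_id, outside_id, length)


def parse_id_map(text: str) -> List[Row]:
    """Parse the contents of a uid_map or gid_map file into integer triples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # ignore blank or malformed lines
        inside, outside, length = (int(f) for f in fields)
        rows.append((inside, outside, length))
    return rows


def is_identity_map(rows: List[Row]) -> bool:
    """True for the single full-range mapping of the initial user namespace."""
    return rows == [(0, 0, 4294967295)]


expected = "         0          0 4294967295\n"
print(is_identity_map(parse_id_map(expected)))
```

To check a live session, pass the text read from /proc/self/uid_map to parse_id_map. Anything other than the identity mapping would suggest the shell is already running inside a user namespace.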