Open chevdor opened 1 year ago
I am testing further and the issue does not seem to be related to my funky folders but to -v /tmp:/tmp.
I edited the description to remove the mappings that are not relevant.
When you trigger this error, please run podman machine ssh journalctl -r and then look for errors reported by conmon or podman.
This hopefully shows us some useful error message.
I did not see anything obvious but here is a dump: https://gist.github.com/chevdor/48913984195ec6962719c22765dd1b2f
Yeah doesn't show anything useful to me.
@mheon Any idea how --tty could be related to the /tmp mount? From the error it looks like conmon is just segfaulting?
Could it be that tty needs /tmp in the podman machine for some reason, and putting the user's host /tmp on top of it does not end well?
I am personally using /tmp a lot because it is "self organising" :) and also short to type.
If that's the case, an option would be to mount the machine's /tmp as something else, such as /temp, and free /tmp for the user. I did not check how the machine is defined, but its /tmp is probably using tmpfs, and that may cause issues when binding it to a non-tmpfs directory from the host.
Arguably the user could also use /temp, but this is counter-intuitive, and the machine will be easier to educate than all the users.
If I had to guess, it would be related to logging - trying a container with --log-driver=none will probably confirm that.
Let me try. I'll create a new machine, now that I have one that works :)
podman machine init broken -v /Users:/Users -v /private:/private -v /var/folders:/var/folders -v /tmp:/tmp
Arf. Hitting another bug, possibly in podman-desktop. The new machine somehow overwrote my working one...
I can reproduce the issue with podman run -t ubuntu echo "Hello".
The following fails the same way: podman run -t --log-driver=none ubuntu echo "Hello"
Could this give some ideas? Here is the /tmp of the machine when it is not bound to the host:
podman machine ssh broken
Connecting to vm broken. To close connection, use `~.` or `exit`
Fedora CoreOS 38.20230414.2.0
Tracker: https://github.com/coreos/fedora-coreos-tracker
Discuss: https://discussion.fedoraproject.org/tag/coreos
[core@localhost ~]$ cd /tmp
[core@localhost tmp]$ ll
total 0
drwx------. 3 root root 60 Apr 18 20:02 systemd-private-340e57b45f3a4b598317dc471638eb5a-chronyd.service-nBFmu3
drwx------. 3 root root 60 Apr 18 20:02 systemd-private-340e57b45f3a4b598317dc471638eb5a-dbus-broker.service-G4f7xv
drwx------. 3 root root 60 Apr 18 20:02 systemd-private-340e57b45f3a4b598317dc471638eb5a-rpm-ostreed.service-oMp8ZI
drwx------. 3 root root 60 Apr 18 20:02 systemd-private-340e57b45f3a4b598317dc471638eb5a-systemd-hostnamed.service-S8X5v8
drwx------. 3 root root 60 Apr 18 20:02 systemd-private-340e57b45f3a4b598317dc471638eb5a-systemd-logind.service-fAcYet
drwx------. 3 root root 60 Apr 18 20:02 systemd-private-340e57b45f3a4b598317dc471638eb5a-systemd-resolved.service-Qq7bQz
Now I can confirm that /tmp is a tmpfs as expected:
[core@localhost tmp]$ df
Filesystem 1K-blocks Used Available Use% Mounted on
devtmpfs 4096 0 4096 0% /dev
tmpfs 1000404 84 1000320 1% /dev/shm
tmpfs 400164 5668 394496 2% /run
/dev/vda4 104266732 2280344 101986388 3% /sysroot
overlay 104266732 2280344 101986388 3% /usr
tmpfs 1000404 0 1000404 0% /tmp <----------
/dev/vda3 358271 103884 230631 32% /boot
tmpfs 200080 4 200076 1% /run/user/501
vol0 1953902844 1671874764 282028080 86% /Users
vol1 1953902844 1671874764 282028080 86% /private
vol2 1953902844 1671874764 282028080 86% /var/folders
I suggest NOT using podman-desktop for those tests for now due to this issue.
If I had to guess, it would be related to logging - trying a container with --log-driver=none will probably confirm that.
As the reporter shows, it only fails with --tty; logging should be the same regardless of whether --tty is set or not.
Following the code in conmon I found this: https://github.com/containers/conmon/blob/08c34bda8c75a37f153dfbd63399d22050551053/src/conn_sock.c#L170-L191
get_tmp_dir defaults to /tmp, so conmon tries to create a temporary socket under /tmp. https://docs.gtk.org/glib/func.get_tmp_dir.html
Does a 9p filesystem mount support sockets? I think the socket call is failing, but then I still do not understand why there is no error message from conmon in the journal.
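One way to check this question directly is to try binding a throwaway Unix socket in the directory in question, for example from inside the machine. The following is only a minimal sketch of such a probe, not conmon's code: the function name and the socket file name are made up for illustration, and it assumes Python 3 is available wherever it is run.

```python
from __future__ import annotations

import errno
import os
import socket
import tempfile


def can_bind_unix_socket(directory: str) -> tuple[bool, str]:
    """Try to bind a throwaway AF_UNIX socket inside `directory`.

    Returns (True, "") on success, or (False, reason) on failure.
    """
    # Hypothetical file name; conmon chooses its own socket name.
    path = os.path.join(directory, f"sock-probe-{os.getpid()}")
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)
    except OSError as exc:
        # Map the numeric errno to its symbolic name, e.g. EOPNOTSUPP.
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return False, f"{name}: {exc.strerror}"
    else:
        # bind() created a socket file; remove it again.
        os.unlink(path)
        return True, ""
    finally:
        sock.close()


if __name__ == "__main__":
    ok, reason = can_bind_unix_socket(tempfile.gettempdir())
    print("bind ok" if ok else f"bind failed ({reason})")
```

Running the probe once against a tmpfs-backed directory and once against a directory shared from the host would show whether the filesystem, rather than conmon, rejects the bind.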
Mapping /private/tmp does work fine though.
Are you using -v /private/tmp:/tmp or -v /private/tmp:/private/tmp, then?
The latter probably indicates nothing either way.
If the former works, and the two behave differently, that would suggest that we are mapping just the symlink over 9pfs. And in that case we are creating a socket inside the VM at /private/tmp/…, and I can well imagine permissions, or SELinux, not being happy with that.
Does 9p filesystem mount support sockets?
https://github.com/torvalds/linux/blob/2d1bcbc6cd703e64caf8df314e3669b4786e008a/fs/9p/vfs_inode.c#L54-L55 suggests that it can, depending on options (and server support?).
Are you using -v /private/tmp:/tmp or -v /private/tmp:/private/tmp, then?
I use -v /private/tmp:/private/tmp.
Your question brings up a nice idea that would solve another of my problems. I would love to be able to use:
podman run --rm -it -v /tmp/mysite:/var/www/bla nginx
instead of my current:
podman run --rm -it -v /private-tmp/mysite:/var/www/bla nginx
But the problem described in this issue also remains when mapping -v /private/tmp:/tmp, since the /tmp of the Podman machine is then no longer the machine's own tmpfs, and the OS seems to rely on it to work properly. As a result, any mapping to /tmp, such as -v /whatever/folder:/tmp, will cause trouble.
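To make that rule concrete, here is a small sketch that scans a list of volume specs and flags the ones whose target inside the machine is exactly /tmp. It assumes the source:target[:options] form used in the podman machine init command earlier in this thread; the function name is hypothetical and nothing here is part of podman itself.

```python
import posixpath


def mounts_replacing_tmp(volume_specs):
    """Return the specs whose target inside the machine is /tmp."""
    risky = []
    for spec in volume_specs:
        # Assumed form: source:target[:options]
        parts = spec.split(":")
        if len(parts) < 2:
            continue
        # Normalize so that "/tmp/" is treated the same as "/tmp".
        target = posixpath.normpath(parts[1])
        if target == "/tmp":
            risky.append(spec)
    return risky


specs = [
    "/Users:/Users",
    "/private:/private",
    "/var/folders:/var/folders",
    "/tmp:/tmp",
]
print(mounts_replacing_tmp(specs))
```

With the four mappings from the machine named broken above, only /tmp:/tmp is flagged, while a mapping such as /private/tmp:/private/tmp passes.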
Reproduced.
@Luap99 You were right, the failure is:
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0) = 7
fchmod(7, 0700) = 0
bind(7, {sa_family=AF_UNIX, sun_path="/tmp/conmon-term.JAIG51"}, 110) = -1 EOPNOTSUPP (Operation not supported)
write(2, "[conmon:e]: Failed to bind to co"..., 69) = 69
This child process is reporting the error.
But that report is not visible because https://github.com/containers/conmon/blob/08c34bda8c75a37f153dfbd63399d22050551053/src/conmon.c#L131 has redirected stderr to /dev/null.
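The effect of that redirect can be illustrated with a small analogy. This is not conmon's code (conmon is written in C); it is a sketch in which a hypothetical child process writes an error to stderr and exits non-zero, run once with stderr discarded and once with it captured.

```python
import subprocess
import sys

# Hypothetical child: writes an error to stderr and exits with status 1.
CHILD = 'import sys; sys.stderr.write("[child:e]: Failed to bind\\n"); sys.exit(1)'


def run_child(discard_stderr: bool):
    """Run the child and return (exit code, captured stderr or None)."""
    stderr = subprocess.DEVNULL if discard_stderr else subprocess.PIPE
    proc = subprocess.run([sys.executable, "-c", CHILD], stderr=stderr, text=True)
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    print(run_child(discard_stderr=True))
    print(run_child(discard_stderr=False))
```

In both runs the exit code still signals the failure, but with stderr sent to the null device the message explaining it is lost, which matches the empty journal seen above.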
I’m sure there is some strategy for conmon error handling, so at this point I’d prefer to let conmon experts weigh in.
Regardless, I’d say that sharing /tmp across machines is risky. Frequently enough, processes tend to assume (without a very good reason) that some names are exclusive to them, and to that machine (consider the hard-coded X11 socket paths, or perhaps something using PID files (with no machine ID) to disambiguate).
I don’t know how much effort it makes sense to spend on supporting this.
Issue Description
After reporting this issue, I tested with a default podman machine. In that case, the issue described below does NOT occur.
I do run into the issue when using a freshly created machine close to the default machine but with an extra mapping to /tmp:
Mapping /private/tmp does work fine though.
what works
what does not work
Steps to reproduce the issue
or
Describe the results you received
See description
Describe the results you expected
Using --tty or -t works without error.
podman info output
Podman in a container
No
Privileged Or Rootless
Tried both, appears not relevant.
Upstream Latest Release
Yes
Additional environment details
Additional information