containers / toolbox

Tool for interactive command line environments on Linux
https://containertoolbx.org/
Apache License 2.0
2.48k stars 213 forks source link

delays caused by dbus in 0.0.99 #671

Open mkoura opened 3 years ago

mkoura commented 3 years ago

Describe the bug After updating my Fedora 33 to toolbox 0.0.99, I notice long start time (several seconds) of some applications, namely mc. When running with strace -T -o out.log mc, I can see

socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3 <0.000024>
getsockopt(3, SOL_SOCKET, SO_RCVBUF, [212992], [4]) = 0 <0.000007>
setsockopt(3, SOL_SOCKET, SO_RCVBUF, [8388608], 4) = 0 <0.000007>
getsockopt(3, SOL_SOCKET, SO_RCVBUF, [425984], [4]) = 0 <0.000005>
setsockopt(3, SOL_SOCKET, SO_RCVBUFFORCE, [8388608], 4) = -1 EPERM (Operation not permitted) <0.000005>
getsockopt(3, SOL_SOCKET, SO_SNDBUF, [212992], [4]) = 0 <0.000005>
setsockopt(3, SOL_SOCKET, SO_SNDBUF, [8388608], 4) = 0 <0.000005>
getsockopt(3, SOL_SOCKET, SO_SNDBUF, [425984], [4]) = 0 <0.000005>
setsockopt(3, SOL_SOCKET, SO_SNDBUFFORCE, [8388608], 4) = -1 EPERM (Operation not permitted) <0.000005>
connect(3, {sa_family=AF_UNIX, sun_path="/run/dbus/system_bus_socket"}, 30) = 0 <0.000018>
getsockopt(3, SOL_SOCKET, SO_PEERCRED, {pid=1, uid=65534, gid=65534}, [12]) = 0 <0.000006>
getsockopt(3, SOL_SOCKET, SO_PEERSEC, "system_u:system_r:system_dbusd_t"..., [64->36]) = 0 <0.000008>
getsockopt(3, SOL_SOCKET, SO_PEERGROUPS, 0x55958ed0de20, [256->0]) = 0 <0.000005>
fstat(3, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0 <0.000005>
getsockopt(3, SOL_SOCKET, SO_ACCEPTCONN, [0], [4]) = 0 <0.000005>
getsockname(3, {sa_family=AF_UNIX}, [128->2]) = 0 <0.000006>
sendmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\0AUTH EXTERNAL\r\nDATA\r\n", iov_len=22}, {iov_base="NEGOTIATE_UNIX_FD\r\n", iov_len=19}, {iov_base="BEGIN\r\n", iov_len=7}], msg_iovlen=3, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 48 <0.000012>
recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="DATA\r\nOK 62499daace1df01c0c0432b"..., iov_len=256}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 58 <0.000010>
sendmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\1\0\1\0\0\0\0\1\0\0\0m\0\0\0\1\1o\0\25\0\0\0/org/fre"..., iov_len=128}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 128 <0.000009>
recvmsg(3, {msg_namelen=0}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable) <0.000005>
ppoll([{fd=3, events=POLLIN}], 1, {tv_sec=24, tv_nsec=999951000}, NULL, 8) = 1 ([{fd=3, revents=POLLIN}], left {tv_sec=24, tv_nsec=999950159}) <0.000006>
recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\2\1\1\v\0\0\0\377\377\377\377?\0\0\0\5\1u\0\1\0\0\0", iov_len=24}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 24 <0.000006>
recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\7\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0"..., iov_len=67}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 67 <0.000012>
sendmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\1\2\1 \0\0\0\2\0\0\0\242\0\0\0\1\1o\0\31\0\0\0/org/fre"..., iov_len=184}, {iov_base="\0\0\0\0\7\0\0\0TOOLBOX\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", iov_len=32}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 216 <0.000011>
recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1\v\0\0\0\377\377\377\377\217\0\0\0\7\1s\0\24\0\0\0", iov_len=24}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 24 <0.000006>
recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="org.freedesktop.DBus\0\0\0\0\6\1s\0\6\0\0\0"..., iov_len=147}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 147 <0.000007>
recvmsg(3, {msg_namelen=0}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable) <0.000005>
ppoll([{fd=3, events=POLLIN}], 1, {tv_sec=119, tv_nsec=999933000}, NULL, 8) = 1 ([{fd=3, revents=POLLIN}], left {tv_sec=119, tv_nsec=1179715}) <0.998779>
recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\3\1\1;\0\0\0\314\t\0\0^\0\0\0\5\1u\0\2\0\0\0", iov_len=24}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 24 <0.000014>
recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\6\1s\0\6\0\0\0:1.783\0\0\4\1s\0&\0\0\0org.free"..., iov_len=147}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 147 <0.000026>
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000011>
close(3)                                = 0 <0.000012>

repeated several times.

Notice almost one second delay caused by ppoll([{fd=3, events=POLLIN}], 1, {tv_sec=119, tv_nsec=999933000}, NULL, 8) = 1 ([{fd=3, revents=POLLIN}], left {tv_sec=119, tv_nsec=1179715}) <0.998779>

Steps how to reproduce the behaviour

  1. Run mc in up-to-date Fedora 33 toolbox running on up-to-date Fedora 33
  2. Notice the delay

Expected behaviour No delay when starting applications.

Output of toolbox --version (v0.0.90+) toolbox version 0.0.99

Toolbox package info (rpm -q toolbox) toolbox-0.0.99-1.fc33.x86_64

Output of podman version

Version:      2.2.1
API Version:  2.1.0
Go Version:   go1.15.5
Built:        Tue Dec  8 15:37:50 2020
OS/Arch:      linux/amd64

Podman package info (rpm -q podman) podman-2.2.1-1.fc33.x86_64

Info about your OS Fedora Workstation 33

mkoura commented 3 years ago

Running with DBUS_SYSTEM_BUS_ADDRESS=unix:path=/nonexistent mc removes the delay so this confirms the issue is somehow caused by interaction with dbus.

debarshiray commented 3 years ago

I see that it's connecting to /run/dbus/system_bus_socket, and right before the ppoll, there's this:

recvmsg(3, {msg_namelen=0}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable) <0.000005>

I wonder if it was a mistake to offer the system D-Bus socket to rootless Toolbox containers.

Do you happen to know why mc is trying to use the system D-Bus instance?

mkoura commented 3 years ago

No idea, I'm not familiar with D-Bus enough to be able to speculate why they chose to use the system socket instead of the session one.

HarryMichal commented 3 years ago

Hi @mkoura! I wonder if this isn't somehow related to XWayland. The only thing Toolbox does with DBus is the exposure of the sockets. If the issue still persists, it might be worth comparing the startup speed between mc in a toolbox and mc on the host (which should be easy to do since you use Workstation).

mkoura commented 3 years ago

Hi! I don't think it's related to XWayland, I don't use Wayland (I have nvidia...). The startup speed difference between toolbox and host is ~ 5 seconds (measured with time and F10 as soon as UI appeared). I think the issue is that toolbox exposes the system D-Bus socket and applications can't use it from rootless container.