ltalirz closed this issue 2 years ago
Could you try installing fuse-overlayfs from source to check whether it's the same problem as #130? Use the latest commit on the main branch so that it includes this commit: https://github.com/containers/fuse-overlayfs/commit/f87e1781a89a15e72b1cdd61b738ca64620fb702
Thanks a lot for the quick answer and sorry for the long delay.
It turns out that the issue I'm seeing is still present when using fuse-overlayfs
built from the commit you linked.
I've encountered a slightly different issue (perhaps related) on Ubuntu 20.04.
The symptoms are the same (enroot start hangs, enroot create + enroot start works) except that now I have two different VMs that should be almost identical, and on one VM everything works fine, while on the other the enroot start command hangs at https://github.com/NVIDIA/enroot/blob/9c6e979059699e93cfc1cce0967b78e54ad0e263/src/runtime.sh#L82, both for regular users and for root.
What I've tried:
- pointing the ENROOT_ directories to a temporary directory with 777 permissions
- setting ENROOT_MOUNT_HOME=no
- running enroot-check_*.run --verify on both VMs (same output [1])
- comparing the output of mount between the two VMs
- checking apt list enroot, which is enroot/now 3.4.0-1 amd64 [installed,local] in both cases
What could be a possible reason for enroot-mount to hang on one machine but not on the other?
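For anyone repeating the mount comparison between two VMs, here is a minimal sketch. It assumes the output of mount was saved to one file per VM beforehand; the function name diff_mounts and the file names are hypothetical and not part of enroot.

```shell
#!/bin/sh
# Hypothetical helper (not part of enroot): compare two saved mount listings.
# Collect the inputs beforehand, e.g. `mount > vm-a.mounts` on each VM.
diff_mounts() {
    a_sorted=$(mktemp) || return 2
    b_sorted=$(mktemp) || { rm -f "$a_sorted"; return 2; }
    # Sort first so that differences in mount order are not reported.
    sort "$1" > "$a_sorted"
    sort "$2" > "$b_sorted"
    # diff exits 0 if the listings are identical and 1 if they differ.
    status=0
    diff "$a_sorted" "$b_sorted" || status=$?
    rm -f "$a_sorted" "$b_sorted"
    return "$status"
}
```

Usage would be something like diff_mounts vm-a.mounts vm-b.mounts; lines prefixed with < or > are mounts present on only one of the two VMs.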
[1]
Kernel version:
Linux version 5.15.0-1017-azure (buildd@lcy02-amd64-032) (gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #20~20.04.1-Ubuntu SMP Fri Aug 5 12:16:53 UTC 2022
Kernel configuration:
CONFIG_NAMESPACES : OK
CONFIG_USER_NS : OK
CONFIG_SECCOMP_FILTER : OK
CONFIG_OVERLAY_FS : OK (module)
CONFIG_X86_VSYSCALL_EMULATION : OK
CONFIG_VSYSCALL_EMULATE : KO (required if glibc <= 2.13)
CONFIG_VSYSCALL_NATIVE : KO (required if glibc <= 2.13)
Kernel command line:
vsyscall=native : KO (required if glibc <= 2.13)
vsyscall=emulate : KO (required if glibc <= 2.13)
Kernel parameters:
kernel.unprivileged_userns_clone : OK
user.max_user_namespaces : OK
user.max_mnt_namespaces : OK
Extra packages:
nvidia-container-cli : OK
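When comparing this check output across machines, it can help to hide the KO entries that only matter for old glibc versions. The filter below is a sketch under that assumption; unexpected_ko is a hypothetical name, and it relies only on the line format shown in the output above.

```shell
#!/bin/sh
# Hypothetical filter: read enroot-check output on stdin and print only
# the "KO" entries that do not carry the "glibc <= 2.13" caveat.
# Exits with status 1 when there is nothing to print (grep's convention).
unexpected_ko() {
    grep ': KO' | grep -v 'glibc <= 2.13'
}
```

For example: ./enroot-check_*.run --verify | unexpected_ko. With the output listed above it prints nothing, since every KO line carries the glibc note.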
P.S. When adding an strace to the enroot-mount call, it hangs at
readlinkat(5, "run",
This likely corresponds to the point where enroot-mount wants to create the following mount:
tmpfs /var/run tmpfs x-create=dir,rw,nosuid,nodev,mode=755,slave 0 -1
We suspect it has something to do with the /run/users/10001 temporary folders created by pam_systemd, but it's still unclear what exactly the problem is.
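A rough way to capture where a hanging command stops is to run it under strace with a time limit and then look at the last line of the log. The sketch below assumes strace and coreutils timeout are installed; the function names are hypothetical. It wraps a whole command, whereas in this thread strace was added to the enroot-mount call inside runtime.sh, so treat it as a variation rather than the exact procedure used here.

```shell
#!/bin/sh
# Hypothetical helpers; the function names are illustrative only.

# Run a command under strace and give up after a number of seconds.
# Usage: trace_with_timeout SECONDS LOGFILE COMMAND [ARGS...]
trace_with_timeout() {
    secs=$1
    log=$2
    shift 2
    # -f follows child processes, -o writes the trace to a file.
    # timeout(1) exits with status 124 if the time limit was hit.
    timeout "$secs" strace -f -o "$log" "$@"
}

# Print the last line of a trace log, i.e. the last call strace recorded.
last_traced_call() {
    tail -n 1 "$1"
}
```

For example, trace_with_timeout 30 trace.log followed by the hanging command, then last_traced_call trace.log. If the command hangs, the last line is typically an unfinished call such as the readlinkat shown above.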
In case anyone else runs into this:
it turns out that the issue mentioned above is caused by the Azure operations management agent (auoms).
We have not yet figured out exactly what it does to cause the problem, but a workaround is to run:
sudo systemctl stop auoms
sudo systemctl mask auoms
sudo reboot
Note: simply stopping the service does not fix the issue; the reboot is necessary. Disabling the service is also not enough (it will be revived after a couple of minutes); it needs to be masked.
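To confirm that the service stayed masked and was not revived, something like the following sketch could be run after the reboot. It only inspects the states reported by systemctl is-enabled and systemctl is-active; the function names are hypothetical, and it cannot tell whether the reboot itself has happened.

```shell
#!/bin/sh
# Hypothetical check (not from the thread): decide from systemctl's
# reported states whether the auoms workaround is in place.
# $1: output of `systemctl is-enabled auoms`
# $2: output of `systemctl is-active auoms`
workaround_applied() {
    # Only a persistent mask counts; "disabled" is not enough.
    [ "$1" = "masked" ] || return 1
    [ "$2" != "active" ]
}

check_auoms() {
    # Both commands exit non-zero for masked/inactive units, hence `|| true`.
    enabled=$(systemctl is-enabled auoms 2>/dev/null || true)
    active=$(systemctl is-active auoms 2>/dev/null || true)
    if workaround_applied "$enabled" "$active"; then
        echo "auoms is masked and not running"
    else
        echo "auoms is not masked or is running (is-enabled=$enabled, is-active=$active)"
        return 1
    fi
}
```

Calling check_auoms a few minutes after boot covers the case described above, where a merely disabled service comes back on its own.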
The proper mitigation is to upgrade from the deprecated OMSAgentForLinux to the newer AzureMonitorLinuxAgent (see "Extensions + applications" in the dashboard of your VM in the Azure portal for which agent your VM is running).
Although the original issue reported at the top may have a slightly different origin, I am closing this issue for now. Will reopen in case new information becomes available.
I am encountering an issue on CentOS 7.9 (kernel=/boot/vmlinuz-3.10.0-1160.53.1.el7.x86_64) with enroot 3.4.0 installed from the pre-compiled binaries.
While the enroot create => enroot start approach works fine, the enroot start approach only works if I am root. If instead I am an unprivileged user, enroot start will hang [1]. I first thought it might be related to permissions on temporary directories but as far as I can tell this is not the case.
To reproduce:
The symptom of hanging at enroot start seems similar to https://github.com/NVIDIA/enroot/issues/130 but the underlying issue may not be related.
[1] While it is hanging, I will see
cc @matt-chan