Closed randombenj closed 1 year ago
Please can you get the lxc info NAME --show-log
output after the timeout has occurred, along with lxc monitor --type=logging --pretty
output from the point where the lxc rm
command is started.
Interesting observation, when running lxc rm -f c
the second time (after the timeout) the container is
deleted.
$ lxc info c --show-log
Name: c
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2023/03/01 15:19 CET
Last Used: 2023/03/01 15:19 CET
Log:
LXC Monitor:
DEBUG [2023-03-01T15:38:45+01:00] Handling API request ip=@ method=GET protocol=unix url=/1.0 username=fassbinb
DEBUG [2023-03-01T15:38:45+01:00] Handling API request ip=@ method=GET protocol=unix url=/1.0/instances/c username=fassbinb
DEBUG [2023-03-01T15:38:45+01:00] Handling API request ip=@ method=GET protocol=unix url=/1.0/instances/c username=fassbinb
DEBUG [2023-03-01T15:38:45+01:00] Handling API request ip=@ method=GET protocol=unix url=/1.0/events username=fassbinb
DEBUG [2023-03-01T15:38:45+01:00] Event listener server handler started id=5c1e5fce-bcef-40aa-8d1e-241dd4f97820 local=/var/lib/lxd/unix.socket remote=@
DEBUG [2023-03-01T15:38:45+01:00] Handling API request ip=@ method=PUT protocol=unix url=/1.0/instances/c/state username=fassbinb
DEBUG [2023-03-01T15:38:45+01:00] Stop started instance=c instanceType=container project=default stateful=false
DEBUG [2023-03-01T15:38:45+01:00] Started operation class=task description="Stopping instance" operation=63c368fa-cb5f-43ab-8202-0158795f58e2 project=default
DEBUG [2023-03-01T15:38:45+01:00] New operation class=task description="Stopping instance" operation=63c368fa-cb5f-43ab-8202-0158795f58e2 project=default
DEBUG [2023-03-01T15:38:45+01:00] Instance operation lock created action=stop instance=c project=default reusable=false
INFO [2023-03-01T15:38:45+01:00] Stopping container action=stop created="2023-03-01 14:37:43.34485976 +0000 UTC" ephemeral=false instance=c instanceType=container project=default stateful=false used="2023-03-01 14:37:45.957676119 +0000 UTC"
DEBUG [2023-03-01T15:38:45+01:00] Handling API request ip=@ method=GET protocol=unix url=/1.0/operations/63c368fa-cb5f-43ab-8202-0158795f58e2 username=fassbinb
DEBUG [2023-03-01T15:38:46+01:00] Scheduler: network: eth0 has been added: updating network priorities
DEBUG [2023-03-01T15:38:46+01:00] Handling API request ip=@ method=GET protocol=unix url="/internal/containers/c/onstopns?netns=%2Fproc%2F2459072%2Ffd%2F49&project=default&target=stop" username=root
DEBUG [2023-03-01T15:38:46+01:00] Stopping device device=eth0 instance=c instanceType=container project=default type=nic
DEBUG [2023-03-01T15:39:15+01:00] Instance operation lock finished action=stop err="Instance \"stop\" operation timed out after 30s" instance=c project=default reusable=false
DEBUG [2023-03-01T15:41:42+01:00] Handling API request ip=@ method=GET protocol=unix url="/internal/containers/c/onstop?project=default&target=stop" username=root
DEBUG [2023-03-01T15:41:42+01:00] Instance operation lock created action=stop instance=c project=default reusable=false
DEBUG [2023-03-01T15:41:42+01:00] Instance initiated stop action=stop instance=c instanceType=container project=default
DEBUG [2023-03-01T15:41:42+01:00] Container stopped, cleaning up instance=c instanceType=container project=default
DEBUG [2023-03-01T15:41:42+01:00] Stopping device device=root instance=c instanceType=container project=default type=disk
DEBUG [2023-03-01T15:41:42+01:00] UnmountInstance started instance=c project=default
DEBUG [2023-03-01T15:41:42+01:00] UnmountInstance finished instance=c project=default
DEBUG [2023-03-01T15:41:42+01:00] Failure for operation class=task description="Stopping instance" err="Instance \"stop\" operation timed out after 30s" operation=63c368fa-cb5f-43ab-8202-0158795f58e2 project=default
DEBUG [2023-03-01T15:41:42+01:00] Stop finished instance=c instanceType=container project=default stateful=false
DEBUG [2023-03-01T15:41:42+01:00] Event listener server handler stopped listener=5c1e5fce-bcef-40aa-8d1e-241dd4f97820 local=/var/lib/lxd/unix.socket remote=@
DEBUG [2023-03-01T15:41:43+01:00] Scheduler: container c stopped: re-balancing
DEBUG [2023-03-01T15:41:43+01:00] Instance operation lock finished action=stop err="<nil>" instance=c project=default reusable=false
INFO [2023-03-01T15:41:43+01:00] Shut down container action=stop created="2023-03-01 14:37:43.34485976 +0000 UTC" ephemeral=false instance=c instanceType=container project=default stateful=false used="2023-03-01 14:37:45.957676119 +0000 UTC"
Log of the second lxc rm -f c
monitor:
DEBUG [2023-03-01T15:42:50+01:00] Handling API request ip=@ method=GET protocol=unix url=/1.0 username=fassbinb
DEBUG [2023-03-01T15:42:50+01:00] Handling API request ip=@ method=GET protocol=unix url=/1.0/instances/c username=fassbinb
DEBUG [2023-03-01T15:42:50+01:00] Handling API request ip=@ method=GET protocol=unix url=/1.0/instances/c username=fassbinb
DEBUG [2023-03-01T15:42:50+01:00] Handling API request ip=@ method=GET protocol=unix url=/1.0/events username=fassbinb
DEBUG [2023-03-01T15:42:50+01:00] Event listener server handler started id=5c49d364-9433-46a9-b8fd-30c46434d705 local=/var/lib/lxd/unix.socket remote=@
DEBUG [2023-03-01T15:42:50+01:00] Handling API request ip=@ method=DELETE protocol=unix url=/1.0/instances/c username=fassbinb
DEBUG [2023-03-01T15:42:50+01:00] Instance operation lock created action=delete instance=c project=default reusable=false
DEBUG [2023-03-01T15:42:50+01:00] New operation class=task description="Deleting instance" operation=67361a29-c132-4f22-b3d9-71c7bb06bb6c project=default
DEBUG [2023-03-01T15:42:50+01:00] Started operation class=task description="Deleting instance" operation=67361a29-c132-4f22-b3d9-71c7bb06bb6c project=default
INFO [2023-03-01T15:42:50+01:00] Deleting container created="2023-03-01 14:37:43.34485976 +0000 UTC" ephemeral=false instance=c instanceType=container project=default used="2023-03-01 14:37:45.957676119 +0000 UTC"
DEBUG [2023-03-01T15:42:50+01:00] Handling API request ip=@ method=GET protocol=unix url=/1.0/operations/67361a29-c132-4f22-b3d9-71c7bb06bb6c username=fassbinb
DEBUG [2023-03-01T15:42:50+01:00] DeleteInstance started instance=c project=default
DEBUG [2023-03-01T15:42:50+01:00] Deleting instance volume instance=c project=default volName=c
DEBUG [2023-03-01T15:42:50+01:00] DeleteInstance finished instance=c project=default
DEBUG [2023-03-01T15:42:50+01:00] Removing device device=root instance=c instanceType=container project=default type=disk
DEBUG [2023-03-01T15:42:50+01:00] Removing device device=eth0 instance=c instanceType=container project=default type=nic
DEBUG [2023-03-01T15:42:50+01:00] Success for operation class=task description="Deleting instance" operation=67361a29-c132-4f22-b3d9-71c7bb06bb6c project=default
DEBUG [2023-03-01T15:42:50+01:00] Instance operation lock finished action=delete err="<nil>" instance=c project=default reusable=false
INFO [2023-03-01T15:42:50+01:00] Deleted container created="2023-03-01 14:37:43.34485976 +0000 UTC" ephemeral=false instance=c instanceType=container project=default used="2023-03-01 14:37:45.957676119 +0000 UTC"
DEBUG [2023-03-01T15:42:50+01:00] Event listener server handler stopped listener=5c49d364-9433-46a9-b8fd-30c46434d705 local=/var/lib/lxd/unix.socket remote=@
Hi,
Thanks for this.
Please can you confirm if this is still in an issue in LXD 5.11, as LXD 5.9 isn't supported anymore.
Can confirm for installed: 5.11-ad0b61e
needed to add security.nesting=true
lxc rm -f container
Error: Stopping the instance failed: Failed stopping instance, status is "Running": Instance "stop" operation timed out after 30s
also from dmesg
[199564.457146] CIFS: VFS: \\CFIS-Server has not responded in 180 seconds. Reconnecting...
[199564.461730] CIFS: VFS: \\CFIS-Server has not responded in 180 seconds. Reconnecting...
How long do you have to wait if you do lxc stop c
(without -f), also can you show the debug monitor output for this too.
I would also like to see the LXD 5.11 monitor debug output for lxc stop -f c
as some of the messages have changed since LXD 5.9.
time lxc stop -f c
Error: Failed stopping instance, status is "Running": Instance "stop" operation timed out after 30s
Try `lxc info --show-log c` for more info
lxc stop -f c 0.06s user 0.09s system 0% cpu 2:31.84 total
and logs
DEBUG [2023-03-01T16:40:14+01:00] Event listener server handler started id=48fdf5d0-9998-4469-b1f8-2693fc6432c0 local=/var/snap/lxd/common/lxd/unix.socket remote=@
DEBUG [2023-03-01T16:40:17+01:00] Handling API request ip=@ method=GET protocol=unix url=/1.0 username="uid=81448"
DEBUG [2023-03-01T16:40:17+01:00] Handling API request ip=@ method=GET protocol=unix url=/1.0/events username="uid=81448"
DEBUG [2023-03-01T16:40:17+01:00] Event listener server handler started id=2ec7a3fe-2231-4604-9eba-6867a99aec07 local=/var/snap/lxd/common/lxd/unix.socket remote=@
DEBUG [2023-03-01T16:40:17+01:00] Handling API request ip=@ method=PUT protocol=unix url=/1.0/instances/c/state username="uid=81448"
DEBUG [2023-03-01T16:40:17+01:00] New operation class=task description="Stopping instance" operation=8dd7a7fe-c06b-45b2-b53a-584737d4e067 project=default
DEBUG [2023-03-01T16:40:17+01:00] Stop started instance=c instanceType=container project=default stateful=false
DEBUG [2023-03-01T16:40:17+01:00] Started operation class=task description="Stopping instance" operation=8dd7a7fe-c06b-45b2-b53a-584737d4e067 project=default
DEBUG [2023-03-01T16:40:17+01:00] Instance operation lock created action=stop instance=c project=default reusable=false
INFO [2023-03-01T16:40:17+01:00] Stopping instance action=stop created="2023-03-01 15:37:38.876997917 +0000 UTC" ephemeral=false instance=c instanceType=container project=default stateful=false used="2023-03-01 15:37:39.556369868 +0000 UTC"
DEBUG [2023-03-01T16:40:17+01:00] Handling API request ip=@ method=GET protocol=unix url=/1.0/operations/8dd7a7fe-c06b-45b2-b53a-584737d4e067 username="uid=81448"
DEBUG [2023-03-01T16:40:17+01:00] Scheduler: network: eth0 has been added: updating network priorities
DEBUG [2023-03-01T16:40:17+01:00] Handling API request ip=@ method=GET protocol=unix url="/internal/containers/c/onstopns?netns=%2Fproc%2F1239387%2Ffd%2F49&project=default&target=stop" username=root
DEBUG [2023-03-01T16:40:17+01:00] Instance operation lock inherited for stop action=stop instance=c instanceType=container project=default
DEBUG [2023-03-01T16:40:17+01:00] Stopping device device=eth0 instance=c instanceType=container project=default type=nic
DEBUG [2023-03-01T16:40:47+01:00] Instance operation lock finished action=stop err="Instance \"stop\" operation timed out after 30s" instance=c project=default reusable=false
DEBUG [2023-03-01T16:42:48+01:00] Handling API request ip=@ method=GET protocol=unix url="/internal/containers/c/onstop?project=default&target=stop" username=root
DEBUG [2023-03-01T16:42:48+01:00] Instance initiated stop action=stop instance=c instanceType=container project=default
DEBUG [2023-03-01T16:42:48+01:00] Instance operation lock created action=stop instance=c project=default reusable=false
DEBUG [2023-03-01T16:42:48+01:00] Instance stopped, cleaning up instance=c instanceType=container project=default
DEBUG [2023-03-01T16:42:48+01:00] Stopping device device=root instance=c instanceType=container project=default type=disk
DEBUG [2023-03-01T16:42:48+01:00] Stop finished instance=c instanceType=container project=default stateful=false
DEBUG [2023-03-01T16:42:48+01:00] Failure for operation class=task description="Stopping instance" err="Failed stopping instance, status is \"Running\": Instance \"stop\" operation timed out after 30s" operation=8dd7a7fe-c06b-45b2-b53a-584737d4e067 project=default
DEBUG [2023-03-01T16:42:48+01:00] UnmountInstance started instance=c project=default
DEBUG [2023-03-01T16:42:48+01:00] Event listener server handler stopped listener=2ec7a3fe-2231-4604-9eba-6867a99aec07 local=/var/snap/lxd/common/lxd/unix.socket remote=@
DEBUG [2023-03-01T16:42:49+01:00] UnmountInstance finished instance=c project=default
DEBUG [2023-03-01T16:42:49+01:00] Unmounted ZFS dataset dev=default/containers/c driver=zfs path=/var/snap/lxd/common/lxd/storage-pools/default/containers/c pool=default volName=c
DEBUG [2023-03-01T16:42:49+01:00] Scheduler: container c stopped: re-balancing
INFO [2023-03-01T16:42:49+01:00] Shut down instance action=stop created="2023-03-01 15:37:38.876997917 +0000 UTC" ephemeral=false instance=c instanceType=container project=default stateful=false used="2023-03-01 15:37:39.556369868 +0000 UTC"
DEBUG [2023-03-01T16:42:49+01:00] Instance operation lock finished action=stop err="<nil>" instance=c project=default reusable=false
time lxc stop c
lxc stop c 0.07s user 0.08s system 10% cpu 1.399 total
DEBUG [2023-03-01T16:45:43+01:00] Event listener server handler started id=659a41ab-2e6f-4ce6-9378-b6dc742d338c local=/var/snap/lxd/common/lxd/unix.socket remote=@
DEBUG [2023-03-01T16:45:58+01:00] Handling API request ip=@ method=GET protocol=unix url=/1.0 username="uid=81448"
DEBUG [2023-03-01T16:45:58+01:00] Handling API request ip=@ method=GET protocol=unix url=/1.0/events username="uid=81448"
DEBUG [2023-03-01T16:45:58+01:00] Event listener server handler started id=a3e849ae-3e9a-4103-95cb-2b17f4850e08 local=/var/snap/lxd/common/lxd/unix.socket remote=@
DEBUG [2023-03-01T16:45:58+01:00] Handling API request ip=@ method=PUT protocol=unix url=/1.0/instances/c/state username="uid=81448"
DEBUG [2023-03-01T16:45:58+01:00] Started operation class=task description="Stopping instance" operation=ea60e31d-5a15-4502-a774-18034edf9c78 project=default
DEBUG [2023-03-01T16:45:58+01:00] New operation class=task description="Stopping instance" operation=ea60e31d-5a15-4502-a774-18034edf9c78 project=default
DEBUG [2023-03-01T16:45:58+01:00] Shutdown started instance=c instanceType=container project=default timeout=-1s
DEBUG [2023-03-01T16:45:58+01:00] Instance operation lock created action=stop instance=c project=default reusable=true
INFO [2023-03-01T16:45:58+01:00] Shutting down instance action=shutdown created="2023-03-01 15:44:32.68628724 +0000 UTC" ephemeral=false instance=c instanceType=container project=default timeout=-1s used="2023-03-01 15:44:33.396579726 +0000 UTC"
DEBUG [2023-03-01T16:45:58+01:00] Shutdown request sent to instance instance=c instanceType=container project=default
DEBUG [2023-03-01T16:45:58+01:00] Handling API request ip=@ method=GET protocol=unix url=/1.0/operations/ea60e31d-5a15-4502-a774-18034edf9c78 username="uid=81448"
DEBUG [2023-03-01T16:45:58+01:00] Scheduler: network: eth0 has been added: updating network priorities
DEBUG [2023-03-01T16:45:58+01:00] Handling API request ip=@ method=GET protocol=unix url="/internal/containers/c/onstopns?netns=%2Fproc%2F1245477%2Ffd%2F49&project=default&target=stop" username=root
DEBUG [2023-03-01T16:45:58+01:00] Instance operation lock inherited for stop action=stop instance=c instanceType=container project=default
DEBUG [2023-03-01T16:45:58+01:00] Stopping device device=eth0 instance=c instanceType=container project=default type=nic
DEBUG [2023-03-01T16:45:59+01:00] Handling API request ip=@ method=GET protocol=unix url="/internal/containers/c/onstop?project=default&target=stop" username=root
DEBUG [2023-03-01T16:45:59+01:00] Instance operation lock inherited for stop action=stop instance=c instanceType=container project=default
DEBUG [2023-03-01T16:45:59+01:00] Instance stopped, cleaning up instance=c instanceType=container project=default
DEBUG [2023-03-01T16:45:59+01:00] Stopping device device=root instance=c instanceType=container project=default type=disk
DEBUG [2023-03-01T16:45:59+01:00] UnmountInstance started instance=c project=default
DEBUG [2023-03-01T16:45:59+01:00] UnmountInstance finished instance=c project=default
DEBUG [2023-03-01T16:45:59+01:00] Unmounted ZFS dataset dev=default/containers/c driver=zfs path=/var/snap/lxd/common/lxd/storage-pools/default/containers/c pool=default volName=c
DEBUG [2023-03-01T16:45:59+01:00] Scheduler: container c stopped: re-balancing
DEBUG [2023-03-01T16:45:59+01:00] Instance operation lock finished action=stop err="<nil>" instance=c project=default reusable=true
DEBUG [2023-03-01T16:45:59+01:00] Shutdown finished instance=c instanceType=container project=default timeout=-1s
DEBUG [2023-03-01T16:45:59+01:00] Success for operation class=task description="Stopping instance" operation=ea60e31d-5a15-4502-a774-18034edf9c78 project=default
DEBUG [2023-03-01T16:45:59+01:00] Event listener server handler stopped listener=a3e849ae-3e9a-4103-95cb-2b17f4850e08 local=/var/snap/lxd/common/lxd/unix.socket remote=@
@tomponline this issue is still labeled as incomplete what else is missing?
@roosterfish want to take a look at this one when you get chance?
My hypothesis that this is a consequence of network disabling during shutdown. We shooting in our own leg here, network is down, then we want do destroy mount namespace, to do that we need to unmount all the mounts but CIFS uses network and can't be properly unmounted because network is not accessible anymore.
@fliiiix can you reproduce the problem and show the output of this thing ps aux | awk '{ if ($8 == "D") { print } }'
? If you see non-empty output then take a numeric value from a second column (PID) and also post cat /proc/<PID>/stack
output here.
@mihalicyn I reproduced the issue. The command ps aux | awk '{ if ($8 == "D") { print } }'
unfortunately doesn't show anything on the host during the deletion of the container.
What I can also see are the dmesg messages as reported by @fliiiix:
[12975.341511] CIFS: VFS: \\10.159.171.106 has not responded in 180 seconds. Reconnecting...
@roosterfish it seems interesting. Can you post the output of cat /proc/*/stack
somewhere like pastebin/gists? This output should be collected when you've lxc stop
process running and stuck.
@roosterfish it seems interesting. Can you post the output of
cat /proc/*/stack
somewhere like pastebin/gists? This output should be collected when you'velxc stop
process running and stuck.
I have collected a few (every second). Here is one from the beginning right after lxc delete -f
was called:
And this one is a few seconds before lxc delete -f
returns:
O! Thanks, Julian!
Now let's see:
[<0>] wait_for_response+0xb0/0x150 [cifs]
[<0>] compound_send_recv+0x5bd/0x11b0 [cifs]
[<0>] cifs_send_recv+0x22/0x40 [cifs]
[<0>] SMB2_tdis+0x18a/0x250 [cifs]
[<0>] cifs_put_tcon.part.0+0xc0/0x260 [cifs]
[<0>] cifs_put_tlink+0x4a/0x90 [cifs]
[<0>] cifs_umount+0x59/0xc0 [cifs]
[<0>] cifs_kill_sb+0xae/0xd0 [cifs]
[<0>] deactivate_locked_super+0x32/0xb0
[<0>] deactivate_super+0x4a/0x60
[<0>] cleanup_mnt+0x150/0x1e0
[<0>] __cleanup_mnt+0x12/0x20
[<0>] task_work_run+0x61/0xa0
[<0>] exit_to_user_mode_loop+0x111/0x140
[<0>] exit_to_user_mode_prepare+0xb9/0xd0
[<0>] syscall_exit_to_user_mode+0x2a/0x50
[<0>] do_syscall_64+0x69/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
That's exactly what I've suspected. We have __cleanup_mnt
on the callstack which is executed from a context of a task work
before exiting from a syscall to a user space (link https://github.com/torvalds/linux/blob/fdf0eaf11452d72945af31804e2a1048ee1b574c/fs/namespace.c#L1332). This happens from mntput_no_expire
function which is called from many places, but most important one is mntput
(link https://github.com/torvalds/linux/blob/fdf0eaf11452d72945af31804e2a1048ee1b574c/fs/namespace.c#L1343).
I think that this bug is more about LXC/Linux kernel rather than LXD. I'm ready to take a look on that closely but later because it can result in a lot of work to fix it properly.
It would be great if you share your reproducer and your test setup for CIFS. It will be useful for us in a near future.
@roosterfish @mihalicyn shall we move this to lxc repo and close this one then?
yes, I think it fits better to LXC issue tracker
Reproducer steps:
lxc launch ubuntu:jammy vm1 --vm
lxc exec vm1 apt install samba
/data
on vm1
and share it by appending to the samba config (/etc/samba/smb.conf
):[sambashare]
path = /data
read only = no
browsable = yes
lxc exec vm1 service smbd restart
lxc exec vm1 smbpasswd -a root
-> set a password for the root userlxc exec vm1 smbpasswd -e root
-> enable rootStart a new privileged container and mount in the samba share:
lxc launch ubuntu:jammy c1 -c security.privileged=true
lxc exec c1 apt install cifs-utils
lxc exec c1 mount -t cifs //{IP of vm1}/sambashare /mnt
-> enter password from beforelxc delete -f c1
-> boom!Fantastic! Thanks a lot, Julian!
@roosterfish please can you post this over at https://github.com/lxc/lxc/issues and link to it here when you're done.
Required information
Issue description
When mounting a nfs share inside a container, the container deletion times out. Steps to reproduce:
Steps to reproduce
Will result in:
Information to attach
dmesg
)lxc info NAME --show-log
)lxc config show NAME --expanded
)lxc monitor
while reproducing the issue)