Qubitium opened 1 month ago
I have upgraded LXD on the hosts to 6.1/stable to see if the new major revision mitigates this issue.
I tried reproducing this on 5.21/edge and latest/edge, to no avail. Here's what I did:
$ lxc launch ubuntu-minimal-daily:24.04 c1
$ while :; do lxc exec c1 -- cat /proc/cpuinfo > /dev/null; done
It didn't result in a "Transport endpoint is not connected" error after several minutes.
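In case a single file isn't enough to trigger it, a broader variant of the same loop (my own sketch, cycling through the lxcfs-backed /proc entries) would be:
$ while :; do for f in cpuinfo meminfo stat uptime loadavg swaps diskstats; do lxc exec c1 -- cat /proc/$f > /dev/null; done; done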
This sounds like a fuse/lxcfs issue at first glance, but I wonder if the attached GPU or the ES (engineering sample) CPU could have anything to do with it.
@Qubitium it's a long shot, but I see many fuse-related changes in the next kernel point release: https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.10.10, so you might want to consider upgrading to the latest 6.10.x release.
@simondeziel I will upgrade to 6.10.10 to see if the problem persists.
To add more info: the host's /proc/cpuinfo had no issue. When the container lost access to /proc/cpuinfo, the host was able to access it just fine. The loss of access happened randomly over a period of time post container startup: right after startup, cpuinfo access was fine, but after a random extended period of time the container lost access. Very strange.
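To pin down when it breaks, a hypothetical watcher like this could be left running on the host (assumes the container is named c1):
$ # poll once a minute; print the timestamp as soon as the container loses access
$ while lxc exec c1 -- cat /proc/cpuinfo > /dev/null 2>&1; do sleep 60; done; date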
@mihalicyn could this be related to the LXCFS fixes you're working on?
Reproduced on kernel 6.10.12-x64v4-xanmod1:
Host: Ubuntu 24.04, kernel 6.10.12, snap lxd 6.1/stable
Container: CentOS 7.9.2009
ls -l /proc/*
ls: cannot access /proc/cpuinfo: Transport endpoint is not connected
ls: cannot access /proc/diskstats: Transport endpoint is not connected
ls: cannot access /proc/loadavg: Transport endpoint is not connected
ls: cannot access /proc/meminfo: Transport endpoint is not connected
ls: cannot access /proc/slabinfo: Transport endpoint is not connected
ls: cannot access /proc/stat: Transport endpoint is not connected
ls: cannot access /proc/swaps: Transport endpoint is not connected
ls: cannot access /proc/uptime: Transport endpoint is not connected
This server/host was rebooted yesterday, so it happened within 24 hours. Again, it's quite random when it happens and which container it happens to.
I checked syslog and dmesg on the host and there is nothing there that would signal the host had any errors of any kind.
The host has a GPU passed through to a separate container.
EDIT: ALL containers on this host lost access to the relevant /proc/* entries, not just this one. I checked all containers, about 8-10, and they all have broken /proc/cpuinfo and related access.
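A quick way to survey every container at once (a sketch using lxc list's CSV output of container names):
$ for c in $(lxc list -c n -f csv); do lxc exec "$c" -- cat /proc/cpuinfo > /dev/null 2>&1 || echo "$c: broken"; done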
@simondeziel @tomponline @mihalicyn
Found the cause! This is very good news. The lxd daemon had an internal crash related to liblxcfs. Found it after executing sudo snap logs lxd.daemon -n 100:
2024-10-03T21:11:01Z lxd.daemon[2267]: *** signal 11
2024-10-03T21:11:01Z lxd.daemon[2267]: Register dump:
2024-10-03T21:11:01Z lxd.daemon[2267]: RAX: 0000729991eb0df3 RBX: ffffffffffffff80 RCX: 0000729ed0f0f769
2024-10-03T21:11:01Z lxd.daemon[2267]: RDX: 0000729ec95ffc90 RSI: 0000729991eb0de3 RDI: 0000729991eb0df3
2024-10-03T21:11:01Z lxd.daemon[2267]: RBP: 0000729ec95ffb20 R8 : 0000729ed0ef4198 R9 : 000000000000000f
2024-10-03T21:11:01Z lxd.daemon[2267]: R10: 0000729ed0ef4198 R11: 0000000000000000 R12: 0000000000000003
2024-10-03T21:11:01Z lxd.daemon[2267]: R13: 0000729eb80389a0 R14: 00005f4a1d393dc8 R15: 0000000000000000
2024-10-03T21:11:01Z lxd.daemon[2267]: RSP: 0000729ec95ffad0
2024-10-03T21:11:01Z lxd.daemon[2267]: RIP: 0000729ed0ca53fe EFLAGS: 00010202
2024-10-03T21:11:01Z lxd.daemon[2267]: CS: 0033 FS: 0000 GS: 0000
2024-10-03T21:11:01Z lxd.daemon[2267]: Trap: 0000000e Error: 00000004 OldMask: 00004007 CR2: 91eb0deb
2024-10-03T21:11:01Z lxd.daemon[2267]: FPUCW: 0000037f FPUSW: 00000000 TAG: 00000000
2024-10-03T21:11:01Z lxd.daemon[2267]: RIP: 00000000 RDP: 00000000
2024-10-03T21:11:01Z lxd.daemon[2267]: ST(0) 0000 0000000000000000 ST(1) 0000 0000000000000000
2024-10-03T21:11:01Z lxd.daemon[2267]: ST(2) 0000 0000000000000000 ST(3) 0000 0000000000000000
2024-10-03T21:11:01Z lxd.daemon[2267]: ST(4) 0000 0000000000000000 ST(5) 0000 0000000000000000
2024-10-03T21:11:01Z lxd.daemon[2267]: ST(6) 0000 0000000000000000 ST(7) 0000 0000000000000000
2024-10-03T21:11:01Z lxd.daemon[2267]: mxcsr: 1fa0
2024-10-03T21:11:01Z lxd.daemon[2267]: XMM0: 00000000000000000000000000000000 XMM1: 00000000000000000000000000000000
2024-10-03T21:11:01Z lxd.daemon[2267]: XMM2: 00000000000000000000000000000000 XMM3: 00000000000000000000000000000000
2024-10-03T21:11:01Z lxd.daemon[2267]: XMM4: 00000000000000000000000000000000 XMM5: 00000000000000000000000000000000
2024-10-03T21:11:01Z lxd.daemon[2267]: XMM6: 00000000000000000000000000000000 XMM7: 00000000000000000000000000000000
2024-10-03T21:11:01Z lxd.daemon[2267]: XMM8: 00000000000000000000000000000000 XMM9: 00000000000000000000000000000000
2024-10-03T21:11:01Z lxd.daemon[2267]: XMM10: 00000000000000000000000000000000 XMM11: 00000000000000000000000000000000
2024-10-03T21:11:01Z lxd.daemon[2267]: XMM12: 00000000000000000000000000000000 XMM13: 00000000000000000000000000000000
2024-10-03T21:11:01Z lxd.daemon[2267]: XMM14: 00000000000000000000000000000000 XMM15: 00000000000000000000000000000000
2024-10-03T21:11:01Z lxd.daemon[2267]: Backtrace:
2024-10-03T21:11:01Z lxd.daemon[2267]: /lib/x86_64-linux-gnu/libc.so.6(free+0x1e)[0x729ed0ca53fe]
2024-10-03T21:11:01Z lxd.daemon[2267]: /snap/lxd/current/lib/liblxcfs.so(do_release_file_info+0x42)[0x729ed0f1a0c8]
2024-10-03T21:11:01Z lxd.daemon[2267]: /snap/lxd/current/lib/liblxcfs.so(proc_release+0x20)[0x729ed0f0f789]
2024-10-03T21:11:01Z lxd.daemon[2267]: lxcfs(+0x2f2c)[0x5f49e2624f2c]
2024-10-03T21:11:01Z lxd.daemon[2267]: lxcfs(+0x3c7f)[0x5f49e2625c7f]
2024-10-03T21:11:01Z lxd.daemon[2267]: /snap/lxd/current/lib/x86_64-linux-gnu/libfuse3.so.3(+0xbb6a)[0x729ed0f64b6a]
2024-10-03T21:11:01Z lxd.daemon[2267]: /snap/lxd/current/lib/x86_64-linux-gnu/libfuse3.so.3(+0x104fa)[0x729ed0f694fa]
2024-10-03T21:11:01Z lxd.daemon[2267]: /snap/lxd/current/lib/x86_64-linux-gnu/libfuse3.so.3(+0x11738)[0x729ed0f6a738]
2024-10-03T21:11:01Z lxd.daemon[2267]: /snap/lxd/current/lib/x86_64-linux-gnu/libfuse3.so.3(+0x1e13f)[0x729ed0f7713f]
2024-10-03T21:11:01Z lxd.daemon[2267]: /snap/lxd/current/lib/x86_64-linux-gnu/libfuse3.so.3(+0x167a7)[0x729ed0f6f7a7]
2024-10-03T21:11:01Z lxd.daemon[2267]: /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x729ed0c94ac3]
2024-10-03T21:11:01Z lxd.daemon[2267]: /lib/x86_64-linux-gnu/libc.so.6(+0x126850)[0x729ed0d26850]
2024-10-03T21:11:01Z lxd.daemon[2267]: Memory map:
2024-10-03T21:11:01Z lxd.daemon[2267]: 5f49e2622000-5f49e262a000 r-xp 00000000 07:05 45 /snap/lxd/30130/bin/lxcfs
2024-10-03T21:11:01Z lxd.daemon[2267]: 5f49e262a000-5f49e262b000 r--p 00007000 07:05 45 /snap/lxd/30130/bin/lxcfs
2024-10-03T21:11:01Z lxd.daemon[2267]: 5f49e262b000-5f49e262c000 rw-p 00008000 07:05 45 /snap/lxd/30130/bin/lxcfs
2024-10-03T21:11:01Z lxd.daemon[2267]: 5f4a1d387000-5f4a1d406000 rw-p 00000000 00:00 0 [heap]
2024-10-03T21:11:01Z lxd.daemon[2267]: 729e5c000000-729e5c104000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729e5c104000-729e60000000 ---p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729e64000000-729e6407f000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729e6407f000-729e68000000 ---p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729e68000000-729e68021000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729e68021000-729e6c000000 ---p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729e70000000-729e7006c000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729e7006c000-729e74000000 ---p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729e74000000-729e74021000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729e74021000-729e78000000 ---p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729e7c000000-729e7c047000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729e7c047000-729e80000000 ---p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729e80000000-729e80021000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729e80021000-729e84000000 ---p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729e88000000-729e88160000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729e88160000-729e8c000000 ---p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729e8c000000-729e8c021000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729e8c021000-729e90000000 ---p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729e94000000-729e94021000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729e94021000-729e98000000 ---p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729e98000000-729e98080000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729e98080000-729e9c000000 ---p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ea0000000-729ea007e000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ea007e000-729ea4000000 ---p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ea4000000-729ea418f000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ea418f000-729ea8000000 ---p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729eac000000-729eac175000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729eac175000-729eb0000000 ---p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729eb0000000-729eb0068000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729eb0068000-729eb4000000 ---p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729eb4400000-729eb4401000 ---p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729eb4401000-729eb4c01000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729eb4e00000-729eb4e01000 ---p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729eb4e01000-729eb5601000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729eb6200000-729eb6201000 ---p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729eb6201000-729eb6a01000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729eb7600000-729eb7601000 ---p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729eb7601000-729eb7e01000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729eb8000000-729eb8092000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729eb8092000-729ebc000000 ---p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ebc000000-729ebc06a000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ebc06a000-729ec0000000 ---p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ec02fe000-729ec0400000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ec0e00000-729ec0e01000 ---p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ec0e01000-729ec1601000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ec16fe000-729ec1800000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ec2200000-729ec2201000 ---p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ec2201000-729ec2a01000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ec2c00000-729ec2c01000 ---p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ec2c01000-729ec3401000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ec3600000-729ec3601000 ---p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ec3601000-729ec3e01000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ec4000000-729ec4180000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ec4180000-729ec8000000 ---p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ec82fe000-729ec8400000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ec8400000-729ec8401000 ---p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ec8401000-729ec8c01000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ec8cfe000-729ec8e00000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ec8e00000-729ec8e01000 ---p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ec8e01000-729ec9601000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ec9800000-729ec9801000 ---p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ec9801000-729eca001000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729eca200000-729eca201000 ---p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729eca201000-729ecaa01000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ecaafe000-729ecac00000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ecac00000-729ecac01000 ---p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ecac01000-729ecb401000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ecb4fe000-729ecb600000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ecb600000-729ecb601000 ---p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ecb601000-729ecbe01000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ecc000000-729ecc026000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ecc026000-729ed0000000 ---p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0b21000-729ed0c00000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0c00000-729ed0c28000 r--p 00000000 07:01 7474 /usr/lib/x86_64-linux-gnu/libc.so.6
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0c28000-729ed0dbd000 r-xp 00028000 07:01 7474 /usr/lib/x86_64-linux-gnu/libc.so.6
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0dbd000-729ed0e15000 r--p 001bd000 07:01 7474 /usr/lib/x86_64-linux-gnu/libc.so.6
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0e15000-729ed0e16000 ---p 00215000 07:01 7474 /usr/lib/x86_64-linux-gnu/libc.so.6
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0e16000-729ed0e1a000 r--p 00215000 07:01 7474 /usr/lib/x86_64-linux-gnu/libc.so.6
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0e1a000-729ed0e1c000 rw-p 00219000 07:01 7474 /usr/lib/x86_64-linux-gnu/libc.so.6
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0e1c000-729ed0e29000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0e29000-729ed0ef3000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0ef3000-729ed0f26000 r-xp 00000000 07:05 158 /snap/lxd/30130/lib/liblxcfs.so
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0f26000-729ed0f27000 r--p 00032000 07:05 158 /snap/lxd/30130/lib/liblxcfs.so
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0f27000-729ed0f28000 rw-p 00033000 07:05 158 /snap/lxd/30130/lib/liblxcfs.so
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0f28000-729ed0f36000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0f36000-729ed0f39000 r--p 00000000 07:01 7524 /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0f39000-729ed0f50000 r-xp 00003000 07:01 7524 /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0f50000-729ed0f54000 r--p 0001a000 07:01 7524 /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0f54000-729ed0f55000 r--p 0001d000 07:01 7524 /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0f55000-729ed0f56000 rw-p 0001e000 07:01 7524 /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0f56000-729ed0f59000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0f59000-729ed0f60000 r--p 00000000 07:05 568 /snap/lxd/30130/lib/x86_64-linux-gnu/libfuse3.so.3.10.5
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0f60000-729ed0f7b000 r-xp 00007000 07:05 568 /snap/lxd/30130/lib/x86_64-linux-gnu/libfuse3.so.3.10.5
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0f7b000-729ed0f85000 r--p 00022000 07:05 568 /snap/lxd/30130/lib/x86_64-linux-gnu/libfuse3.so.3.10.5
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0f85000-729ed0f97000 r--p 0002b000 07:05 568 /snap/lxd/30130/lib/x86_64-linux-gnu/libfuse3.so.3.10.5
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0f97000-729ed0f98000 rw-p 0003d000 07:05 568 /snap/lxd/30130/lib/x86_64-linux-gnu/libfuse3.so.3.10.5
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0f98000-729ed0f99000 r--p 00000000 07:05 541 /snap/lxd/30130/lib/x86_64-linux-gnu/libSegFault.so
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0f99000-729ed0f9c000 r-xp 00001000 07:05 541 /snap/lxd/30130/lib/x86_64-linux-gnu/libSegFault.so
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0f9c000-729ed0f9d000 r--p 00004000 07:05 541 /snap/lxd/30130/lib/x86_64-linux-gnu/libSegFault.so
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0f9d000-729ed0f9e000 r--p 00005000 07:05 541 /snap/lxd/30130/lib/x86_64-linux-gnu/libSegFault.so
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0f9e000-729ed0f9f000 rw-p 00006000 07:05 541 /snap/lxd/30130/lib/x86_64-linux-gnu/libSegFault.so
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0f9f000-729ed0fa1000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0fa1000-729ed0fa5000 r--p 00000000 00:00 0 [vvar]
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0fa5000-729ed0fa7000 r-xp 00000000 00:00 0 [vdso]
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0fa7000-729ed0fa9000 r--p 00000000 07:01 7445 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0fa9000-729ed0fd3000 r-xp 00002000 07:01 7445 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0fd3000-729ed0fde000 r--p 0002c000 07:01 7445 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0fde000-729ed0fdf000 rw-p 00000000 00:00 0
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0fdf000-729ed0fe1000 r--p 00037000 07:01 7445 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
2024-10-03T21:11:01Z lxd.daemon[2267]: 729ed0fe1000-729ed0fe3000 rw-p 00039000 07:01 7445 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
2024-10-03T21:11:01Z lxd.daemon[2267]: 7ffd23f81000-7ffd23fa2000 rw-p 00000000 00:00 0 [stack]
2024-10-03T21:11:01Z lxd.daemon[2267]: ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0 [vsyscall]
EDIT: looks like an attempt to free an invalid pointer in liblxcfs.so (do_release_file_info)
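If it helps, the two liblxcfs.so frames can be symbolized offline; the file offsets follow from the memory map above, which shows liblxcfs.so mapped at 0x729ed0ef3000 (a sketch, assuming debug symbols are available for the snap's build):
$ # 0x270c8 = 0x729ed0f1a0c8 - 0x729ed0ef3000 (do_release_file_info+0x42)
$ # 0x1c789 = 0x729ed0f0f789 - 0x729ed0ef3000 (proc_release+0x20)
$ addr2line -f -e /snap/lxd/current/lib/liblxcfs.so 0x270c8 0x1c789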
@Qubitium thanks, that's really useful. I've punted the issue to Aleks, who's one of the lxcfs maintainers.
@simondeziel Should I track this here or will there be a second issue on github/lxcfs?
Indeed, that might warrant a bug report in the lxcfs project, but I'll let Aleks comment as he might want some more information from you anyway. Thanks!
Hi @Qubitium,
Thanks a lot for reporting this issue to us!
Maybe my question sounds unrelated, but are you using ZFS? If yes, then your case looks similar to https://github.com/lxc/lxcfs/issues/644. I've spent a lot of time rechecking the LXCFS code and analyzing coredumps, and everything points to the ZFS kernel driver corrupting kernel memory on all kernels starting from 6.8.
@mihalicyn I am using zfs but I did not get the same kernel crashes as posted in the zfs github issue.
But... I found your comment in that issue thread and I am doing hourly flushes of cache buffers exactly as you do not recommend. Oof. 😢
Can you explain why this level 3 flush is dangerous? I am using it so ext4 and zfs buffers are flushed on the host on a regular basis so that containers don't OOM due to memory allocations.
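For context, the "level 3 flush" I run hourly is just the standard drop_caches knob, roughly like this (levels per the kernel's vm sysctl docs: 1 = pagecache, 2 = dentries and inodes, 3 = both):
# as root: write out dirty pages first, then drop the clean caches
sync
echo 3 > /proc/sys/vm/drop_caches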
I am using zfs but I did not get the same kernel crashes as posted in the zfs github issue.
yeah, the reporter of that issue also wasn't experiencing crashes until some point, and they were relatively rare. But once he got one, it was clear evidence of a serious issue with the ZFS kernel driver.
I'm not saying that your issue is 100% the same as the one from the other report, but it looks too similar.
What I would suggest you do is install an older kernel, like 6.5, and check whether the issue is still there or not. If it isn't, then you at least have a workaround until all the problems with ZFS are solved.
Now the question is how to install a 6.5 kernel on Noble, taking into account that Ubuntu Noble has 6.8 as its base kernel. I would try to download a 6.6.x kernel from https://kernel.ubuntu.com/mainline/v6.6.51/ (the 6.6 choice is not random; it's an official upstream LTS kernel, see https://kernel.org/).
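A rough sketch of installing such a mainline build (the exact .deb file names vary per build, so treat these steps as placeholders and list the amd64/ directory on kernel.ubuntu.com first):
$ cd "$(mktemp -d)"
$ # download the linux-image-unsigned, linux-modules and linux-headers amd64 .debs
$ # from https://kernel.ubuntu.com/mainline/v6.6.51/amd64/ into this directory
$ sudo dpkg -i ./*.deb
$ sudo reboot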
Can you explain why this level 3 flush is dangerous? I am using it so ext4 and zfs buffers are flushed on the host on a regular basis so that containers don't OOM due to memory allocations.
These flushes are not dangerous if they are done on a non-buggy kernel. But I have a hint that something is wrong with the ZFS ARC cache, and forcing a drop of caches may trigger a buggy codepath in the kernel and stimulate a kernel crash (which is good for debugging, but can have really bad consequences when you run it on a production system, causing data loss or corruption on your disk).
@mihalicyn Thank you for the deep dive. Looks like I hit a rabbit hole that may not be solvable in the near term unless someone can reproduce it in a non-random workload.
I will definitely test downgrade kernel to 6.6 and report back on stability.
Hey @Qubitium,
if you are still on Ubuntu Noble's default kernel, you can also try to enable the KFENCE detector, as it may (if we are lucky enough) help to identify the issue and help with fixing it in the future.
As a root user:
echo 100 > /sys/module/kfence/parameters/sample_interval
echo 75 > /sys/module/kfence/parameters/skip_covered_thresh
or even better (but it will take more CPU resources):
echo 1 > /sys/module/kfence/parameters/sample_interval
echo 10 > /sys/module/kfence/parameters/skip_covered_thresh
It is relatively safe and designed for debugging in production environments.
After enabling this, you need to watch your dmesg for error messages like these.
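One way to watch for them live (a sketch; KFENCE reports in dmesg start with "BUG: KFENCE:"):
dmesg -w | grep -i kfence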
Upd: you may consider this https://gist.github.com/melver/7bf5bdfa9a84c52225b8313cbd7dc1f9 script too.
Upd 2:
You can also enable SLUB debugging by editing /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="slub_debug=FZPU"
and then running update-grub and rebooting.
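To confirm after the reboot that the option took effect, a trivial check:
grep -o 'slub_debug=[A-Z]*' /proc/cmdline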
@mihalicyn Thanks for the tips. Can I combine KFENCE with SLUB debug? Can they coexist peacefully?
@mihalicyn Thanks for the tips. Can I combine KFENCE with SLUB debug? Can they coexist peacefully?
yes, absolutely!
Required information
Distribution: Ubuntu
Distribution version: 24.04.1
The output of "snap list --all lxd core20 core22 core24 snapd":
The output of "lxc info" or if that fails:
Issue description
lxd container (both host and container are Ubuntu 24.04.1) randomly drops /proc/cpuinfo. I have no idea why this is happening. Force-stopping and then starting the container will fix the issue, until it happens next time. The chance of it happening is about once per week.
Want to add that this server/container has a single Nvidia 4070 GPU passed through via device=gpu type=gpu.
Steps to reproduce
Happened more than once, randomly, on different AMD single-socket EPYC 9004 32-core/64-thread servers. (The CPU has no official model id: engineering sample.)
Information to attach
Correct cpuinfo:
lxc config show c1 (the container)