Open redbaron opened 6 years ago
I just noticed that all the mounts come from an iSCSI session, which kubelet logs out of when the last PV is removed. It probably does so too quickly, giving systemd no chance to settle, so this is probably an entirely different bug from what I initially thought.
It looks like that patch is in v237, which is in 1688.2.0, so I'd agree it's probably a different bug. I assume the directory it's trying to mount to does in fact exist?
Something is definitely happening on iSCSI logout :( I tried multiple times, and sometimes it ends with the following error in dmesg:
Mar 08 13:22:59 erepnk12 kernel: INFO: task systemd-udevd:7291 blocked for more than 120 seconds.
Mar 08 13:22:59 erepnk12 kernel: Not tainted 4.14.24-coreos #1
Mar 08 13:22:59 erepnk12 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 08 13:22:59 erepnk12 kernel: systemd-udevd D 0 7291 819 0x80000186
Mar 08 13:22:59 erepnk12 kernel: Call Trace:
Mar 08 13:22:59 erepnk12 kernel: ? __schedule+0x294/0x8a0
Mar 08 13:22:59 erepnk12 kernel: schedule+0x28/0x80
Mar 08 13:22:59 erepnk12 kernel: io_schedule+0x12/0x40
Mar 08 13:22:59 erepnk12 kernel: __lock_page+0x103/0x140
Mar 08 13:22:59 erepnk12 kernel: ? page_cache_tree_insert+0xc0/0xc0
Mar 08 13:22:59 erepnk12 kernel: truncate_inode_pages_range+0x54c/0x7b0
Mar 08 13:22:59 erepnk12 kernel: ? smp_call_function_many+0xa4/0x250
Mar 08 13:22:59 erepnk12 kernel: ? on_each_cpu_mask+0x23/0x60
Mar 08 13:22:59 erepnk12 kernel: ? __brelse+0x20/0x20
Mar 08 13:22:59 erepnk12 kernel: ? on_each_cpu_mask+0x23/0x60
Mar 08 13:22:59 erepnk12 kernel: ? on_each_cpu_cond+0xa0/0xd0
Mar 08 13:22:59 erepnk12 kernel: __blkdev_put+0x71/0x1f0
Mar 08 13:22:59 erepnk12 kernel: blkdev_close+0x21/0x30
Mar 08 13:22:59 erepnk12 kernel: __fput+0xd8/0x220
Mar 08 13:22:59 erepnk12 kernel: task_work_run+0x8a/0xb0
Mar 08 13:22:59 erepnk12 kernel: do_exit+0x2e3/0xaf0
Mar 08 13:22:59 erepnk12 kernel: ? touch_atime+0xc8/0xe0
Mar 08 13:22:59 erepnk12 kernel: do_group_exit+0x3a/0xa0
Mar 08 13:22:59 erepnk12 kernel: get_signal+0x269/0x570
Mar 08 13:22:59 erepnk12 kernel: do_signal+0x36/0x610
Mar 08 13:22:59 erepnk12 kernel: ? __vfs_read+0xfe/0x150
Mar 08 13:22:59 erepnk12 kernel: ? __audit_syscall_exit+0x230/0x2b0
Mar 08 13:22:59 erepnk12 kernel: exit_to_usermode_loop+0x69/0xa0
Mar 08 13:22:59 erepnk12 kernel: do_syscall_64+0x104/0x120
Mar 08 13:22:59 erepnk12 kernel: entry_SYSCALL_64_after_hwframe+0x3d/0xa2
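For anyone trying to reproduce this, it can help to count how often the hung-task warning shows up across runs. Below is a minimal sketch that picks such warnings out of kernel log lines; the regular expression is derived only from the single line format shown in the trace above, and the function name `find_hung_tasks` is made up for illustration.

```python
import re

# Pattern based on the one warning line in the trace above; other kernel
# versions may word the message differently (assumption).
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds\."
)


def find_hung_tasks(lines):
    """Return (command, pid, seconds) for every hung-task warning in lines."""
    hits = []
    for line in lines:
        match = HUNG_TASK_RE.search(line)
        if match:
            hits.append(
                (match.group("comm"), int(match.group("pid")), int(match.group("secs")))
            )
    return hits


# Two lines copied from the log above.
sample = [
    "Mar 08 13:22:59 erepnk12 kernel: INFO: task systemd-udevd:7291 blocked for more than 120 seconds.",
    "Mar 08 13:22:59 erepnk12 kernel: Not tainted 4.14.24-coreos #1",
]
print(find_hung_tasks(sample))  # [('systemd-udevd', 7291, 120)]
```

The same function could be fed the lines of `journalctl -k` or `dmesg` output saved to a file.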
Issue Report
Bug
When stress testing our iSCSI setup, I created tens of pods with dynamically provisioned PVs and removed them in a loop. systemd transient mount errors start to accumulate:
Container Linux Version
Environment
Baremetal + NetApp ONTAP over iSCSI for PersistentVolumes
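The create/remove loop described under Bug could look roughly like the sketch below. This is not the script actually used for the report; it only illustrates the idea. The storage class `example-iscsi`, the `busybox` image, the `stress-N` names and the round counts are all hypothetical placeholders, and the `kubectl` helper assumes `kubectl` is on the PATH and pointed at the test cluster.

```python
import json
import subprocess


def build_manifests(index, storage_class="example-iscsi", size="1Gi"):
    """Build a v1 List holding one PVC and one pod that mounts it.

    All names and the storage class are placeholders, not taken from the report.
    """
    name = f"stress-{index}"
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": "sleeper",
                "image": "busybox",
                "command": ["sleep", "3600"],
                "volumeMounts": [{"name": "data", "mountPath": "/data"}],
            }],
            "volumes": [
                {"name": "data", "persistentVolumeClaim": {"claimName": name}}
            ],
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [pvc, pod]}


def kubectl(args, stdin=None):
    """Run kubectl with the given arguments, raising on a non-zero exit."""
    subprocess.run(["kubectl", *args], input=stdin, text=True, check=True)


def stress(rounds, pods_per_round, run=kubectl):
    """Create pods with dynamically provisioned PVs, then delete them, repeatedly."""
    for _ in range(rounds):
        for i in range(pods_per_round):
            run(["apply", "-f", "-"], json.dumps(build_manifests(i)))
        for i in range(pods_per_round):
            # Wait until the volume is attached and mounted before tearing down.
            run(["wait", "--for=condition=Ready", f"pod/stress-{i}", "--timeout=300s"], None)
        for i in range(pods_per_round):
            run(["delete", "pod,pvc", f"stress-{i}", "--wait=true"], None)


# Example (talks to a real cluster, so it is left commented out):
# stress(rounds=20, pods_per_round=30)
```

The `run` parameter is there so the loop can be dry-run with a function that only records the commands, without touching a cluster.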
Other Information
This looks like https://github.com/kubernetes/kubernetes/issues/57345, which points to https://github.com/systemd/systemd/issues/7798. Maybe consider backporting the patch?