I forgot to mention that the reason the other CPUs have all gone into the stopping state is that we executed an 'rmmod modules' command, which forces all CPUs to stop.
So, as analysed above, if a process holding the write semaphore has been scheduled out because of the rmmod, uksmd will never get the read semaphore of the mm_struct and may get stuck in the while() loop.
There are several conditions that break the two while loops. A simple and direct hack is to always break the outer loop whenever busy_retry drops to zero, so that uksmd races less aggressively with busy mms.
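In sketch form it would be something like this (a rough illustration against a heavily simplified loop body, not a tested patch; busy_retry, try_down_read_slot_mmap_sem() and rung->current_scan are names visible in uksm.c and in the crash dump in this issue, the rest is simplification):

```c
/* Sketch only: give up on the whole rung once busy_retry is exhausted,
 * instead of merely skipping the current slot.  The real uksm_do_scan()
 * loop has much more bookkeeping than shown here.
 */
int busy_retry = 100;				/* whatever reset value uksm.c uses */

while (rung->pages_to_scan && rung->vma_root.num &&
       likely(!freezing(current))) {
	slot = rung->current_scan;		/* simplified slot selection */
	err  = try_down_read_slot_mmap_sem(slot);
	if (err == -EBUSY) {
		if (!--busy_retry)
			break;			/* stop racing with busy mms */
		continue;
	}
	/* normal scan/merge path, which consumes rung->pages_to_scan */
}
```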
@naixia Yes, that is one way to fix this problem. We actually fixed it by checking whether all the vma_slots in this rung are busy, and using that to decide whether to break the outer loop ~
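Roughly, the extra check slots into the -EBUSY branch of the loop sketched above (busy_slots here is just an illustrative counter, not the actual patch):

```c
	if (err == -EBUSY) {
		/* Illustrative counter: one increment per consecutive busy slot. */
		if (++busy_slots >= rung->vma_root.num)
			break;		/* every vma_slot in this rung is busy, give up */
		continue;
	}
	busy_slots = 0;				/* a slot was scanned, reset the counter */
```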
Thanks
Will be fixed in the next release
Hi @naixia & @dolohow,
Sorry to bother both of you again. :)
We seem to have hit a new bug in uksm: there appears to be an endless loop in uksm that triggered a soft lockup. Here is the backtrace we got from the crash core:
crash> bt -a
PID: 8      TASK: ffff881029455080  CPU: 0   COMMAND: "migration/0"
0 [ffff88103fe03f70] stop_this_cpu at ffffffff8101e70d
1 [ffff88103fe03f80] smp_reboot_interrupt at ffffffff81046e40
2 [ffff88103fe03fb0] reboot_interrupt at ffffffff81655d1d
--- ---
3 [ffff8810294f3cd8] reboot_interrupt at ffffffff81655d1d
4 [ffff8810294f3db0] cpu_stopper_thread at ffffffff81106086
5 [ffff8810294f3e80] smpboot_thread_fn at ffffffff810aedff
6 [ffff8810294f3ec8] kthread at ffffffff810a61ff
7 [ffff8810294f3f50] ret_from_fork at ffffffff81655118
PID: 61 TASK: ffff88102979c500 CPU: 1 COMMAND: "migration/1"
0 [ffff88103fe43f70] stop_this_cpu at ffffffff8101e70d
1 [ffff88103fe43f80] smp_reboot_interrupt at ffffffff81046e40
2 [ffff88103fe43fb0] reboot_interrupt at ffffffff81655d1d
--- ---
3 [ffff8810297c3cd8] reboot_interrupt at ffffffff81655d1d
4 [ffff8810297c3db0] cpu_stopper_thread at ffffffff81106086
5 [ffff8810297c3e80] smpboot_thread_fn at ffffffff810aedff
6 [ffff8810297c3ec8] kthread at ffffffff810a61ff
7 [ffff8810297c3f50] ret_from_fork at ffffffff81655118
... ...
PID: 236 TASK: ffff88102730ae00 CPU: 9 COMMAND: "uksmd"
0 [ffff88203e2c3f70] stop_this_cpu at ffffffff8101e70d
1 [ffff88203e2c3f80] smp_reboot_interrupt at ffffffff81046e40
2 [ffff88203e2c3fb0] reboot_interrupt at ffffffff81655d1d
--- ---
3 [ffff881026ce3ce8] reboot_interrupt at ffffffff81655d1d
4 [ffff881026ce3da0] try_down_read_slot_mmap_sem at ffffffff811bef59
5 [ffff881026ce3db8] uksm_do_scan at ffffffff811c6590
6 [ffff881026ce3e68] uksm_scan_thread at ffffffff811c7e75
7 [ffff881026ce3ec8] kthread at ffffffff810a61ff
8 [ffff881026ce3f50] ret_from_fork at ffffffff81655118
PID: 107 TASK: ffff882027fe0000 CPU: 10 COMMAND: "migration/10"
0 [ffff88203e303f70] stop_this_cpu at ffffffff8101e70d
1 [ffff88203e303f80] smp_reboot_interrupt at ffffffff81046e40
2 [ffff88203e303fb0] reboot_interrupt at ffffffff81655d1d
--- ---
3 [ffff882027febcd8] reboot_interrupt at ffffffff81655d1d
4 [ffff882027febdb0] cpu_stopper_thread at ffffffff81106086
5 [ffff882027febe80] smpboot_thread_fn at ffffffff810aedff
6 [ffff882027febec8] kthread at ffffffff810a61ff
7 [ffff882027febf50] ret_from_fork at ffffffff81655118
I have only kept part of the backtrace here; except for CPU 9, which appears to be looping in uksm_do_scan(), the other CPUs were offline.
In the function uksm_do_scan(), if try_down_read_slot_mmap_sem() keeps returning -EBUSY for a period of time, it looks like the following loop can spin endlessly for that whole time (is this possible?):
while (rung->pages_to_scan && rung->vma_root.num && likely(!freezing(current))) {
... ...
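Our reading is that the -EBUSY comes straight from a failed down_read_trylock() on the slot's mmap_sem, i.e. roughly the following (a sketch of what we believe the helper boils down to, not the exact uksm.c code):

```c
/* Sketch: -EBUSY simply means the read side of mm->mmap_sem could not be
 * taken without sleeping, typically because a writer holds it or is queued.
 */
static int try_down_read_slot_mmap_sem(struct vma_slot *slot)
{
	struct mm_struct *mm = slot->vma->vm_mm;

	if (down_read_trylock(&mm->mmap_sem))
		return 0;		/* got the read lock, the slot can be scanned */

	return -EBUSY;			/* writer active or queued, try again later */
}
```

If every slot in the rung keeps failing this trylock, nothing in the loop body consumes rung->pages_to_scan, so the loop condition above never changes.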
We analysed the core: at that time uksmd was scanning rung[2], which held two vma_slots. The mmap_sem.count values of these two vma_slots' mm_structs are -4294967295 (0xFFFFFFFF00000001) and -4294967291 (0xFFFFFFFF00000005), which means the read semaphore could not be taken at that time, so try_down_read_slot_mmap_sem() would keep returning -EBUSY. Please see below for details.
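For reference, this is how we read those two count values, assuming this kernel uses the old rwsem-xadd count layout on x86-64 (the constants and interpretation below are taken from that implementation as an assumption, not from the vmcore itself):

```c
/* Decode the observed mmap_sem.count values, assuming the pre-4.13
 * rwsem-xadd layout on x86-64 (LP64 longs).
 */
#include <stdio.h>

#define RWSEM_ACTIVE_MASK	0xffffffffL
#define RWSEM_ACTIVE_BIAS	0x00000001L
#define RWSEM_WAITING_BIAS	(-RWSEM_ACTIVE_MASK - 1)	/* 0xffffffff00000000 */
#define RWSEM_ACTIVE_WRITE_BIAS	(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)

int main(void)
{
	long counts[] = { -4294967295L, -4294967291L };		/* values from the vmcore */

	printf("RWSEM_ACTIVE_WRITE_BIAS = %ld\n", RWSEM_ACTIVE_WRITE_BIAS);

	for (int i = 0; i < 2; i++) {
		long c = counts[i];

		/* The low 32 bits count active lockers; any negative count means a
		 * writer is active and/or waiters are queued, so down_read_trylock()
		 * fails immediately and try_down_read_slot_mmap_sem() returns -EBUSY.
		 */
		printf("count=%ld (0x%016lx): active=%ld, %s\n",
		       c, (unsigned long)c, c & RWSEM_ACTIVE_MASK,
		       c < 0 ? "negative -> read trylock fails" : "readers may enter");
	}
	return 0;
}
```

So -4294967295 is exactly RWSEM_ACTIVE_WRITE_BIAS (one active writer), and -4294967291 (RWSEM_WAITING_BIAS + 5) still has the waiting bias set; either way a reader that only tries the lock backs off, which matches the non-empty wait_list entries below.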
In our test we had swap enabled in the system, and in some special situations we may be unable to take the mmap_sem read semaphore for a period of time.
I'm not quite sure whether our analysis is right. Is it possible for the while (rung->pages_to_scan && rung->vma_root.num && ...) loop to run endlessly? Any ideas about this problem?
Thanks.
crash> p uksm_scan_ladder[2]
$6 = {
  vma_root = {
    height = 1,
    rnode = 0xffff881fe2f6a8a8,
    enter_node = 0xffff881fe2f6a8a8,
    shift = 8,
    stores_size = 256,
    mask = 255,
    min = 2,
    num = 2,
    ... ...
  },
  ... ...
  current_scan = 0xffff882007254cf0,
  current_offset = 241,
  offset_init = 0,
  step = 606,
  flags = 1,
  pages_to_scan = 3212,
  cpu_ratio = -2500,
  cover_msecs = 200
}

crash> sradix_tree_node.stores
struct sradix_tree_node {
  [24] void *stores[];
}

crash> x/16g 0xffff881fe2f6a8a8+24
0xffff881fe2f6a8c0:     0xffff881fdd284170      0xffff882007254cf0
0xffff881fe2f6a8d0:     0x0000000000000000      0x0000000000000000
0xffff881fe2f6a8e0:     0x0000000000000000      0x0000000000000000
0xffff881fe2f6a8f0:     0x0000000000000000      0x0000000000000000
0xffff881fe2f6a900:     0x0000000000000000      0x0000000000000000
0xffff881fe2f6a910:     0x0000000000000000      0x0000000000000000
0xffff881fe2f6a920:     0x0000000000000000      0x0000000000000000
0xffff881fe2f6a930:     0x0000000000000000      0x0000000000000000

crash> vma_slot 0xffff881fdd284170
struct vma_slot {
  snode = 0xffff881fe2f6a8a8,
  sindex = 0,
  slot_list = {
    next = 0xffff881fdd284180,
    prev = 0xffff881fdd284180
  },
  fully_scanned_round = 576,
  dedup_num = 0,
  pages_scanned = 2635,
  last_scanned = 2635,
  pages_to_scan = 0,
  rung = 0xffffffff81efeb70 <uksm_scan_ladder+496>,
  rmap_list_pool = 0xffff8809ca928000,
  pool_counts = 0xffff880e88110000,
  pool_size = 4096,
  vma = 0xffff881e22dec948,
  mm = 0xffff881d1b6ef6c0,
  ctime_j = 4463343148,
  pages = 2097152,
  flags = 19,
  pages_cowed = 0,
  pages_merged = 0,
  pages_bemerged = 0,
  dedup_list = {
    next = 0xffff881fdd284218,
    prev = 0xffff881fdd284218
  }
}
crash> mm_struct 0xffff881d1b6ef6c0
struct mm_struct {
  mmap = 0xffff881de0020438,
  mm_rb = {
    rb_node = 0xffff8818321a2608
  },
  mmap_cache = 0xffff881d09571878,
  get_unmapped_area = 0xffffffff81019af0,
  unmap_area = 0xffffffff811a0ae0,
  mmap_base = 140047777320960,
  mmap_legacy_base = 47584893067264,
  task_size = 140737488351232,
  cached_hole_size = 0,
  free_area_cache = 140047777320960,
  highest_vm_end = 140731066691584,
  pgd = 0xffff881839c52000,
  mm_users = {
    counter = 60
  },
  mm_count = {
    counter = 7
  },
  ... ...
  mmap_sem = {
    count = -4294967295,
    wait_lock = {
      raw_lock = {
        {
          head_tail = 163580352,
          tickets = {
            head = 2496,
            tail = 2496
          }
        }
      }
    },
    wait_list = {
      next = 0xffff880f51dcfe70,
      prev = 0xffff880045373dd0
    }
  },
  mmlist = {
    next = 0xffff881d1b6ef758,
    prev = 0xffff881d1b6ef758
  },
  ... ...
crash> vma_slot 0xffff882007254cf0
struct vma_slot {
  snode = 0xffff881fe2f6a8a8,
  sindex = 1,
  slot_list = {
    next = 0xffff882007254d00,
    prev = 0xffff882007254d00
  },
  fully_scanned_round = 0,
  dedup_num = 0,
  pages_scanned = 5465,
  last_scanned = 5465,
  pages_to_scan = 0,
  rung = 0xffffffff81efeb70 <uksm_scan_ladder+496>,
  rmap_list_pool = 0xffff881c37150000,
  pool_counts = 0xffff881ac8848000,
  pool_size = 8192,
  vma = 0xffff881ff509b290,
  mm = 0xffff88047d5cc4c0,
  ctime_j = 4463606846,
  pages = 4194304,
  flags = 19,
  pages_cowed = 0,
  pages_merged = 0,
  pages_bemerged = 0,
  dedup_list = {
    next = 0xffff882007254d98,
    prev = 0xffff882007254d98
  }
}

crash> mm_struct 0xffff88047d5cc4c0
struct mm_struct {
  mmap = 0xffff88066954cf30,
  mm_rb = {
    rb_node = 0xffff8820030a51d8
  },
  mmap_cache = 0xffff881ff509b290,
  get_unmapped_area = 0xffffffff81019af0,
  unmap_area = 0xffffffff811a0ae0,
  mmap_base = 139674410536960,
  mmap_legacy_base = 47958259851264,
  task_size = 140737488351232,
  cached_hole_size = 0,
  free_area_cache = 139674410536960,
  highest_vm_end = 140732348211200,
  pgd = 0xffff882001615000,
  mm_users = {
    counter = 47
  },
  mm_count = {
    counter = 8
  },
  ... ...
  mmap_sem = {
    count = -4294967291,
    wait_lock = {
      raw_lock = {
        {
          head_tail = 14155992,
          tickets = {
            head = 216,
            tail = 216
          }
        }
      }
    },
    wait_list = {
      next = 0xffff88033386fe70,
      prev = 0xffff880aca73baf0
    }
  },
  ... ...
  owner = 0xffff880cdcf4c500,
  exe_file = 0xffff880f187f4200,