Closed TheCrazyLex closed 7 years ago
@naixia @dolohow please help :)
@TheCrazyLex
Ok, I'll look into this.
Have you ever tried another kernel version like v4.6 or v4.5 and seen the same bug?
I suspect it is related to some upstream kernel change. Can you help me to narrow it down?
@TheCrazyLex
This is odd, because this code path is very hot for every UKSM user. I think the logic itself should have been exercised by many users over the last several years.
So I want to know whether you are running under special conditions, such as near OOM, with faulty RAM banks, or with some other unusual memory-related setup or system configuration.
@naixia Thank you for your very fast answer! I'll be glad to help you narrow it down.
I didn't try any other kernel versions, I might try 4.8 later.
I don't think my setup is special: 8 GB RAM and 5.2 GB swap. As far as I could see, I wasn't near an OOM when this happened. UKSM helps me a lot during Android compilation; it seems a lot of duplicated pages are produced.
I ran some tests on my RAM banks today, they seem to be healthy. And the system runs normally when I disable UKSM, just that the memory usage is pretty high then.
Thank you!
@naixia Please let me know whether you think it is worth testing the 4.8 patch on 4.8.1 :)
@TheCrazyLex I suggest you start from v4.4 and try to find a version N that does not trigger the bug while the next version (N+1) does. If you fail to find a good kernel for your workload, I suggest you create an LXC rootfs or Docker image of your build system for me so that I can reproduce the bug on my machine.
I think I hit a similar bug:
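The search for a good version N and a bad version N+1 suggested above can be sketched as a binary search. This is only a minimal illustration, not something posted in this thread: `find_first_bad` and the `is_bad` callback are hypothetical names, and `is_bad` stands for the manual step of booting a given kernel with the UKSM patch and checking whether the bug shows up under the build workload. It assumes the versions are ordered oldest to newest and that the bug, once introduced, is present in every later version.

```python
from typing import Callable, Optional, Sequence, Tuple


def find_first_bad(
    versions: Sequence[str],
    is_bad: Callable[[str], bool],
) -> Optional[Tuple[str, str]]:
    """Return (N, N+1): the last good and the first bad version.

    Returns None when no such pair exists in the list, i.e. the oldest
    version is already bad or the newest version is still good.
    `is_bad` is a hypothetical callback: in practice it means "boot this
    kernel, run the workload, and report whether the bug triggered".
    """
    if not versions or is_bad(versions[0]) or not is_bad(versions[-1]):
        return None

    # Invariant: versions[lo] is good, versions[hi] is bad.
    lo, hi = 0, len(versions) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[lo], versions[hi]
```

With a list such as `["v4.4", "v4.5", "v4.6", "v4.7", "v4.8"]`, this needs at most a handful of test boots beyond the two endpoints. Between two tagged releases, `git bisect` applies the same idea at commit granularity.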
[ 8214.542310] ------------[ cut here ]------------
[ 8214.542315] WARNING: CPU: 1 PID: 70 at mm/page_alloc.c:3430 __alloc_pages_nodemask+0xc2e/0xda0
[ 8214.542316] Modules linked in: overlay ctr ccm arc4 xt_conntrack iptable_filter iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack fuse amdkfd amd_iommu_v2 mousedev joydev input_leds radeon iTCO_wdt iTCO_vendor_support ppdev intel_rapl evdev x86_pkg_temp_thermal intel_powerclamp coretemp ath5k kvm_intel led_class mac80211 snd_hda_codec_realtek snd_hda_codec_generic i2c_algo_bit kvm drm_kms_helper snd_hda_codec_hdmi irqbypass syscopyarea ath snd_hda_intel sysfillrect sysimgblt cfg80211 snd_hda_codec fb_sys_fops snd_hwdep ttm psmouse snd_hda_core pcspkr drm rfkill snd_pcm snd_timer r8169 snd mei_me mei soundcore mii i2c_i801 i2c_smbus shpchp lpc_ich thermal fan battery parport_pc parport video button sch_fq_codel ip_tables
[ 8214.542345] x_tables ext4 crc16 jbd2 mbcache hid_generic usbhid hid sd_mod serio_raw atkbd libps2 crct10dif_pclmul crc32_pclmul crc32c_intel ahci aesni_intel aes_x86_64 libahci glue_helper lrw libata gf128mul ablk_helper cryptd xhci_pci scsi_mod ehci_pci xhci_hcd ehci_hcd usbcore usb_common i8042 serio jitterentropy_rng sha256_ssse3 sha256_generic hmac drbg ansi_cprng
[ 8214.542360] CPU: 1 PID: 70 Comm: uksmd Not tainted 4.8.4-ck3-bfq-uksm #4
[ 8214.542361] Hardware name: MSI MS-7820/H81-P33(MS-7820), BIOS V1.6 03/30/2015
[ 8214.542362] 0000000000000286 00000000f6319a93 ffff88019440f930 ffffffff812d2160
[ 8214.542363] 0000000000000000 0000000000000000 ffff88019440f970 ffffffff81075b3b
[ 8214.542365] 00000d669440fa50 ffff880197279880 0000000000000000 0000000000000000
[ 8214.542366] Call Trace:
[ 8214.542370] [<ffffffff812d2160>] dump_stack+0x63/0x83
[ 8214.542373] [<ffffffff81075b3b>] __warn+0xcb/0xf0
[ 8214.542374] [<ffffffff81075c6d>] warn_slowpath_null+0x1d/0x20
[ 8214.542376] [<ffffffff8115452e>] __alloc_pages_nodemask+0xc2e/0xda0
[ 8214.542377] [<ffffffff81056ab2>] ? __x2apic_send_IPI_dest+0x32/0x40
[ 8214.542378] [<ffffffff8104e08b>] ? native_send_call_func_single_ipi+0x1b/0x20
[ 8214.542381] [<ffffffff810d7d59>] ? generic_exec_single+0x79/0x120
[ 8214.542382] [<ffffffff81069410>] ? tlbflush_read_file+0x80/0x80
[ 8214.542384] [<ffffffff811b1145>] new_slab+0xa5/0x620
[ 8214.542385] [<ffffffff810d7edb>] ? smp_call_function_single+0xdb/0x150
[ 8214.542386] [<ffffffff81069410>] ? tlbflush_read_file+0x80/0x80
[ 8214.542387] [<ffffffff811b37b9>] ___slab_alloc.constprop.28+0x2e9/0x3c0
[ 8214.542388] [<ffffffff811a96e1>] ? cmp_and_merge_page+0x1431/0x2860
[ 8214.542389] [<ffffffff81069cdc>] ? flush_tlb_page+0x5c/0xb0
[ 8214.542391] [<ffffffff8116ed35>] ? __dec_node_page_state+0x15/0x20
[ 8214.542392] [<ffffffff811a96e1>] ? cmp_and_merge_page+0x1431/0x2860
[ 8214.542393] [<ffffffff811b38bb>] __slab_alloc.isra.22.constprop.27+0x2b/0x40
[ 8214.542394] [<ffffffff811b3a2e>] kmem_cache_alloc+0x15e/0x1a0
[ 8214.542395] [<ffffffff811a96a4>] ? cmp_and_merge_page+0x13f4/0x2860
[ 8214.542396] [<ffffffff811a96e1>] cmp_and_merge_page+0x1431/0x2860
[ 8214.542397] [<ffffffff811ab116>] scan_vma_one_page+0x606/0x15d0
[ 8214.542398] [<ffffffff81037989>] ? sched_clock+0x9/0x10
[ 8214.542399] [<ffffffff811ac241>] uksm_do_scan+0x161/0x2c40
[ 8214.542401] [<ffffffff810c1ae8>] ? del_timer_sync+0x48/0x50
[ 8214.542403] [<ffffffff815a8597>] ? schedule_timeout+0x237/0x420
[ 8214.542404] [<ffffffff811aee74>] uksm_scan_thread+0x154/0x180
[ 8214.542405] [<ffffffff811aed20>] ? uksm_do_scan+0x2c40/0x2c40
[ 8214.542406] [<ffffffff811aed20>] ? uksm_do_scan+0x2c40/0x2c40
[ 8214.542408] [<ffffffff81094af8>] kthread+0xd8/0xf0
[ 8214.542410] [<ffffffff8109d768>] ? finish_task_switch+0x88/0x330
[ 8214.542411] [<ffffffff815a987f>] ret_from_fork+0x1f/0x40
[ 8214.542412] [<ffffffff81094a20>] ? kthread_worker_fn+0x170/0x170
[ 8214.542413] ---[ end trace 5c744655a6541f9f ]---
@dolohow It's not the same bug that @TheCrazyLex encountered. It's a warning about misuse of a GFP flag in a kmem_cache_alloc() call. It's easy to fix. I'll update the patch for v4.8 soon.
@dolohow I've updated the patch. Please give it a try and see whether the warning is fixed.
Thanks, I will test it and give you feedback.
@dolohow Have you hit the kernel warning again?
Not yet, hopefully it won't show up.
@TheCrazyLex any updates?
I got a little distracted by other things. I'll retry soon.
Closing because there has been no feedback for a long time and I cannot reproduce it.
I am using the UKSM patch for Kernel 4.7 and applying it on a clean 4.7.7 base. The patch applies cleanly.
The problem is that UKSM crashes pretty often for me because a "BUG_ON" in the code gets triggered. This happens mostly while compiling Android and aborts the compilation with a "memory allocation" failure.
The output in dmesg is as follows:
Thanks in advance!