dentproject / dentOS

dentOS SwitchDev based NOS

BUG: Bad page state in process #155

Open daniellerts opened 2 years ago

We are getting many BUG reports in dmesg, such as the one below:

[ 26.301368] BUG: Bad page state in process swapper/0 pfn:104300
[ 26.307505] page:00000000ce81ba6f refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x104300
[ 26.317045] head:00000000ce81ba6f order:8 compound_mapcount:0 compound_pincount:0
[ 26.324622] flags: 0x2ffff00000010200(slab|head|node=0|zone=2|lastcpupid=0xffff)
[ 26.332119] raw: 2ffff00000010200 fffffc0004108008 fffffc0004110008 ffff000100000c00
[ 26.339953] raw: 0000000000000000 ffff000104300000 0000000000000001 0000000000000000
[ 26.347777] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
[ 26.354288] Modules linked in:
[ 26.357402] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.0-75757-g09fce596f9cb #468
[ 26.365298] Hardware name: Marvell Armada 7040 TX4810 (DT)
[ 26.370825] Call trace:
[ 26.373292] dump_backtrace+0x0/0x2ec
[ 26.376998] show_stack+0x24/0x30
[ 26.380346] dump_stack_lvl+0x68/0x84
[ 26.384047] dump_stack+0x1c/0x38
[ 26.387394] bad_page+0x12c/0x170
[ 26.390746] __free_pages_ok+0x63c/0x7b0
[ 26.394705] page_frag_free+0xcc/0xe0
[ 26.398402] skb_free_head+0x44/0xa0
[ 26.402017] skb_release_data+0x1d0/0x274
[ 26.406065] kfree_skb.part.0+0x6c/0x100
[ 26.410026] kfree_skb+0x54/0xc0
[ 26.413288] icmpv6_rcv+0x140/0x984
[ 26.416817] ip6_protocol_deliver_rcu+0x198/0x894
[ 26.421566] ip6_input+0x140/0x150
[ 26.425000] ip6_mc_input+0x228/0x550
[ 26.428696] ip6_sublist_rcv_finish+0x9c/0xd0
[ 26.433091] ip6_sublist_rcv+0x344/0x440
[ 26.437049] ipv6_list_rcv+0x1c0/0x220
[ 26.440833] __netif_receive_skb_list_core+0x2b0/0x3b0
[ 26.446020] netif_receive_skb_list_internal+0x29c/0x474
[ 26.451378] napi_complete_done+0xc4/0x2c0
[ 26.455513] mvpp2_poll+0x20c/0x310
[ 26.459040] __napi_poll+0x64/0x280
[ 26.462561] net_rx_action+0x4c4/0x550
[ 26.466345] __do_softirq+0x1a0/0x544
[ 26.470043] __irq_exit_rcu+0x164/0x184
[ 26.473920] irq_exit_rcu+0x1c/0x30
[ 26.477445] el1_interrupt+0x38/0x54
[ 26.481057] el1h_64_irq_handler+0x18/0x24
[ 26.485191] el1h_64_irq+0x78/0x7c
[ 26.488625] arch_local_irq_enable+0xc/0x20
[ 26.492851] default_idle_call+0x58/0x1c0
[ 26.496897] do_idle+0x2f8/0x380
[ 26.500161] cpu_startup_entry+0x34/0x8c
[ 26.504121] rest_init+0xf8/0x110
[ 26.507470] arch_call_rest_init+0x1c/0x28
[ 26.511613] start_kernel+0x3a4/0x3dc
[ 26.515313] __primary_switched+0xc4/0xcc
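
The `flags:` line above already names the set bits (`slab|head`). As a cross-check, the low bits of the raw word can be decoded by hand. This is a minimal sketch that assumes the generic v5.15 `enum pageflags` ordering from include/linux/page-flags.h (PG_slab is bit 9, PG_head is bit 16); `decode_page_flags` is a hypothetical helper, and bits above PG_head are config-dependent, so they are left undecoded. If that ordering holds, PG_slab is one of the PAGE_FLAGS_CHECK_AT_FREE bits, which fits a slab-backed compound page reaching `__free_pages_ok()` via `page_frag_free()` in the trace.

```bash
#!/usr/bin/env bash
# Decode the low page-flag bits of a "flags:" word from a Bad page state report.
# ASSUMPTION: bit positions follow the generic v5.15 enum pageflags order
# (PG_locked = 0 ... PG_head = 16). Higher bits depend on the kernel config,
# and the top of the word holds node/zone/lastcpupid, so they are not decoded.
decode_page_flags() {
    local word=$(( $1 ))
    local names=(locked referenced uptodate dirty lru active workingset waiters
                 error slab owner_priv_1 arch_1 reserved private private_2
                 writeback head)
    local out=() i
    for i in "${!names[@]}"; do
        # Collect the name of every bit that is set in the word.
        if (( (word >> i) & 1 )); then
            out+=("${names[$i]}")
        fi
    done
    local IFS='|'
    echo "${out[*]}"   # join the names with '|', like the kernel's dump
}

decode_page_flags 0x2ffff00000010200   # prints: slab|head
```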

Each time, the BUG is reported in a different process.
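
To see which processes are affected across boots, the reports can be tallied by process name. A minimal sketch: `bad_page_by_process` is a hypothetical helper that reads kernel log text on stdin and assumes the `BUG: Bad page state in process <name> pfn:<pfn>` format shown in the log above.

```bash
#!/usr/bin/env bash
# Count "Bad page state" reports per process name in kernel log text on stdin.
# Usage on the switch: dmesg | bad_page_by_process
bad_page_by_process() {
    # Keep only the process name between "in process " and " pfn:",
    # then count occurrences, most frequent first.
    sed -n 's/.*BUG: Bad page state in process \(.*\) pfn:.*/\1/p' |
        sort | uniq -c | sort -rn
}
```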

To reproduce:

  1. On a builder machine, compile our kernel:
     $ git clone git://github.com/jpirko/linux_mlxsw.git -b combined_queue
     $ cd ~/linux_mlxsw
     $ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- olddefconfig
     $ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- mod2yesconfig
     $ make -j$(nproc) ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-
  2. On the switch, change to the directory where you compiled the kernel, then load and boot it with the appropriate kernel command line:
     $ cd ~/linux_mlxsw
     $ kexec -l arch/arm64/boot/Image --append="onl_platform=arm64-delta-tx4810-r0 root=/dev/sda4 rw pci=pcie_bus_safe =no" && kexec -e &
  3. Reconnect to the switch and run:
     $ dmesg | grep BUG
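
When rebooting repeatedly with kexec, step 3 can be turned into a pass/fail check. This is only a sketch: `check_dmesg_bugs` is a hypothetical helper that matches the same `BUG` pattern as the grep above and returns non-zero if anything matches. Reading dmesg may require root if `kernel.dmesg_restrict` is set.

```bash
#!/usr/bin/env bash
# Pass/fail version of step 3: print any BUG lines from kernel log text on
# stdin and return 1 if there are any.
# Usage on the switch: dmesg | check_dmesg_bugs
check_dmesg_bugs() {
    local hits
    # grep exits 1 when nothing matches; that is the passing case here.
    hits=$(grep 'BUG' || true)
    if [ -n "$hits" ]; then
        printf '%s\n' "$hits"
        echo "FAIL: $(printf '%s\n' "$hits" | wc -l | tr -d ' ') BUG line(s)" >&2
        return 1
    fi
    echo "PASS: no BUG lines"
}
```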

Could you please help with this? Thanks.