Microchip-Ethernet / EVB-KSZ9477

Repository for using Microchip EVB-KSZ9477 board. Product Supported: KSZ9477, KSZ9567, KSZ9897, KSZ9896, KSZ8567, KSZ8565, KSZ9893, KSZ9563, KSZ8563, LAN9646, Phys(KSZ9031/9131, LAN8770

"BUG : Bad page state in process.... " using 2*ksz9477 on imx7d SOM, kernel 4.14.78 #70

Open jeghub opened 3 years ago

jeghub commented 3 years ago

Hi,

I don't know whether my problem is related to the KSZ driver implementation for the FEC driver, but I have been stuck on this issue for months, so I am posting here in case someone has faced the same issue and found a solution.

I'm still working on getting a dual KSZ9477 setup running on our custom imx7d board, to extend its capability to 10 + 2 fiber ports.

Sometimes I get this kind of kernel trace, reporting BUG: Bad page state in process swapper/0 pfn:a3240:

[70235.420382] BUG: Bad page state in process swapper/0  pfn:a3240
[70235.425027] page:abb4a100 count:-1 mapcount:0 mapping:  (null) index:0x0
[70235.430433] flags: 0x0()
[70235.431678] raw: 00000000 00000000 00000000 ffffffff ffffffff 00000100 00000200 00000000
[70235.438466] raw: 00000000
[70235.439785] page dumped because: nonzero _count
[70235.443012] Modules linked in:
[70235.444775] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.78-mx7+gebefe02a1363 #1
[70235.451043] Hardware name: Freescale i.MX7 Dual (Device Tree)
[70235.455516] [<8010f7b0>] (unwind_backtrace) from [<8010b2bc>] (show_stack+0x10/0x14)
[70235.461970] [<8010b2bc>] (show_stack) from [<808446fc>] (dump_stack+0x78/0x8c)
[70235.467904] [<808446fc>] (dump_stack) from [<801d0274>] (bad_page+0x120/0x150)
[70235.473837] [<801d0274>] (bad_page) from [<801d2b40>] (get_page_from_freelist+0x478/0x98c)
[70235.480811] [<801d2b40>] (get_page_from_freelist) from [<801d3784>] (__alloc_pages_nodemask+0xec/0xf34)
[70235.488911] [<801d3784>] (__alloc_pages_nodemask) from [<801d4684>] (page_frag_alloc+0x58/0x14c)
[70235.496405] [<801d4684>] (page_frag_alloc) from [<806a7dc0>] (__netdev_alloc_skb+0xb8/0x118)
[70235.503554] [<806a7dc0>] (__netdev_alloc_skb) from [<8053a958>] (fec_enet_rx_napi+0x9c4/0xc28)
[70235.510876] [<8053a958>] (fec_enet_rx_napi) from [<806bd720>] (net_rx_action+0x128/0x31c)
[70235.517762] [<806bd720>] (net_rx_action) from [<80101594>] (__do_softirq+0x10c/0x254)
[70235.524298] [<80101594>] (__do_softirq) from [<8012f704>] (irq_exit+0xbc/0x100)
[70235.530318] [<8012f704>] (irq_exit) from [<801710ac>] (__handle_domain_irq+0x80/0xec)
[70235.536856] [<801710ac>] (__handle_domain_irq) from [<80101440>] (gic_handle_irq+0x4c/0x90)
[70235.543912] [<80101440>] (gic_handle_irq) from [<8010c00c>] (__irq_svc+0x6c/0xa8)
[70235.550094] Exception stack(0x80d01f18 to 0x80d01f60)
[70235.553848] 1f00:                                                       00000000 00000002
[70235.560730] 1f20: 00000001 ab626200 ffffe000 80d03d7c 00003fe0 00003fe0 ab622528 00000001
[70235.567612] 1f40: f5472474 f4da6b72 00000000 80d01f68 8085bde4 805df650 20070013 ffffffff
[70235.574500] [<8010c00c>] (__irq_svc) from [<805df650>] (cpuidle_enter_state+0x88/0x300)
[70235.581216] [<805df650>] (cpuidle_enter_state) from [<8016556c>] (do_idle+0x1b8/0x200)
[70235.587841] [<8016556c>] (do_idle) from [<8016586c>] (cpu_startup_entry+0x18/0x1c)
[70235.594119] [<8016586c>] (cpu_startup_entry) from [<80c00c78>] (start_kernel+0x3a4/0x3b0)
[70235.600997] Disabling lock debugging due to kernel taint
[70235.605010] BUG: Bad page state in process swapper/0  pfn:a919a
[70235.609630] page:abc209a8 count:-1 mapcount:0 mapping:  (null) index:0x0
[70235.615033] flags: 0x0()
[70235.616273] raw: 00000000 00000000 00000000 ffffffff ffffffff 00000000 abc209bc 00000000
[70235.623061] raw: 00000000
[70235.624379] page dumped because: nonzero _count
[70235.627606] Modules linked in:
[70235.629366] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G    B           4.14.78-mx7+gebefe02a1363 #1
[70235.636849] Hardware name: Freescale i.MX7 Dual (Device Tree)
[70235.641302] [<8010f7b0>] (unwind_backtrace) from [<8010b2bc>] (show_stack+0x10/0x14)
[70235.647751] [<8010b2bc>] (show_stack) from [<808446fc>] (dump_stack+0x78/0x8c)
[70235.653680] [<808446fc>] (dump_stack) from [<801d0274>] (bad_page+0x120/0x150)
[70235.659610] [<801d0274>] (bad_page) from [<801d2a44>] (get_page_from_freelist+0x37c/0x98c)
[70235.666583] [<801d2a44>] (get_page_from_freelist) from [<801d3784>] (__alloc_pages_nodemask+0xec/0xf34)
[70235.674683] [<801d3784>] (__alloc_pages_nodemask) from [<801d4684>] (page_frag_alloc+0x58/0x14c)
[70235.682174] [<801d4684>] (page_frag_alloc) from [<806a7dc0>] (__netdev_alloc_skb+0xb8/0x118)
[70235.689317] [<806a7dc0>] (__netdev_alloc_skb) from [<8053a958>] (fec_enet_rx_napi+0x9c4/0xc28)
[70235.696636] [<8053a958>] (fec_enet_rx_napi) from [<806bd720>] (net_rx_action+0x128/0x31c)
[70235.703519] [<806bd720>] (net_rx_action) from [<80101594>] (__do_softirq+0x10c/0x254)
[70235.710054] [<80101594>] (__do_softirq) from [<8012f704>] (irq_exit+0xbc/0x100)
[70235.716070] [<8012f704>] (irq_exit) from [<801710ac>] (__handle_domain_irq+0x80/0xec)
[70235.722607] [<801710ac>] (__handle_domain_irq) from [<80101440>] (gic_handle_irq+0x4c/0x90)
[70235.729662] [<80101440>] (gic_handle_irq) from [<8010c00c>] (__irq_svc+0x6c/0xa8)
[70235.735844] Exception stack(0x80d01f18 to 0x80d01f60)
[70235.739596] 1f00:                                                       00000000 00000002
[70235.746478] 1f20: 00000001 ab626200 ffffe000 80d03d7c 00003fe0 00003fe0 ab622528 00000001
[70235.753360] 1f40: f5472474 f4da6b72 00000000 80d01f68 8085bde4 805df650 20070013 ffffffff
[70235.760244] [<8010c00c>] (__irq_svc) from [<805df650>] (cpuidle_enter_state+0x88/0x300)
[70235.766956] [<805df650>] (cpuidle_enter_state) from [<8016556c>] (do_idle+0x1b8/0x200)
[70235.773580] [<8016556c>] (do_idle) from [<8016586c>] (cpu_startup_entry+0x18/0x1c)
[70235.779856] [<8016586c>] (cpu_startup_entry) from [<80c00c78>] (start_kernel+0x3a4/0x3b0)

This trace seems to always appear when network usage is high (while CPU load is still low). We first suspected a RAM issue, but the behavior is the same with different boards/SOMs.

I have attached my fec_main.c file: fec_main.zip
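To check how the reports line up with periods of high network usage, it can help to pull the timestamps and page frame numbers out of a saved kernel log. The following is a minimal sketch, not part of the original report: the function name `find_bad_page_reports` is hypothetical, and the pattern assumes the exact log line format shown in the trace above.

```python
import re

# Matches report lines in the format shown above, for example:
# [70235.420382] BUG: Bad page state in process swapper/0  pfn:a3240
BAD_PAGE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: Bad page state in process "
    r"(?P<proc>\S+)\s+pfn:(?P<pfn>[0-9a-f]+)"
)


def find_bad_page_reports(log_text):
    """Return a list of (timestamp_seconds, process, pfn) tuples.

    Hypothetical helper: it only recognizes the 'Bad page state'
    header line and ignores the rest of each trace.
    """
    reports = []
    for line in log_text.splitlines():
        match = BAD_PAGE_RE.search(line)
        if match:
            reports.append(
                (
                    float(match.group("ts")),
                    match.group("proc"),
                    int(match.group("pfn"), 16),  # pfn is printed in hex
                )
            )
    return reports
```

Feeding it the contents of a saved `dmesg` capture gives one tuple per report, which can then be compared against traffic counters sampled over the same period.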

jeghub commented 3 years ago

Here is an example of the kernel oops I get, generally after several "BUG: Bad page state" messages:

# page:aaaaaaaa count:0 mapcount:1 mapping:  (null) index:0x0
flags: 0x0()
raw: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
raw: 00000000
page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
Unable to handle kernel paging request at virtual address 71ba2d10
pgd = 80004000
[71ba2d10] *pgd=00000000
Internal error: Oops: 5 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W       4.14.78-gd1fa131b4bb1 #169
Hardware name: Freescale i.MX7 Dual (Device Tree)
task: 80d08e80 task.stack: 80d00000
PC is at __dump_page_owner+0x3c/0x11c
LR is at 0x71ba2d08
pc : [<8022b198>]    lr : [<71ba2d08>]    psr: 200f0113
sp : 80d01840  ip : 80d7b0c0  fp : 00000000
r10: a98f7050  r9 : a8bce800  r8 : a82b3000
r7 : 00000004  r6 : 71ba2d0c  r5 : 71ba2d08  r4 : 00000000
r3 : 80d17ff4  r2 : 00000010  r1 : e38907da  r0 : 71ba2d08
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c53c7d  Table: a514806a  DAC: 00000051
Process swapper/0 (pid: 0, stack limit = 0x80d00210)
Stack: (0x80d01840 to 0x80d02000)
1840: 80a79418 00000024 90c1a16c a82b3000 00000000 00000010 80d01860 00000000
1860: 80a8b36c 80d01884 90c1a16c 80173678 80a79418 80d01884 90c1a16c 801fb57c
1880: 80a8b36c 80a79418 00000024 00000040 00000004 aaaaaaaa 00000024 00000000
18a0: aaaaaaaa 00000000 90c1a140 9029d900 90c1a16c 80684590 9029d900 ffffffea
18c0: 00004913 00000040 a82b3000 80683aac 9029d900 80698020 00000001 806981cc
18e0: 80d068f4 00000000 00000000 a82b3000 a82b3000 00000000 a8bce800 80698110
1900: a8bce800 9029d900 a82b3000 a98f7000 00000001 806c6c08 a98f7050 00000010
1920: 9029d900 a98f7000 0000acd4 00000000 00000000 80698714 a98f7050 00000001
1940: a8001e40 0000aaaa fffffff4 00000000 0000aaaa 00000002 00000888 8021b86c
1960: 00000000 9029d900 a9b24000 00000000 80d70fc0 00000890 00000002 00000888
1980: 80a7d890 8079b0b4 9029d900 a9b24000 00000000 80d70fc0 00000890 807ac958
19a0: 00000000 80d70fc0 00000990 807ac958 80d87880 80d70fc0 a408ec00 80d70fc0
19c0: a49db480 80d70fc0 00000000 a49adb80 a49adb80 9029d900 80d01a0c 806d3010
19e0: a49adb80 a41bcb40 80d01a1c 80d70fc0 9029d900 a82b3000 00000000 00000000
1a00: 00000002 8079b1bc 00000000 00000004 80d70f07 00000000 a82b3000 00000000
1a20: 80d70fc0 8079affc 8079b154 80d70fc0 8079b154 00000000 9029d900 807acc54
1a40: 8079b154 00000002 a41bcb07 a82b7000 a82b3000 00000000 80d70fc0 8079b154
1a60: a82b7000 9029d900 00000000 a82b7000 80d70fc0 807acd48 a82b7000 a82b3000
1a80: 8079b154 807acd48 a9b24000 9029d900 a9b24000 80d01b0c 80d70fc0 807ad310
1aa0: a9b24000 80d01b1c 80d70fc0 807ad310 80160015 807a9ba8 a98f6000 80d70fc0
1ac0: 80d70fc0 00000000 a49adb40 a49adb40 9029d900 80d01b0c a9b24000 806d3010
1ae0: a9b245c0 807a9ba8 a98f6000 80d70fc0 9029d900 a82b3000 a82b7000 00000928
1b00: 00000002 8079b298 a5163080 00000002 a98f6007 a82b7000 a82b3000 00000000
1b20: 80d70fc0 8079b154 a98f6000 9029d900 a98f6200 90c19942 00000000 a9b245c0
1b40: 00000000 8079ca7c 00000000 80c4ca38 00000000 00000000 80d01fb8 80167b00
1b60: 80167b00 80b45b5c 00012000 00000000 00000000 80d70fc0 8079c820 00000000
1b80: 9029d900 a82b7000 00000050 a98f6200 80a7d890 807acc54 00000000 00000000
1ba0: ffffff07 a82b7000 00000000 00000000 80d70fc0 8079c820 600f0113 9029d900
1bc0: 00000000 80d70fc0 00000000 807ad51c a82b7000 00000000 8079c820 802292bc
1be0: 00000010 00000010 908c89b0 00000002 807ada94 a8001e40 80d05679 80dd49b0
1c00: 01088020 807ada94 ffffe000 90fb1e40 00224250 8021c394 80160016 90fb1380
1c20: 00000002 9029d900 00000002 00000000 80d70fc0 00000001 80d01cac a98f6200
1c40: 80a7d890 807adb64 00000000 00000000 80d01f02 a9b24000 00000000 00000000
1c60: 80d70fc0 807ad3fc 9029d900 00000000 a49ade40 a49ade40 9029d900 806d3010
1c80: 908c8c4c 00000000 80dd4c88 9029d900 80d70fc0 a82b7000 90c19942 00000042
1ca0: 90c19900 8079cf68 80d01cd8 00000000 800f0107 a82b7000 00000000 00000000
1cc0: 80d70fc0 8079c820 80d01cf4 9029d900 a82b7000 00000001 80d068f4 80d064ec
1ce0: 8079cde0 806933d4 00000000 01080020 8096da08 9029d900 9029d900 8071995c
1d00: 00000940 af46a0c2 0000028e e4699d3e 168463ed 175c9075 00000000 80d064ec
1d20: 9029d900 a82b7000 ac0e0da0 ac0e0da0 80535958 a8d98000 a82b7000 80699144
1d40: 9029d900 a82b7000 ac0e0da0 ac0e0da0 00000003 9029d900 a82b7000 80699cb8
1d60: 9029d900 00000000 a82b7000 8051b860 00000000 00000000 00000000 00000000
1d80: ab8acc20 00000001 00000000 a8bc84d8 00000000 00000002 00000100 a82b75c0
1da0: 00000040 809013bc 00000040 a82b7630 00000000 00000000 00000165 a41bcb40
1dc0: 00000100 00000000 00000000 a8d981b4 02000022 8051a350 fffffec0 a82b76b8
1de0: a8001600 a82b7000 a82b75c0 0000015e 00000003 00000000 00000000 00000000
1e00: 00000000 0000000a 80d125c0 a82b76b8 8051afd4 0003d57c 0000012c 00000040
1e20: 80d01e48 80d03900 2a96e000 806995e8 ab624ac0 80cb6ac0 80d7c14b 80d064ec
1e40: 80af1718 80af5cac 80d01e48 80d01e48 80d01e50 80d01e50 0000003c 00000000
1e60: 80d0208c 00000003 ffffe000 00000101 80a7d890 80d02080 40000003 8010159c
1e80: 80d0bbc4 00000001 00000000 80d02080 80caf3b8 80d87880 0000000a 0003d57c
1ea0: 80d03900 00200102 00000001 80d87880 00000000 0000003c 00000001 00000000
1ec0: 80d01f18 a8014000 dd8b5529 801301c0 80cb5d28 801744dc 80d0587c 80d22274
1ee0: c080200c c0802000 80d01f18 c0803000 ddb0b96d 80101448 805bad60 200f0013
1f00: ffffffff 80d01f4c ab6236c8 80d00000 ddb0b96d 8010c2cc 00000000 00000002
1f20: 00000001 ab627400 ab6236c8 ffffe000 0000028e 0000028e ab6236c8 00000001
1f40: ddb0b96d dd8b5529 00000000 80d01f68 80839140 805bad60 200f0013 ffffffff
1f60: 00000051 00000000 00000000 00000000 80cb3588 ab6236c8 ffffe000 80d0557c
1f80: 80d0552c 80d7bef1 80a7a22c 80d0b7f0 00000000 80167808 000000be 80d86740
1fa0: 80d05500 ffffffff 80d86740 abfff980 80c4ca38 80167b00 80d8678c 80c00c80
1fc0: ffffffff ffffffff 00000000 80c005a4 00000000 80c4ca38 80d869d4 80d05518
1fe0: 80c4ca34 80d0a1c4 8000406a 410fc075 00000000 8000807c 00000000 00000000
[<8022b198>] (__dump_page_owner) from [<80684590>] (skb_release_data+0x160/0x16c)
[<80684590>] (skb_release_data) from [<80683aac>] (kfree_skb+0x24/0x60)
[<80683aac>] (kfree_skb) from [<80698020>] (validate_xmit_skb+0x20c/0x2cc)
[<80698020>] (validate_xmit_skb) from [<80698110>] (validate_xmit_skb_list+0x30/0x64)
[<80698110>] (validate_xmit_skb_list) from [<806c6c08>] (sch_direct_xmit+0xfc/0x184)
[<806c6c08>] (sch_direct_xmit) from [<80698714>] (__dev_queue_xmit+0x3cc/0x634)
[<80698714>] (__dev_queue_xmit) from [<8079b0b4>] (br_dev_queue_push_xmit+0xb8/0x158)
[<8079b0b4>] (br_dev_queue_push_xmit) from [<807ac958>] (br_nf_post_routing+0x200/0x364)
[<807ac958>] (br_nf_post_routing) from [<806d3010>] (nf_hook_slow+0x3c/0xcc)
[<806d3010>] (nf_hook_slow) from [<8079b1bc>] (br_forward_finish+0x68/0xa8)
[<8079b1bc>] (br_forward_finish) from [<807acc54>] (br_nf_hook_thresh+0xc8/0xe0)
[<807acc54>] (br_nf_hook_thresh) from [<807acd48>] (br_nf_forward_finish+0xdc/0x18c)
[<807acd48>] (br_nf_forward_finish) from [<807ad310>] (br_nf_forward_ip+0x314/0x400)
[<807ad310>] (br_nf_forward_ip) from [<806d3010>] (nf_hook_slow+0x3c/0xcc)
[<806d3010>] (nf_hook_slow) from [<8079b298>] (__br_forward+0x9c/0x138)
[<8079b298>] (__br_forward) from [<8079ca7c>] (br_handle_frame_finish+0x25c/0x518)
[<8079ca7c>] (br_handle_frame_finish) from [<807acc54>] (br_nf_hook_thresh+0xc8/0xe0)
[<807acc54>] (br_nf_hook_thresh) from [<807ad51c>] (br_nf_pre_routing_finish+0x120/0x384)
[<807ad51c>] (br_nf_pre_routing_finish) from [<807adb64>] (br_nf_pre_routing+0x3e4/0x418)
[<807adb64>] (br_nf_pre_routing) from [<806d3010>] (nf_hook_slow+0x3c/0xcc)
[<806d3010>] (nf_hook_slow) from [<8079cf68>] (br_handle_frame+0x188/0x318)
[<8079cf68>] (br_handle_frame) from [<806933d4>] (__netif_receive_skb_core+0x1d0/0xa04)
[<806933d4>] (__netif_receive_skb_core) from [<80699144>] (netif_receive_skb_internal+0x44/0x130)
[<806991<8016682c>] (__wake_up_common) from [<801669c8>] (__wake_up_locked+0x18/0x20)
[<801669c8>] (__wake_up_locked) from [<80167418>] (complete+0x38/0x48)
[<80167418>] (complete) from [<805143ec>] (spi_imx_isr+0xa8/0xc0)
[<805143ec>] (spi_imx_isr) from [<80174c8c>] (__handle_irq_event_percpu+0x50/0x12c)
[<80174c8c>] (__handle_irq_event_percpu) from [<80174d84>] (handle_irq_event_percpu+0x1c/0x64)
[<80174d84>] (handle_irq_event_percpu) from [<80174e04>] (handle_irq_event+0x38/0x5c)
[<80174e04>] (handle_irq_event) from [<80178408>] (handle_fasteoi_irq+0xc0/0x170)
[<80178408>] (handle_fasteoi_irq) from [<80173fa8>] (generic_handle_irq+0x24/0x34)
[<80173fa8>] (generic_handle_irq) from [<801744d8>] (__handle_domain_irq+0x7c/0xec)
[<801744d8>] (__handle_domain_irq) from [<80101448>] (gic_handle_irq+0x4c/0x90)
[<80101448>] (gic_handle_irq) from [<8010c2cc>] (__irq_svc+0x6c/0xa8)
Exception stack(0x80d01658 to 0x80d016a0)
1640:                                                       00000041 00000000
1660: 00000301 80d00000 80d87418 00000000 0000000b 00000000 7f000000 600f0113
1680: 80d08e80 ffffe000 00000007 80d016a8 8012b284 8012b288 200f0113 ffffffff
[<8010c2cc>] (__irq_svc) from [<8012b288>] (panic+0x1ec/0x250)
[<8012b288>] (panic) from [<8010b774>] (die+0x1f4/0x330)
[<8010b774>] (die) from [<801139dc>] (__do_kernel_fault.part.0+0x64/0x74)
[<801139dc>] (__do_kernel_fault.part.0) from [<80113d70>] (do_bad_area+0x0/0x7c)
[<80113d70>] (do_bad_area) from [<80d056c4>] (__per_cpu_offset+0x0/0x10)
---[ end trace f3f57eab0b48955c ]---
kernel: page:aaaaaaaa count:0 mapcount:1 mapping:  (null) index:0x0

jeghub commented 3 years ago

It seems the kernel oops occurs when STP is enabled while using 2*KSZ (one on each FEC of the IMX7). I'm not sure, but I disabled STP on one board and it has not crashed for a week.
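For anyone wanting to confirm whether STP is active on a board before comparing results, the Linux bridge exposes its state in sysfs. This is a minimal sketch under assumptions: the bridge name `br0` and the helper name `bridge_stp_state` are hypothetical, and the code assumes the kernel bridge's `stp_state` attribute (0 = disabled, 1 = kernel STP, 2 = userspace STP).

```python
from pathlib import Path

# Values of the Linux bridge stp_state sysfs attribute.
STP_STATES = {0: "disabled", 1: "kernel STP", 2: "userspace STP"}


def bridge_stp_state(bridge, sysfs_root="/sys/class/net"):
    """Return the STP state of a Linux bridge as a readable string.

    Hypothetical helper: 'bridge' is the bridge interface name
    (for example "br0"); sysfs_root is a parameter only so the
    function can be exercised against a test directory.
    """
    path = Path(sysfs_root) / bridge / "bridge" / "stp_state"
    value = int(path.read_text().strip())
    return STP_STATES.get(value, f"unknown ({value})")
```

On a target, `bridge_stp_state("br0")` would report the current setting. With iproute2, STP can be turned off with `ip link set dev br0 type bridge stp_state 0`, again assuming the bridge is named `br0`.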

parthshah3690 commented 2 years ago

Hi,

I am having a similar issue. Do you have a solution available already? Please check #77

romatou18 commented 2 years ago

Hi @parthshah3690,

Would you mind posting your dts file with the KSZ9477 SPI or I2C configuration, either on gist.com or here?

Kind Regards

triha2work commented 2 years ago

Please try the 5.4 driver. It should be compatible with 5.3.