Open parthshah3690 opened 2 years ago
Hi,
To answer your question on #74 and #77: sorry, I have no solution. Like you, I saw the crash ONLY when enabling features such as PTP and STP, so I disabled all of them.
I do not need PTP, but I will need STP (#77) or multi_dev = 1 mode (#70). I have not had time to find a solution. I am now using multi_dev = 0 mode with STP disabled, because this was not our priority, but I will have to find a solution soon. If you find something, please post your solution. I will do the same when I work on it again.
Try disabling the F_SG feature in the MAC driver to verify the problem. This greatly reduces TCP transmit performance, but we want to debug the problem first. After that we will probably need to use the copy mechanism in the updated 5.4 driver.
Hi @triha2work ,
I have disabled the F_SG feature (NETIF_F_SG, a hardware feature) in the FEC driver, but I still get the same crash.
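For anyone repeating this test, it can help to confirm from user space that scatter-gather is really off on the interface before running iperf3 again. The following is a minimal sketch, assuming `ethtool` is installed on the board and the interface is named `eth0` (both are assumptions, as are the helper names). It only parses the text printed by `ethtool -k`; it does not change any driver setting.

```python
import subprocess


def parse_offload_features(ethtool_output: str) -> dict:
    """Map feature name -> True (on) / False (off) from `ethtool -k` text."""
    features = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "name: value" line
        words = value.split()
        # Values look like "on", "off" or "off [fixed]"; the header line
        # "Features for eth0:" has no value and is skipped here.
        if words and words[0] in ("on", "off"):
            features[name.strip()] = words[0] == "on"
    return features


def scatter_gather_enabled(iface: str = "eth0") -> bool:
    """Run `ethtool -k <iface>` and report the scatter-gather state."""
    result = subprocess.run(
        ["ethtool", "-k", iface], capture_output=True, text=True, check=True
    )
    return parse_offload_features(result.stdout).get("scatter-gather", False)


if __name__ == "__main__":
    # Hypothetical interface name; adjust to the board's FEC interface.
    print("scatter-gather enabled:", scatter_gather_enabled("eth0"))
```

If the driver change took effect, the script should report that scatter-gather is disabled. If it still shows as enabled, the flag was probably not cleared in the feature set that the running kernel actually uses.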
Hi @parthshah3690,
Would you mind posting, on gist.com or here, your dts file with the KSZ9477 SPI or I2C configuration?
Kind Regards
Please try the 5.4 driver. It should be compatible with 5.3.
Hi @triha2work, I am using kernel 4.14. I previously tried porting the drivers from 5.4 to 4.14, but I got porting errors.
Hi @parthshah3690, have you found a way to fix this issue?
I'm running the iperf test the same way you did in your first post: the server is on my custom board and the client is on a computer.
With CONFIG_KSZ_PTP disabled, I get no errors with TCP and a bandwidth of about 900 Mbit/s.
With UDP, however, I observed about 90% packet loss in the 1000M test (option -b1G).
On eth0 I can see a lot of packet errors, all of them overruns.
I'm using iperf 3.1.3. For UDP the command is `iperf3 -c 192.168.3.182 -u -b1G`
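To put a number on the overruns for a single run, the receive counters can be sampled before and after the iperf3 test. This is a minimal sketch, assuming the overrun figure reported for eth0 corresponds to the receive `fifo` column of `/proc/net/dev` (an assumption worth checking against your own tool's output). The function names are hypothetical.

```python
def read_rx_counters(proc_net_dev_text: str, iface: str) -> dict:
    """Extract receive counters for one interface from /proc/net/dev text."""
    for line in proc_net_dev_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == iface:
            fields = rest.split()
            # Receive columns, in order:
            # bytes packets errs drop fifo frame compressed multicast
            return {
                "packets": int(fields[1]),
                "errs": int(fields[2]),
                "drop": int(fields[3]),
                "fifo": int(fields[4]),
            }
    raise KeyError(f"interface {iface!r} not found")


def overrun_delta(before: dict, after: dict) -> int:
    """Number of receive FIFO errors that occurred between two samples."""
    return after["fifo"] - before["fifo"]


if __name__ == "__main__":
    with open("/proc/net/dev") as f:
        before = read_rx_counters(f.read(), "eth0")
    input("Run the iperf3 test now, then press Enter...")
    with open("/proc/net/dev") as f:
        after = read_rx_counters(f.read(), "eth0")
    print("new rx fifo errors:", overrun_delta(before, after))
    print("new rx packets:", after["packets"] - before["packets"])
```

Comparing the new fifo errors with the new received packets shows how much of the reported UDP loss is happening at the receiving MAC rather than elsewhere on the path.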
Hi @jeghub, no solution is available. I migrated to Linux 5.4 and took the KSZ driver from there. It still has a few problems with buffers.
Thank you for your quick answer @parthshah3690
Do you think it was worth migrating to 5.4? Do you still have the same issues when PTP is enabled? I was thinking of migrating, but I am afraid of getting stuck with the same issues.
Hi @jeghub, if you migrate to Linux 5.4, you will not see the current error, but you will see warnings that the transmit queue fills up frequently. Please refer to: https://community.nxp.com/t5/i-MX-Processors/I-MX6SoloX2-Linux-5-4-70-2-3-0-transmit-queue-0-timed-out/td-p/1368528
Hi all, @triha2work, @micreladmin, @Ravi-Hegde @davidcai-micrel @jeghub @RaymondKim @Aryz @bnielsen1965
I need your help. I am using a KSZ9477 chip on custom hardware running Linux 4.14, connected to the FEC of an i.MX6 processor. I'm using fec_main.c and the FEC patch from this repository: https://github.com/Microchip-Ethernet/EVB-KSZ9477/tree/master/KSZ/linux-drivers/ksz9897/linux-4.14/drivers/net/ethernet/freescale
I have connected two of the custom boards over a LAN and can ping between them.

![image](https://user-images.githubusercontent.com/4667292/133781953-9857e8a0-8ade-4aa7-8db4-55d870078071.png)
To check the network performance, I am using iperf3. But as soon as the client connects to the iperf3 server, I see a kernel crash. iperfCrash.txt
When I disable the CONFIG_KSZ_PTP option, I do not see any crash. However, testing TCP and UDP in different combinations, I see the results below:
I first suspected a RAM issue, but I could see that sufficient RAM was available before and during the crash.
[ 395.007282] BUG: Bad page state in process swapper/0 pfn:86baf
[ 395.013245] page:3186c4d7 count:-1 mapcount:0 mapping: (null) index:0x0
[ 395.019970] flags: 0x0()
[ 395.022533] raw: 00000000 00000000 00000000 ffffffff ffffffff 00000000 9fb2e5f4 00000000
[ 395.030635] page dumped because: nonzero _count
[ 395.035175] Modules linked in: cywdhd(O) mxc_dcic evbug
[ 395.040450] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 4.14.200+g20245046a7a0 #1
[ 395.048990] Hardware name: Freescale i.MX6 SoloX (Device Tree)
[ 395.054876] [<8010f2ec>] (unwind_backtrace) from [<8010ac4c>] (show_stack+0x10/0x14)
[ 395.062650] [<8010ac4c>] (show_stack) from [<80a931a4>] (dump_stack+0x84/0x98)
[ 395.069909] [<80a931a4>] (dump_stack) from [<801cd134>] (bad_page+0x114/0x144)
[ 395.077164] [<801cd134>] (bad_page) from [<801cf6b0>] (get_page_from_freelist+0x320/0x8ec)
[ 395.085461] [<801cf6b0>] (get_page_from_freelist) from [<801d0370>] (alloc_pages_nodemask+0xd8/0xc68)
[ 395.094881] [<801d0370>] (alloc_pages_nodemask) from [<801d0fbc>] (page_frag_alloc+0x5c/0x150)
[ 395.103695] [<801d0fbc>] (page_frag_alloc) from [<8088c298>] (netdev_alloc_skb+0xb8/0x118)
[ 395.112160] [<8088c298>] (netdev_alloc_skb) from [<806289c8>] (fec_enet_rx_napi+0x284/0xcd8)
[ 395.120803] [<806289c8>] (fec_enet_rx_napi) from [<8089ffb0>] (net_rx_action+0x11c/0x314)
[ 395.129006] [<8089ffb0>] (net_rx_action) from [<801015e0>] (__do_softirq+0xd8/0x230)
[ 395.136779] [<801015e0>] (__do_softirq) from [<801307b0>] (irq_exit+0xbc/0x104)
[ 395.144127] [<801307b0>] (irq_exit) from [<8016c2f4>] (handle_domain_irq+0x80/0xe8)
[ 395.151984] [<8016c2f4>] (handle_domain_irq) from [<801014c4>] (gic_handle_irq+0x4c/0x90)
[ 395.160356] [<801014c4>] (gic_handle_irq) from [<8010b98c>] (__irq_svc+0x6c/0xa8)
[ 395.167854] Exception stack(0x81001f40 to 0x81001f88)
[ 395.172930] 1f40: 00000000 80e04044 1eaa8000 80118060 81000000 81003db8 81003d6c 8107a000
[ 395.181130] 1f60: 81003d40 81003d40 00000001 80f6ba30 00000001 81001f90 8010811c 80108120
[ 395.189320] 1f80: 60000013 ffffffff
[ 395.192842] [<8010b98c>] (__irq_svc) from [<80108120>] (arch_cpu_idle+0x38/0x3c)
[ 395.200275] [<80108120>] (arch_cpu_idle) from [<80160cec>] (do_idle+0xb8/0x138)
[ 395.207613] [<80160cec>] (do_idle) from [<80161014>] (cpu_startup_entry+0x18/0x1c)
[ 395.215213] [<80161014>] (cpu_startup_entry) from [<80f00c68>] (start_kernel+0x39c/0x3b0)
[ 395.223406] Disabling lock debugging due to kernel taint
[ 395.229733] BUG: Bad page state in process swapper/0 pfn:86bdf
[ 395.235691] page:041752be count:-1 mapcount:0 mapping: (null) index:0x0
[ 395.242415] flags: 0x0()
[ 395.244977] raw: 00000000 00000000 00000000 ffffffff ffffffff 9fb2ebf4 9fb2ebf4 00000000
[ 395.253079] page dumped because: nonzero _count
[ 395.257618] Modules linked in: cywdhd(O) mxc_dcic evbug
[ 395.262892] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G B O 4.14.200+g20245046a7a0 #1
[ 395.271431] Hardware name: Freescale i.MX6 SoloX (Device Tree)
[ 395.277311] [<8010f2ec>] (unwind_backtrace) from [<8010ac4c>] (show_stack+0x10/0x14)
[ 395.285082] [<8010ac4c>] (show_stack) from [<80a931a4>] (dump_stack+0x84/0x98)
[ 395.292339] [<80a931a4>] (dump_stack) from [<801cd134>] (bad_page+0x114/0x144)
[ 395.299593] [<801cd134>] (bad_page) from [<801cf6b0>] (get_page_from_freelist+0x320/0x8ec)
[ 395.307889] [<801cf6b0>] (get_page_from_freelist) from [<801d0370>] (alloc_pages_nodemask+0xd8/0xc68)
[ 395.317309] [<801d0370>] (alloc_pages_nodemask) from [<801d0fbc>] (page_frag_alloc+0x5c/0x150)
[ 395.326124] [<801d0fbc>] (page_frag_alloc) from [<8088c298>] (netdev_alloc_skb+0xb8/0x118)
[ 395.334590] [<8088c298>] (netdev_alloc_skb) from [<806289c8>] (fec_enet_rx_napi+0x284/0xcd8)
[ 395.343232] [<806289c8>] (fec_enet_rx_napi) from [<8089ffb0>] (net_rx_action+0x11c/0x314)
[ 395.351436] [<8089ffb0>] (net_rx_action) from [<801015e0>] (__do_softirq+0xd8/0x230)
[ 395.359209] [<801015e0>] (__do_softirq) from [<801307b0>] (irq_exit+0xbc/0x104)
[ 395.366556] [<801307b0>] (irq_exit) from [<8016c2f4>] (handle_domain_irq+0x80/0xe8)
[ 395.374417] [<8016c2f4>] (handle_domain_irq) from [<801014c4>] (gic_handle_irq+0x4c/0x90)
[ 395.382789] [<801014c4>] (gic_handle_irq) from [<8010b98c>] (__irq_svc+0x6c/0xa8)
[ 395.390284] Exception stack(0x81001f40 to 0x81001f88)
[ 395.395360] 1f40: 00000000 80e04044 1eaa8000 80118060 81000000 81003db8 81003d6c 8107a000
[ 395.403558] 1f60: 81003d40 81003d40 00000001 80f6ba30 00000001 81001f90 8010811c 80108120
[ 395.411748] 1f80: 60000013 ffffffff
[ 395.415267] [<8010b98c>] (__irq_svc) from [<80108120>] (arch_cpu_idle+0x38/0x3c)
[ 395.422698] [<80108120>] (arch_cpu_idle) from [<80160cec>] (do_idle+0xb8/0x138)
[ 395.430035] [<80160cec>] (do_idle) from [<80161014>] (cpu_startup_entry+0x18/0x1c)
[ 395.437635] [<80161014>] (cpu_startup_entry) from [<80f00c68>] (start_kernel+0x39c/0x3b0)
Do you have a solution or any idea about this problem?