KSZ9477 with NXP fec

jeghub commented 5 years ago

Hello,

I'm trying to integrate a KSZ9477 on an NXP i.MX7 using your fec_main patch. We're facing a kernel panic caused by a NULL pointer dereference when we use any interface other than the first one. The full kernel trace is attached.

The issue seems to be in the "fec_enet_start_xmit" function. At the line txq = fep->tx_queue[queue]; txq is NULL. I think this is because we are fetching a queue from fep, and fep points to the "fec_enet_private" struct of a net_device that is not the first one: no tx_queue has been set up for the FEC on any port other than the first.

We were able to fix it by adding the following lines at the beginning of the "fec_enet_start_xmit" function when HAVE_KSZ_SWITCH is defined:

if(fep != fep->hw_priv) {
    fep = fep->hw_priv;
}

The beginning of the xmit function now looks as follows:

static netdev_tx_t
fec_enet_start_xmit(struct sk_buff *skb, struct net_device *ndev)
{
    struct fec_enet_private *fep = netdev_priv(ndev);
    int entries_free;
    unsigned short queue;
    struct fec_enet_priv_tx_q *txq;
    struct netdev_queue *nq;
    int ret;

#ifdef HAVE_KSZ_SWITCH
    unsigned long flags;
    if(fep != fep->hw_priv) {
        fep = fep->hw_priv;
    }
    spin_lock_irqsave(&fep->tx_lock, flags);
#endif

If I understand correctly, we need to use the "main port" (that's the name you gave it in the macb files) for transmitting. Adding these lines forces the use of the main port. Am I correct?
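To illustrate the redirect described above outside the kernel, here is a minimal, self-contained sketch. The struct and function names (mock_fec_priv, mock_txq, fec_hw_priv, fec_pick_txq) are simplified stand-ins invented for this example, not the real driver definitions, and the NULL guard on hw_priv is an extra precaution that the lines posted above do not have.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the driver structures (hypothetical names);
 * the real struct fec_enet_private has many more fields. */
struct mock_txq {
    int id;
};

struct mock_fec_priv {
    struct mock_fec_priv *hw_priv;  /* private data of the device that owns the MAC */
    struct mock_txq *tx_queue[3];   /* only populated on the main device */
};

/* Return the private data that actually owns the hardware queues.
 * Assumption: on the main device hw_priv points back at itself. */
static struct mock_fec_priv *fec_hw_priv(struct mock_fec_priv *fep)
{
    assert(fep != NULL);
    return fep->hw_priv ? fep->hw_priv : fep;
}

/* Look up a transmit queue through the main device, so that a
 * secondary port never dereferences its own empty tx_queue[]. */
static struct mock_txq *fec_pick_txq(struct mock_fec_priv *fep,
                                     unsigned int queue)
{
    return fec_hw_priv(fep)->tx_queue[queue];
}
```

With a secondary port whose own tx_queue[] is empty, fec_pick_txq() still returns the main device's queue, which is the behavior the added lines are meant to achieve.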

Kernel Panic KSZ.txt

jeghub commented 4 years ago

Hi,

I'm updating this post with a new issue seen on an NXP controller with FEC. Since your update for multi-device support, the FEC works better, but some bugs remain.

On some packets the checksum is wrong. It happens when the payload size is odd; at least UDP packets with an odd number of payload bytes are affected. We noticed it with DNS requests: when the server name had an odd number of characters, the checksum was wrong. A comment was left in the code saying: "There seems to be a hardware bug such that the checksum will be wrong when the size of the fragment before the last one is odd". In fact it appears to happen whenever the packet length itself is odd. Removing the check on nr_frags and checking only whether the packet size is odd seems to correct this.

// There seems to be a hardware bug such that the checksum
// will be wrong when the size of the fragment before the last
// one is odd.
if (skb->len & 1) {
    struct sk_buff *nskb;
    nskb = dev_alloc_skb(len);
    if (nskb) {
        skb_copy_and_csum_dev(skb, nskb->data);
        skb->ip_summed = 0;
        nskb->len = skb->len;
        copy_old_skb(skb, nskb);
        skb = nskb;
        skb_set_tail_pointer(skb, skb->len);
    }
}
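For reference, this is roughly what a software checksum has to do with an odd-length buffer. It is a generic user-space sketch of the one's-complement Internet checksum (RFC 1071), not the kernel's skb_copy_and_csum_dev() implementation; the function name inet_checksum is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement Internet checksum (RFC 1071) over len bytes. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(data != NULL || len == 0);

    /* Sum all complete 16-bit big-endian words. */
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];

    /* Odd length: the trailing byte is the high half of a word
     * whose low half is an implicit zero pad. */
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

The odd-length branch is the step that a checksum engine has to get right; an odd buffer must give the same result as the same buffer padded with one zero byte.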

We're facing another issue with bridging. With multi_dev = 1, I bridge some interfaces together and get a kernel oops. I haven't investigated it yet. I will add a comment if I find something.

Thanks for your work.

triha2work commented 4 years ago

The fec_main.c driver in linux-4.9 is not updated for real use. Please refer to the linux-4.14 patch if you want to use linux-4.9. The hardware checksum problem was fixed in the latest patch.

jeghub commented 4 years ago

Our kernel version is 4.14 and we already use the linux-4.14 patch. The patch you made last week has fixed a lot of things. Thanks! I still get errors when bridging two interfaces, but I have not investigated them yet. The bridge is created correctly, but when traffic increases a NULL pointer dereference causes a kernel oops. I'm not able to reproduce it today, but I can attach a log trace tomorrow.

We use an i.MX7D with two FECs. Do you think the driver will support two KSZ switches, one on each FEC?

triha2work commented 4 years ago

Yes, the switch driver model supports that, and there was an actual implementation of 2 chips connected to 2 MAC. The switch driver creates a virtual PHY device sw.0. The new one is sw.1. The new driver code may break this operation though, as it is not tried on KSZ9477. I do now have a setup that connects one switch in RGMII and the other in RMII. It may work but the shared interrupt may create a problem.

jeghub commented 4 years ago

What do you mean by :

It may work but the shared interrupt may create a problem.

What shared interrupt are you talking about?

After your answer I've tried to use 2 chips connected to 2 FECs.

I've only made a few very small changes: one in fec_main.c to attach the PHY to the right MII bus (sw.0 or sw.1), and another in ksz_sw_9897.c because there was an error in init_sw_dev() when registering the second KSZ. The class is already created and the name of the char device must be different, so I append the dev_id to it. The same error occurs in ptp_init but I have not corrected it yet. The patch is attached to this issue in case you find it useful.

two_ksz9877_support.patch.txt
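The naming fix described above boils down to deriving a distinct name per switch instance. A minimal user-space sketch of that idea follows; the helper make_dev_name and the base name "sw_dev" are hypothetical and are not taken from the attached patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "<base><dev_id>" into buf.
 * Returns 0 on success, -1 if the name would be truncated. */
static int make_dev_name(char *buf, size_t size, const char *base, int dev_id)
{
    int n;

    assert(buf != NULL && base != NULL && size > 0);
    n = snprintf(buf, size, "%s%d", base, dev_id);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Calling it with dev_id 0 and 1 yields two different names, so the second registration no longer collides with the first.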

On the other hand, I tested again, with multi_dev=1, bridging two interfaces, and I got the following errors (see the file br_crash_trace.txt). Do you have any idea what is happening?

br_crash_trace.txt

triha2work commented 4 years ago

It does require some small changes to the switch and MAC drivers to set up 2 instances of the driver. I have modified my setup to do that successfully. Generally the shared interrupt is not an issue, as each chip likely has its own interrupt line. In my setup the same interrupt can be used for both chips, but the driver does not like that. I saw a similar crash when skb->head was set incorrectly in another MAC controller driver. I can try using the HSR RedBox again to see if there is a problem.

triha2work commented 4 years ago

I use the HSR RedBox and run nuttcp over the switch without issue. When I set up br0 and enable it the first time, the kernel crashes. After applying the patch for net/bridge/br_device.c again I have no problem running nuttcp. There may be a more elegant way to increase hard_header_len elsewhere, but I did not find one. The crash you reported seems to be different from the one I encountered. In that case I have no idea how to fix it. The driver I am running is more network intensive as IBA is used to access switch registers.

jeghub commented 4 years ago

Does the HSR RedBox have two FEC controllers, or another MAC controller? I've spent the whole day searching for this bug but have not found anything yet. The patch for net/bridge/br_device.c was applied correctly.

You said you use IBA, which is more network intensive. Is there a lot of communication when running? I thought that with multi_dev = 1 all network frames were sent to the host, so except for configuration (a change of MAC address, VLANs, ...) there was no more SPI communication after initialization.

Another question: I've tried to add to my bridge, say, eth0, which is an interface of my first KSZ, and eth8, which is an interface of my second KSZ. It does not seem to work: I'm not receiving the reply. I tried setting both eth0 and eth8 to promiscuous mode. In our application we want to use two KSZ switches, one on each FEC. Both are in multi_dev=1 mode, and we want to use multiple bridges to build configurable routers/switches.

Do you think that could work?

Thanks for your feedback.

triha2work commented 4 years ago

What I have done was using one MAC controller to simulate multiple network devices. IBA is used for everything, as I do not have SPI/I2C access to the switch from the i.MX SoC board (Wandboard Dual) I have. Background MIB counter reading is then done through network accesses all the time, and those accesses compete with regular accesses from the system.

The current switch driver does not support multiple instances unless some small modifications are made. I will update the code later. Assuming the switch driver can support 2 instances, the MAC driver should then control each switch with devices eth0 and eth1 using the "multi_dev=0" mode. Using a bridge device br0 to connect them should be straightforward. The next step is to use "multi_dev=1" mode with one MAC and make sure there is no issue. The final step is to use 2 MAC controllers, and the behavior should be the same as in the "multi_dev=0" mode.

However, using the bridge device and the "multi_dev=1" setup in a simple way is not good for a router. I would assume that for a 2-switch setup there will be eth0 for WAN, eth1 for LAN, and eth2 for another LAN in the second switch. So the multiple-port setup has to be different in each switch.

jeghub commented 4 years ago

About support for 2 KSZ switches, one on each FEC: I've made some changes that I shared in a previous message, but I don't know if something is missing, except that I assume both switches work in the same multi_dev mode. I do not pretend to understand everything that goes into your KSZ driver ;)

I will wait for your update to check it.

Currently, with my changes, I'm able to set up both KSZ switches, one on each FEC. Using "multi_dev=0" mode I tried to bridge eth0 and eth1 on br0. I got the same error as the one I posted 2 messages above.

About bridge and router, I will try to explain our idea better. Please correct me if I'm wrong: our understanding is that in multi_dev = 1 the KSZ "does not switch" and forwards all frames to the host. So with 2 FECs and one KSZ on each, we get 12 interfaces, eth0 to eth11. Using bridges, we create, say, br0, br1, and possibly more, and we add interfaces to them. The interfaces in a bridge can be on the same KSZ or on different ones. Using multiple bridges, we can have multiple IP addresses, and they can be on different subnets. By enabling routing or using iptables rules we can route between bridges. Assuming our idea is feasible, there are still a lot of open questions: performance, whether STP will be handled correctly, and so on, but that's not for today.

Edit: I've done another test using "multi_dev = 0" mode and a bridge. My two switches are up. I have eth0 for KSZ0 and eth1 for KSZ1. I create br0 and add eth0 and eth1. Only KSZ0 is connected to my gateway. From the serial console of my custom board, I use wget on a big file (10 GB) to create network traffic (the download runs at about 30 MB/s) => in this case there is no problem. I know the bridge is useless in this case. All I wanted to check was whether merely adding a bridge creates this error.

Then I add a computer on KSZ1. It takes only a few seconds before I get errors and my board crashes. I can't tell whether the time before the crash changes with the network load.

jeghub commented 4 years ago

Hi, I've patched my files using your last commit. Using 2 KSZ switches works fine, as it did with my own changes, but I still get the same issues (see my trace log in a previous post) when a bridge is used. I'm pretty sure the issue comes from the PTP part. => I disabled all options (CONFIG_KSZ_PTP, CONFIG_KSZ_IBA ...) before recompiling my kernel, and it works fine without them. After adding back only CONFIG_KSZ_PTP, the issue came back. Currently we don't use PTP, so I simply disabled it.

triha2work commented 4 years ago

The only difference with PTP is there are 4 extra bytes in the transmit tail tag. You can test that scenario by disabling PTP function. It is done by writing 0x39 to register 0x515.
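To make the effect of those 4 extra bytes concrete, here is a small user-space sketch of appending a tail tag to a frame buffer. Only the "4 extra bytes with PTP" figure comes from the comment above; the 2-byte base tag, the byte order, the zero timestamp placeholder, and the name append_tail_tag are assumptions made for illustration and may not match the vendor driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TAIL_TAG_LEN  2   /* assumed size of the base tail tag */
#define PTP_TAG_LEN   4   /* the 4 extra bytes mentioned above */
#define MIN_FRAME_LEN 60  /* minimum Ethernet frame size without FCS */

/* Append a tail tag to the frame held in buf (capacity cap bytes).
 * Returns the new frame length, or 0 if the tag does not fit. */
static size_t append_tail_tag(uint8_t *buf, size_t len, size_t cap,
                              uint16_t port_mask, int ptp_enabled)
{
    size_t padded = len < MIN_FRAME_LEN ? MIN_FRAME_LEN : len;
    size_t extra = TAIL_TAG_LEN + (ptp_enabled ? PTP_TAG_LEN : 0);

    assert(buf != NULL);
    if (len > cap || padded + extra > cap)
        return 0;

    /* Pad short frames so the tag ends up at the very end of the frame. */
    memset(buf + len, 0, padded - len);
    len = padded;

    if (ptp_enabled) {
        memset(buf + len, 0, PTP_TAG_LEN);  /* timestamp placeholder */
        len += PTP_TAG_LEN;
    }

    /* Destination port bitmap, most significant byte first (assumed). */
    buf[len++] = (uint8_t)(port_mask >> 8);
    buf[len++] = (uint8_t)(port_mask & 0xff);
    return len;
}
```

A buffer with room for exactly a 60-byte frame plus 4 spare bytes accepts the tag without PTP but rejects it with PTP, which is the kind of tailroom mismatch that would be worth ruling out in the crash.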

jeghub commented 4 years ago

I recompiled the driver with the CONFIG_KSZ_PTP option and manually wrote 0x39 to register 0x515, and it doesn't crash anymore. Is this enough to conclude that the issue comes from these 4 extra bytes?

triha2work commented 4 years ago

That may be the problem but I do not know why. The tail tag is either added manually through a fragment or a new allocated socket buffer. If you do not need 1588 PTP then the temporary solution is to disable CONFIG_KSZ_PTP.

jeghub commented 4 years ago

Hi Triha, we've temporarily disabled CONFIG_KSZ_PTP and it's working fine. I have a new question for you about STP. As said in a previous message, we use 2 KSZ switches. So I don't think I can use the RSTP from the driver, because we can have loops between interfaces on different KSZ switches. Since we are in multi_dev = 1 mode, we can put all interfaces (eth0 to eth11) in the same bridge and enable STP on that bridge. We made a loop, say between eth0 and eth1, and STP is not working correctly => both interfaces stay in the "Forward" state and I get endless broadcast frames. Using tcpdump on the eth interfaces where the loop was made, I found that BPDUs are sent correctly, but I don't receive the BPDUs of the other port. Do you think the driver or the switch itself can block them? If so, is there a register to set on the switch to allow BPDUs to be forwarded to the host port?

triha2work commented 4 years ago

You may want to contact Microchip Support to request an application note about using the STP function in the KSZ switches. Each port has receive, transmit, and learning functions. To stop the port they can be disabled. To receive BPDUs through a closed port, the static MAC table must be set up with an entry for the BPDU multicast address with an override setting. In KSZ9477 this entry is the first in the multicast reserved table. Tail tag needs to be enabled in the host port. Destination ports are used to send frames to specific ports. When they are empty the lookup bit should be set. The override bit forces the switch to send the frame no matter what. It is required to send BPDUs through a closed port.

I do not think using software bridge is feasible. It is inefficient to send multiple broadcast frames to the ports. When the hardware switch is doing the forwarding then the device may receive 2 copies: one from the switch and one from the software bridge. If you want to use this way you may want to consider the new DSA driver. It has a feature to indicate the frame is already forwarded so the software bridge does not need to. However, I do not know how it supports multiple switches.

There is a new switch with a cascaded switch feature, but it uses a T1 PHY and so does not apply. Its driver was modified to handle 2 switches, so I have a bit of understanding of how to do it. However, I do not have a real setup, so I do not know if there is a design flaw in the driver.

You will need to write your own driver. There should be one switch instance instead of two. The number of ports is increased from 5 to 10 (just an example). The forwarding between switches is done similarly to the HSR RedBox implementation. Whenever a frame is received from one network device it is passed to the other device, unless it is a unicast frame addressed to the host. The host always sends multicast frames to both network devices. It needs to keep track of the unicast addresses of frames sent to it, so that it only needs to send unicast frames to one device instead of two. The switch has an unknown unicast forwarding feature, so that if the unicast address is not already learned it can be forwarded to the host, and the host can pass it to the other switch. That way unicast traffic within one switch will not need to go through the other switch.
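The forwarding rules described above can be sketched as a small decision function. This is only a user-space model of the idea, not driver code: the names (fwd_table, fwd_from_switch, fwd_from_host, DEV_A, DEV_B, DEV_HOST) and the fixed-size table are invented for the example, and delivering multicast frames to the host as well as to the other device is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ADDR_LEN   6
#define TABLE_SIZE 16

/* Destination bit flags (hypothetical names). */
#define DEV_A    0x1   /* network device on the first switch */
#define DEV_B    0x2   /* network device on the second switch */
#define DEV_HOST 0x4   /* deliver to the local network stack */

struct learned_addr {
    uint8_t addr[ADDR_LEN];
    int dev;           /* DEV_A or DEV_B; 0 means the slot is free */
};

struct fwd_table {
    uint8_t host_addr[ADDR_LEN];
    struct learned_addr entry[TABLE_SIZE];
};

static int is_multicast(const uint8_t *addr)
{
    return addr[0] & 1;   /* group bit; also set for broadcast */
}

/* Remember which device a source address was last seen on. */
static void fwd_learn(struct fwd_table *t, const uint8_t *src, int dev)
{
    int i, free_slot = -1;

    for (i = 0; i < TABLE_SIZE; i++) {
        if (t->entry[i].dev && !memcmp(t->entry[i].addr, src, ADDR_LEN)) {
            t->entry[i].dev = dev;
            return;
        }
        if (!t->entry[i].dev && free_slot < 0)
            free_slot = i;
    }
    if (free_slot >= 0) {   /* if the table is full, skip learning */
        memcpy(t->entry[free_slot].addr, src, ADDR_LEN);
        t->entry[free_slot].dev = dev;
    }
}

/* Frame received from one switch: return the set of destinations. */
static int fwd_from_switch(struct fwd_table *t, int from,
                           const uint8_t *dst, const uint8_t *src)
{
    int other = (from == DEV_A) ? DEV_B : DEV_A;

    assert(from == DEV_A || from == DEV_B);
    fwd_learn(t, src, from);
    if (is_multicast(dst))
        return DEV_HOST | other;
    if (!memcmp(dst, t->host_addr, ADDR_LEN))
        return DEV_HOST;    /* unicast to the host is not passed on */
    return other;           /* other unicast crosses to the other switch */
}

/* Frame sent by the host: return the devices to transmit on. */
static int fwd_from_host(const struct fwd_table *t, const uint8_t *dst)
{
    int i;

    if (!is_multicast(dst))
        for (i = 0; i < TABLE_SIZE; i++)
            if (t->entry[i].dev && !memcmp(t->entry[i].addr, dst, ADDR_LEN))
                return t->entry[i].dev;
    return DEV_A | DEV_B;   /* multicast or not yet learned unicast */
}
```

Before a peer has been heard from, the host sends to both devices; once a frame from that peer has been received on one device, unicast replies go out on that device only.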

jeghub commented 4 years ago

I've found a PDF file from Microchip saying that we can handle RSTP from user space using mstpd with tail tag enabled and a bridge. This document is an old one (2016), but I've figured out how to make it work. For STP, what I've done is as follows:

Set multi_dev mode to 1 => that enables the tail tag. What I see in this mode is that the hardware switch is not switching anymore. A frame from a port is sent to the host, and host frames are sent only to the destination port. Can you confirm that?
Compile the driver with CONFIG_KSZ_STP enabled. That enables some parts of the code in ksz_sw_9897.c, like the sw_setup_stp() function, which adds the BPDU multicast address to the static multicast table as you said in your last comment. I've only changed one thing, to force the override mode for my tests. Override was set only when the U-Boot stp environment variable was set to 1 to enable STP in the driver. So here I enable the override, but BPDUs are not handled by your driver.

entry = &info->mac_table[STP_ENTRY];
    entry->addr[0] = 0x01;
    entry->addr[1] = 0x80;
    entry->addr[2] = 0xC2;
    entry->addr[3] = 0x00;
    entry->addr[4] = 0x00;
    entry->addr[5] = 0x00;
    if (sw->stp || sw->multi_dev == 1) {
        entry->ports = sw->HOST_MASK;
        entry->override = 1;

=> With this setup I'm able to get BPDUs from each port and my bridge(s) can handle STP/RSTP.
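Since the excerpt above is cut off, here is a self-contained model of the same idea that can be compiled and tested in user space. The struct mock_mac_entry and both function names are hypothetical stand-ins for the driver's own types, and clearing ports and override when neither condition holds is an assumption about the part of the code that is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a static MAC table entry. */
struct mock_mac_entry {
    uint8_t addr[6];
    uint16_t ports;     /* bitmap of ports the frame is forwarded to */
    uint8_t override;   /* forward even through a closed port */
};

/* Bridge group address used as destination by STP/RSTP BPDUs. */
static const uint8_t bpdu_addr[6] = { 0x01, 0x80, 0xC2, 0x00, 0x00, 0x00 };

/* Fill the entry so that BPDUs are steered to the host port. */
static void setup_bpdu_entry(struct mock_mac_entry *entry,
                             uint16_t host_mask, int stp, int multi_dev)
{
    assert(entry != NULL);
    memcpy(entry->addr, bpdu_addr, sizeof(bpdu_addr));
    entry->ports = 0;       /* assumed default when the override is off */
    entry->override = 0;
    if (stp || multi_dev == 1) {
        entry->ports = host_mask;
        entry->override = 1;
    }
}

/* True if a destination address is the BPDU multicast address. */
static int is_bpdu_addr(const uint8_t *dst)
{
    return memcmp(dst, bpdu_addr, sizeof(bpdu_addr)) == 0;
}
```

The condition mirrors the change made for the tests: the override is forced either when STP is enabled in the driver or when multi_dev is 1.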

I have not understood the second half of your previous message. What do you mean by:

I do not think using software bridge is feasible. It is inefficient to send multiple broadcast frames to the ports. When the hardware switch is doing the forwarding then the device may receive 2 copies: one from the switch and one from the software bridge.

triha2work commented 4 years ago

With a software bridge every unicast frame has to be forwarded by software. That is why I think it is not feasible. You can manipulate the port membership to allow them to be forwarded by hardware, but then you will get duplicates for multicast frames. With CONFIG_KSZ_STP you can let the switch driver handle RSTP if you think the implementation is correct. Currently the best bet is to use the DSA driver if the only feature you need is RSTP.

Bartel-C8 commented 3 years ago

@triha2work: Chiming in on this. I tried using (R)STP with the mainline (5.10) kernel DSA driver, but cannot get it to block traffic properly, neither with the kernel STP implementation nor with the user-space mstpd daemon.

When interconnecting port 1 and port 2, I get errors like:

br0: received packet on lan2 with own address as source address (addr:8a:b0:64:56:0d:bb, vlan:1)
br0: port 2(lan2) entered blocking state
net_ratelimit: 41 callbacks suppressed
br0: received packet on lan1 with own address as source address (addr:8a:b0:64:56:0d:bb, vlan:1)
br0: port 2(lan2) entered learning state
br0: received packet on lan2 with own address as source address (addr:8a:b0:64:56:0d:bb, vlan:1)
br0: received packet on lan2 with own address as source address (addr:8a:b0:64:56:0d:bb, vlan:1)
br0: received packet on lan2 with own address as source address (addr:8a:b0:64:56:0d:bb, vlan:1)
br0: received packet on lan2 with own address as source address (addr:8a:b0:64:56:0d:bb, vlan:1)
br0: received packet on lan2 with own address as source address (addr:8a:b0:64:56:0d:bb, vlan:1)
br0: received packet on lan2 with own address as source address (addr:8a:b0:64:56:0d:bb, vlan:1)
br0: received packet on lan2 with own address as source address (addr:8a:b0:64:56:0d:bb, vlan:1)
br0: received packet on lan2 with own address as source address (addr:8a:b0:64:56:0d:bb, vlan:1)
br0: received packet on lan2 with own address as source address (addr:8a:b0:64:56:0d:bb, vlan:1)
br0: port 2(lan2) entered blocking state
net_ratelimit: 2103 callbacks suppressed
br0: received packet on lan1 with own address as source address (addr:8a:b0:64:56:0d:bb, vlan:1)
br0: port 2(lan2) entered learning state
br0: received packet on lan2 with own address as source address (addr:8a:b0:64:56:0d:bb, vlan:1)

Is there something (tail tagging, ...) I need to enable to get this working? It seems the loop is detected, but somehow it still decides to forward again.

Thanks!

romatou18 commented 3 years ago

@jeghub, Hi, we are working on getting a KSZ9477 to work over an i.MX8 FEC controller. I was wondering if you would mind sharing the device tree configuration that enabled this, i.e. the relevant snippet of the .dts?

Many thanks.
Regards,
Romain, from the Flightcell.com dev team

Microchip-Ethernet / EVB-KSZ9477

KSZ9477 with NXP fec #35