Microchip-Ethernet / EVB-KSZ9477

Repository for using the Microchip EVB-KSZ9477 board. Products supported: KSZ9477, KSZ9567, KSZ9897, KSZ9896, KSZ8567, KSZ8565, KSZ9893, KSZ9563, KSZ8563, LAN9646, Phys(KSZ9031/9131, LAN8770

KSZ9477 PTP network failure #83

Open miazan opened 2 years ago

miazan commented 2 years ago

I'm using the KSZ9477 board as a PTP boundary clock, and I've run into a problem: if a connected device is rebooted while a large amount of traffic is going through the board, the board loses the ability to send or receive Ethernet frames. This doesn't happen every time a device is rebooted, but the probability increases with the traffic level.

I'm not sure what on the board is crashing. ifconfig still shows the Ethernet ports as active, the board can still tell which ports have active Ethernet connections, and ptp4l is still trying to send PTP messages on the active ports, as shown by the TX packet count in ifconfig, which keeps increasing. However, pinging connected devices yields no response, and checking the connected devices shows that neither the board's PTP messages nor the network traffic it should be forwarding is getting through. As far as I can tell, the only thing wrong is that the Ethernet ports themselves can no longer handle Ethernet frames.

Bartel-C8 commented 2 years ago

See possibly https://github.com/Microchip-Ethernet/EVB-KSZ9477/issues/32

Be sure to enable one-step PTP; do not use two-step!

olerem commented 2 years ago

@miazan, were you able to disable TC mode (automatic correction field update) by implementing BC mode?

triha2work commented 1 year ago

This is a known hardware bug: a link change causes the port to stop transmitting when a two-step clock is used. A workaround is to use a one-step clock. The Sync message then always contains the actual transmit timestamp. This does not cause any problem for PTP synchronization unless the protocol requires that messages not be modified at all. A more serious problem is Pdelay_Resp, although it is not an issue if P2P is not used. Because the hardware already puts the turnaround time in the correction field, the driver workaround is to return the receive timestamp of Pdelay_Req instead of the transmit timestamp of Pdelay_Resp, so that the calculated turnaround time becomes zero. Some PTP stacks may not calculate the delay correctly, as they assume a zero correction field and do not take it into account. But according to the specification, all timing information in the PTP messages needs to be included in the calculation.
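The arithmetic behind the Pdelay_Resp workaround can be checked with the peer-delay calculation, which has the same shape as the IEEE 1588 two-step formula. The timestamps are t1 (Pdelay_Req sent by the requester), t2 (Pdelay_Req received by the responder), t3 (Pdelay_Resp sent by the responder) and t4 (Pdelay_Resp received by the requester). The correction field carries any turnaround time the hardware has already accounted for. The following is a minimal Python sketch that assumes integer-nanosecond timestamps and uses a hypothetical helper name, `mean_path_delay`; the numbers are made up for illustration. It shows that reporting t2 in place of t3 while the correction field holds the turnaround gives the same result as a plain exchange, and that a stack ignoring the correction field gets it wrong.

```python
def mean_path_delay(t1, t2, t3, t4, correction_ns=0):
    """Peer mean path delay in ns (hypothetical helper).

    t1: Pdelay_Req sent by requester (requester clock)
    t2: Pdelay_Req received by responder (responder clock)
    t3: Pdelay_Resp sent by responder (responder clock)
    t4: Pdelay_Resp received by requester (requester clock)
    correction_ns: turnaround time already carried in the correction field
    """
    round_trip = t4 - t1
    turnaround = (t3 - t2) + correction_ns
    return (round_trip - turnaround) / 2


# Made-up exchange: 500 ns link delay each way, responder clock
# offset by +1000 ns, 3000 ns turnaround inside the responder.
t1 = 0
t2 = t1 + 500 + 1000         # read on the responder's clock
t3 = t2 + 3000               # responder answers 3000 ns later
t4 = t1 + 500 + 3000 + 500   # read on the requester's clock

# Plain exchange: real t3 reported, correction field empty.
normal = mean_path_delay(t1, t2, t3, t4)

# Workaround: turnaround is in the correction field, and the driver
# reports t2 in place of t3 so that (t3 - t2) is zero.
workaround = mean_path_delay(t1, t2, t2, t4, correction_ns=3000)

# A stack that assumes a zero correction field loses the turnaround.
naive = mean_path_delay(t1, t2, t2, t4)

print(normal, workaround, naive)  # 500.0 500.0 2000.0
```

The responder's clock offset cancels out because only differences of timestamps taken on the same clock are used.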

olerem commented 1 year ago

@triha2work, OK, thank you! Should the correctionField be updated even if tail tagging is enabled on the CPU port? Currently it looks like I can't combine TC functionality with DSA support. Is that correct?

triha2work commented 1 year ago

The 1588 PTP engine of the KSZ switch was designed to add the 1588 PTP feature to the MAC, as the MAC is required to operate the switch. It is possible to run the switch as a pure TC by not enabling tail tagging, but that rules out per-port control of network traffic. The MAC is then used by the PTP stack to synchronize the clock. If the switch clock frequency is not wildly off, the accuracy may be acceptable. It is possible to run another PTP stack on the switch to ensure good accuracy, but that is too much trouble. In any case, this scheme does not work if P2P is used, because there will then be multiple Pdelay_Resp messages for each Pdelay_Req: the switch is acting as a completely separate switch outside the MAC. Turning on tail tagging on a port marks it as a host port, which acts as an endpoint, so the correction field is not updated in PTP event messages.
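A transparent clock works by adding each event message's residence time (egress timestamp minus ingress timestamp) to its correctionField. IEEE 1588 defines that field as a signed 64-bit integer in nanoseconds multiplied by 2^16. The Python sketch below is a simplified, hypothetical model of the behaviour described above, not of the switch's actual register-level logic. The function names and the `via_host_port` flag are illustrative assumptions. Frames between front ports get their residence time added, while a frame to or from the tail-tagged host port is treated as endpoint traffic and left alone.

```python
# correctionField units per IEEE 1588: nanoseconds * 2**16,
# carried on the wire as a signed 64-bit integer.
SCALE = 1 << 16
INT64_MIN, INT64_MAX = -(1 << 63), (1 << 63) - 1


def ns_to_field(ns: float) -> int:
    """Convert nanoseconds to scaled correctionField units."""
    return round(ns * SCALE)


def field_to_ns(field: int) -> float:
    """Convert scaled correctionField units back to nanoseconds."""
    return field / SCALE


def forward_event(correction: int, ingress_ns: int, egress_ns: int,
                  via_host_port: bool) -> int:
    """Hypothetical model of a transparent-clock correction update.

    Front port to front port: residence time is added to the field.
    To or from the tail-tagged host port: treated as an endpoint,
    so the field is returned unchanged.
    """
    if via_host_port:
        return correction
    updated = correction + ns_to_field(egress_ns - ingress_ns)
    if not INT64_MIN <= updated <= INT64_MAX:
        raise OverflowError("correctionField out of int64 range")
    return updated


# A Sync arriving with 1.5 ns already in the field and spending 8000 ns
# in the switch (the typical gigabit residence time quoted in this thread).
incoming = ns_to_field(1.5)
out_front = forward_event(incoming, ingress_ns=100_000, egress_ns=108_000,
                          via_host_port=False)
out_host = forward_event(incoming, ingress_ns=100_000, egress_ns=108_000,
                         via_host_port=True)
print(field_to_ns(out_front), field_to_ns(out_host))  # 8001.5 1.5
```

The 2^16 scaling lets the field carry sub-nanosecond corrections, which matters once the residence time itself has to be corrected for clock drift.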

olerem commented 1 year ago

"If the switch clock frequency is not wildly off, the accuracy may be acceptable."

This part worries me a bit. So this can actually happen? To correct this clock, should we enable tail tagging and take control of the PTP stack? But in that case TC functionality will not be available.

Hm, maybe you have some ideas. My goal is to sync the PHC of the SoC, not the PHC of the KSZ switch. With the KSZ switch I have the following options:

a) Use the switch's own transparent clock functionality, without DSA/tail tagging, only in one-step E2E mode, with the risk that the switch clock may be wildly off.
b) Implement an ordinary clock with a software PTP stack, with DSA/tail tagging enabled, and then sync the KSZ PHC with the SoC PHC.

Correct?

triha2work commented 1 year ago

It depends on the quality of the oscillator used by the switch. Typical residence time is about 8 microseconds for gigabit traffic. Typical clock accuracy seen on the KSZ9477 evaluation board is about 5 microseconds of drift per second (5 ppm). The clock can be tuned by adjusting resistors to reduce the drift, but that probably cannot be done for a mass-produced product. The term syntonization means that the switch clock runs at the same rate as the master clock. For the switch to be effective it does not need to be synchronized, only syntonized. Syntonization can easily be done by receiving and processing just two Sync messages. So it is possible to do this calibration at startup by turning on tail tagging, and then switch it off once the switch clock frequency has been adjusted to compensate for the drift of the board. Note that the current DSA driver does not support 1588 PTP yet.
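To put numbers on the drift argument: 5 µs of drift per second is a 5 ppm frequency offset, so an 8 µs residence time measured on that clock is off by only about 8000 ns x 5e-6 = 0.04 ns. The Python sketch below estimates the rate ratio from two Sync messages by comparing how much master time and local time elapsed between them, which is the syntonization step described above. It assumes integer-nanosecond timestamps and a constant path delay, so the delay cancels in the differences. The function names are hypothetical. A real stack would filter over many Syncs rather than trust a single pair.

```python
def frequency_ratio(t1a: int, t2a: int, t1b: int, t2b: int) -> float:
    """Master/local rate ratio from two Sync messages (hypothetical helper).

    t1a, t1b: master origin timestamps of the two Syncs (ns)
    t2a, t2b: local receive timestamps of the same Syncs (ns)
    Assumes the path delay is identical for both, so it cancels out.
    """
    master_elapsed = t1b - t1a
    local_elapsed = t2b - t2a
    if master_elapsed <= 0 or local_elapsed <= 0:
        raise ValueError("timestamps must increase")
    return master_elapsed / local_elapsed


def ppm_offset(ratio: float) -> float:
    """Local frequency offset in ppm (positive means local runs fast)."""
    return (1.0 / ratio - 1.0) * 1e6


def syntonized(local_interval_ns: float, ratio: float) -> float:
    """Convert an interval measured on the local clock to master time."""
    return local_interval_ns * ratio


# Made-up numbers matching the thread: the local clock gains 5 us per
# second, Syncs are one second apart, and the path delay is 2 us.
t1a, t1b = 0, 1_000_000_000
t2a = t1a + 2_000
t2b = t1b + 2_000 + 5_000   # local clock has gained 5000 ns

r = frequency_ratio(t1a, t2a, t1b, t2b)
print(round(ppm_offset(r), 3))  # 5.0

# An 8 us residence time read on the drifting clock is off by ~40 ps;
# scaling it by the estimated ratio removes that error.
true_residence = 8_000.0
measured = true_residence * (1 + 5e-6)
print(round(measured - true_residence, 3))  # 0.04
print(round(abs(syntonized(measured, r) - true_residence), 6))  # 0.0
```

This is why syntonization alone is enough for a transparent clock: only intervals (residence times) are measured on the switch clock, so its absolute time does not matter, only its rate.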