Open wfullmer12 opened 3 years ago
Did you solve your issue? I would appreciate it if you could share how you solved it. Thanks!
Hmm, I thought I responded to this. Have you been able to figure out if this is an RX side or a TX side problem? On the RX side, I have had issues with transceiver DFE getting into a strange state when it comes out of reset with no valid input signal present. Xilinx says you have to reset the transceiver after applying a signal, but doesn't provide any guidance on how to determine when a signal is present or not. One thing on my to-do list is to write a sort of watchdog that checks the PHY layer signals and kicks the transceiver if it's seeing garbage for a while. However, I have not had time to work on this yet.
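For illustration only, here is a rough behavioral model of the kind of watchdog described above, written in Python rather than HDL. Everything in it is an assumption: the `LinkWatchdog` name, the `rx_good` flag (standing in for whatever PHY-layer status you trust, such as block lock or a low error count), and the `bad_limit` and `holdoff` thresholds. It is a sketch of the idea, not something that exists in the repository.

```python
class LinkWatchdog:
    """Hypothetical model: request a transceiver reset after a run of bad samples."""

    def __init__(self, bad_limit: int, holdoff: int) -> None:
        self.bad_limit = bad_limit    # consecutive bad samples tolerated before a reset
        self.holdoff = holdoff        # samples ignored after a reset so the link can settle
        self.bad_count = 0
        self.holdoff_left = 0

    def tick(self, rx_good: bool) -> bool:
        """Advance one sample period; return True when a reset should be issued."""
        if self.holdoff_left > 0:
            # Still waiting for the previous reset to take effect.
            self.holdoff_left -= 1
            return False
        if rx_good:
            self.bad_count = 0
            return False
        self.bad_count += 1
        if self.bad_count >= self.bad_limit:
            # Garbage for long enough: kick the transceiver and start the holdoff.
            self.bad_count = 0
            self.holdoff_left = self.holdoff
            return True
        return False
```

The holdoff is there so the watchdog does not keep re-resetting a transceiver that is still coming out of the previous reset. Suitable values would depend on how long the DFE takes to adapt on real hardware.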
Yeah, so I wrote some code that toggles the reset a number of times after the MMCM for the reference clock finishes locking, which seemed to help at the time. The reason I closed this, though, is that the primary issue in our case appears to have been that the script that programs the FPGA was not sleeping long enough before attempting TX/RX transactions, hence the inconsistency. Once we let it sit for a second, the problem went away: we have not seen these issues in the 20-30 tests we've run since. If it pops back up, I'll definitely post an update.
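As a rough sketch of the host-side fix described above, a test script could wait a fixed settle time after programming and then poll until the link responds before sending any traffic. The `wait_for_link` name, the `link_is_up` callback (for example, a wrapper around a ping or a status register read), and the default timings are all assumptions for illustration, not part of the actual script.

```python
import time
from typing import Callable


def wait_for_link(link_is_up: Callable[[], bool],
                  settle_s: float = 1.0,
                  timeout_s: float = 10.0,
                  poll_s: float = 0.1,
                  sleep: Callable[[float], None] = time.sleep,
                  clock: Callable[[], float] = time.monotonic) -> bool:
    """Sleep for settle_s after programming, then poll until the link is up.

    Returns True once link_is_up() succeeds, or False if timeout_s elapses first.
    sleep and clock are injectable so the function can be tested without waiting.
    """
    sleep(settle_s)                     # give the FPGA and PHY time to come up
    deadline = clock() + timeout_s
    while True:
        if link_is_up():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

Calling this right after the programming step, and only starting TX/RX transactions once it returns True, makes the wait explicit instead of relying on a bare sleep.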
I am still seeing a problem which probably needs to go under a different issue. We have a 1G link that goes through this Ethernet PHY, which is similar to the one that one of your example designs uses: https://www.ti.com/lit/ds/symlink/dp83867ir.pdf?ts=1595533469891&ref_url=https%253A%252F%252Fwww.ti.com%252Fproduct%252FDP83867IR. It connects via RGMII, which we've set up using one of your examples, and we've used the MDIO bus to program the PHY to turn off SGMII and enable RGMII. This is mostly working for us. What we're seeing is that something like 10% of the time, loopback requests sent from the server do not make it out of the eth_mac_1g_rgmii. I have a hard time capturing anything before that point because the network carries various broadcast packets, so setting up a capture has proven difficult, if not impossible. Primarily, I wanted to check with you what the eth_mac_1g_rgmii expects as far as clock-to-data offset when you set USE_CLK90 to FALSE. Does the receive side expect to receive data with the clock offset by 90 degrees, per the usual RGMII standard? And have you ever seen an issue like this that wasn't related to the clock-to-data offset?
Thanks for your time
USE_CLK90 only affects the clock configuration on the TX side. Basically, it determines whether the forwarded TX clock is sourced from clk or clk90. It has no other effects. I haven't been able to make it work without using the separate 90 degree TX clock, but presumably if you set up the ODELAY sites correctly it should work without the extra 90 degree clock. On the receive side, you may need to twiddle the IDELAY settings to get the link to work correctly. The receive side does require sufficient setup and hold time to capture the data, so if the PHY doesn't provide a phase offset, you'll need to use IDELAY to provide it. There should also be a TCL script to generate bit files with different IDELAY tap settings without having to rerun the whole toolchain. I recommend walking the IDELAY setting forward until you see packet loss, then backward until you see packet loss, then splitting the difference and putting that value in the HDL.
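The tap sweep recommended above can be sketched as a small search. This is a minimal sketch under stated assumptions: `link_ok(tap)` is a hypothetical callback that loads a bit file built with that IDELAY tap value and reports whether packets pass without loss, and the default of 32 taps is an assumption that should be changed to match the delay primitive in your device.

```python
from typing import Callable, Optional, Tuple


def find_idelay_window(link_ok: Callable[[int], bool],
                       start_tap: int,
                       num_taps: int = 32) -> Optional[Tuple[int, int, int]]:
    """Find the passing IDELAY window around start_tap.

    Returns (low, high, centre), where low and high are the last taps that
    still pass in each direction, or None if start_tap itself fails.
    """
    if not link_ok(start_tap):
        return None
    # Walk forward until the next tap shows packet loss (or taps run out).
    high = start_tap
    while high + 1 < num_taps and link_ok(high + 1):
        high += 1
    # Walk backward until the previous tap shows packet loss (or tap 0 is reached).
    low = start_tap
    while low - 1 >= 0 and link_ok(low - 1):
        low -= 1
    # Split the difference; this is the value to put in the HDL.
    return low, high, (low + high) // 2
```

If the window runs into tap 0 or the last tap, the true window may extend beyond what IDELAY alone can reach, so the computed centre should be treated with caution in that case.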
I've been experiencing issues with a modified version of the VCU108/VCU118 designs (all of the modifications were made after the udp_complete module to add packet types; nothing about the PCS/PMA or MAC layers was modified significantly). About 50% of the time after re-programming the FPGA, the Ethernet links fail or are extremely inconsistent. The other 50% of the time the link comes up and there are no issues, regardless of how many packets we send through it. These problems can be fixed via a push-button reset, which leads me to believe that the cause may be the GTYs, which Xilinx documentation suggests are supposed to undergo a particular reset sequence after programming.
Have you had any issues like this, and if so what was your solution?