enjoy-digital / liteeth

Small footprint and configurable Ethernet core

ECP5 / RGMII doesn't meet timing closure #27

Open ximinity opened 4 years ago

ximinity commented 4 years ago

The README file mentions the following:

3. Check out /examples/versa_ecp5_udp_loopback for a good practical example of how to get
started with the Liteeth core solo in an FPGA.

I've tested the example in examples/targets/udp_loopback by performing the following steps:

  1. Build example bitstream:

$ ./versa_ecp5.py

Full output: timing.txt. Note that timing closure is not met for `crg_clkout`.

Versions: yosys 3c41599ee1f62e4d77ba630fa1a245ef3fe236fa, nextpnr 247e18cf027334d5201be00735aa607250e6253d, trellis e2e10bfdfaa29fed5d19e83dc7460be9880f5af4.

  2. Load the bitstream to the FPGA:

$ ./versa_ecp5.py load

  3. Set the local IP:
    $ sudo ifconfig enp7s0 netmask

$ ifconfig
enp7s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet netmask broadcast
        ether e8:6a:64:c7:84:3b  txqueuelen 1000  (Ethernet)
        RX packets 1699  bytes 1050972 (1.0 MiB)
        RX errors 0  dropped 529  overruns 0  frame 0
        TX packets 1641  bytes 119511 (116.7 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 33

4. Set ARP entry:

$ sudo arp -s 10:e2:d5:00:00:00 -i enp7s0
$ arp -a
? ( at 10:e2:d5:00:00:00 [ether] PERM on enp7s0

5. Ping to FPGA:

$ ping
PING ( 56(84) bytes of data.

This gives no response.

6. Run listener and sender:
(Note: when the UDP example was added in #23, the IP addresses in `listener.py` and `sender.py` were swapped relative to the original example, so the current version has them the wrong way around.)

$ sudo ./listener.py & ./sender.py
[4] 9009
2020-01-25 20:58:29
2020-01-25 20:58:30
2020-01-25 20:58:30
2020-01-25 20:58:31
2020-01-25 20:58:31
2020-01-25 20:58:32

This also gives no response from the FPGA.
See the following Wireshark trace: ![wireshark](https://user-images.githubusercontent.com/36485856/73126686-0e049300-3fb6-11ea-9101-6395bed8753a.png)
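The original `listener.py` and `sender.py` are not reproduced in this thread, so here is a hypothetical stand-in for a single round-trip check. It is a sketch that assumes the loopback design echoes each UDP payload back to the sender's address and port; the function name is made up for illustration, and only the Python standard library `socket` module is used.

```python
import socket
from typing import Optional


def udp_echo_roundtrip(host: str, port: int, payload: bytes,
                       timeout: float = 1.0) -> Optional[bytes]:
    """Send one UDP datagram to (host, port) and wait for a reply.

    Returns the reply payload, or None if nothing arrives within `timeout`
    seconds. Assumes the far end echoes datagrams back to the sender.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(payload, (host, port))
        try:
            data, _addr = sock.recvfrom(2048)
        except socket.timeout:
            # No reply at all: the same symptom as described in the report.
            return None
    return data
```

Called as `udp_echo_roundtrip(FPGA_IP, PORT, b"hello")`, where `FPGA_IP` and `PORT` are placeholders for the board's address and the loopback port (the actual address is elided in the report above). A `None` result separates "nothing came back" from "something came back but was corrupted", which is useful alongside a Wireshark capture.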

On a final note, both PHY status indicators of the Ethernet interface on the FPGA turn off as soon as I connect the network cable (the orange link state is turned on when the cable is unconnected).
shuffle2 commented 4 years ago

There are multiple culprits causing timing problems (at least with the Lattice toolchain on the LFE5UM-45F (non-5G) Versa, which is what I've been using):

  1. There are a few registers in the IP path that are initialized to a non-zero value. In Diamond you'll notice it complains that these registers cannot be packed efficiently into slices. This applies to some of the checksum modules, as well as to some counters that count down instead of up (this should be easy to invert).
  2. LiteEthIPV4Checksum produces the longest critical path. As a hack, I've locally replaced this module with a no-op that always has done = 1 and value = 0.
  3. I noticed Trellis gives poor results here. Maybe try Diamond to compare/debug.
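For reference, LiteEthIPV4Checksum computes the standard IPv4 header checksum: a ones'-complement sum of the header's 16-bit words with the carries folded back in (RFC 1071 style). The software model below is only a behavioral sketch, not LiteEth's implementation; the function name is hypothetical. In hardware, summing many of these words with end-around carry in few cycles makes a long adder chain, which is a plausible reason it dominates the critical path (an assumption, not checked against LiteEth's source). Note also that a constant 0 in the checksum field is generally not a valid checksum for a real header, so a receiver that validates it would be expected to drop such packets.

```python
def ipv4_header_checksum(header: bytes) -> int:
    """Ones'-complement checksum over an IPv4 header.

    To compute: pass the header with the checksum field (bytes 10-11) zeroed.
    To verify: pass the header as received; a valid header yields 0.
    """
    if len(header) % 2:
        raise ValueError("IPv4 header length must be a multiple of 2 bytes")
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]  # big-endian 16-bit word
    # Fold carries above bit 15 back into the low 16 bits (end-around carry).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

For the well-known example header `4500 0073 0000 4000 4011 0000 c0a8 0001 c0a8 00c7` this returns `0xB861`, and re-running it on the header with `b861` filled in returns 0.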

> On a final note, both PHY status indicators of the Ethernet interface on the FPGA turn off as soon as I connect the network cable (the orange link state is turned on when the cable is unconnected).

This is normal default behavior on my board as well. The link seems fine; perhaps Lattice designed the board in a strange way (it wouldn't be the first part... :) ). Indeed, the Versa user guide schematic around PHY1_LED seems a bit strange. See also this PDF for more info: marvell-phys-transceivers-alaska-88e151x-datasheet-2018-02.pdf

shuffle2 commented 4 years ago

Another hacky way I'm using to make timing closure easier is to disable ICMP (pass with_icmp=False to LiteEthUDPIPCore).

enjoy-digital commented 3 years ago

The examples have been removed. This issue is already tracked here: https://github.com/litex-hub/litex-boards/issues/40; we'll try to improve this soon.

rowanG077 commented 1 year ago

@enjoy-digital This can be closed. With the 32-bit data path and buffered CDC, there are no timing issues anymore.

ozel commented 10 months ago

I still saw this issue, with `$glbnet$eth_clocks_rx$TRELLIS_IO_IN` failing well below 100 MHz, after all those improvements and even on a ButterStick board with a speed-grade-8 ECP5. It took me some time to realise that the `data_width` of the Etherbone module still defaults to 8 instead of 32. The ECP5 examples in `bench` should probably be updated, unless there are other side effects of using a wider Ethernet/Etherbone `data_width` by default.

Anyway, thank you all for fixing this!
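To spot failing clocks such as the one above without reading the whole log, one option is to scan nextpnr's per-clock summary lines. This sketch assumes those lines look like `Info: Max frequency for clock '<name>': <fmax> MHz (PASS|FAIL at <target> MHz)`; the exact wording may differ between nextpnr versions, and the function name is hypothetical.

```python
import re
from typing import List, Tuple

# Assumed shape of nextpnr's per-clock timing summary line.
_FMAX_RE = re.compile(
    r"Max frequency for clock\s+'([^']+)':\s+([\d.]+) MHz "
    r"\((PASS|FAIL) at ([\d.]+) MHz\)"
)


def failing_clocks(log_text: str) -> List[Tuple[str, float, float]]:
    """Return (clock_name, achieved_mhz, target_mhz) for each FAIL line."""
    failures = []
    for name, fmax, status, target in _FMAX_RE.findall(log_text):
        if status == "FAIL":
            failures.append((name, float(fmax), float(target)))
    return failures
```

Feeding it the text of a log such as the `timing.txt` attached earlier would list only the clocks that miss their constraint; any sample numbers you test it with are illustrative, not values from this thread.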

rowanG077 commented 10 months ago

@ozel Which version and config are you using? I get pretty stable timing closure with the current liteeth with this config:

phy_tx_delay: 0e-9
phy_rx_delay: 2e-9
device: LFE5U-25F-6BG256C
vendor: lattice
toolchain: trellis
# Core -------------------------------------------------------------------------
clk_freq: 125e6
core: udp

mac_address: 0x10e2d5000000

tx_cdc_depth: 16
tx_cdc_buffered: True
rx_cdc_depth: 16
rx_cdc_buffered: True

    data_width: 32
    mode: raw

The CDC parameters are important for getting better timing.
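The `mac_address: 0x10e2d5000000` above is the same MAC the original report used in its static ARP entry (`10:e2:d5:00:00:00`); the host's `arp -s` entry must match whatever the core is configured with. A small pair of hypothetical helpers for converting between the two notations:

```python
def mac_int_to_str(mac: int) -> str:
    """Format a 48-bit MAC address integer as colon-separated hex."""
    if not 0 <= mac < 1 << 48:
        raise ValueError("MAC address must fit in 48 bits")
    return ":".join(f"{b:02x}" for b in mac.to_bytes(6, "big"))


def mac_str_to_int(text: str) -> int:
    """Parse an 'aa:bb:cc:dd:ee:ff' MAC string back into an integer."""
    parts = text.split(":")
    if len(parts) != 6:
        raise ValueError("expected six colon-separated octets")
    return int.from_bytes(bytes(int(p, 16) for p in parts), "big")
```

`mac_int_to_str(0x10e2d5000000)` gives `"10:e2:d5:00:00:00"`, the string to pass to `arp -s`.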

ozel commented 10 months ago

@rowanG077 I meant the projects in the bench folder, for example https://github.com/enjoy-digital/liteeth/blob/master/bench/butterstick.py

rowanG077 commented 10 months ago

I see. It seems configuration at that level isn't supported for the bench designs. You could copy-paste the `add_etherbone` method defined in https://github.com/enjoy-digital/litex/blob/master/litex/soc/integration/soc.py#L1766 and change the config there so it passes timing.

ozel commented 10 months ago

The data width can be changed from the 8-bit default to 32-bit; this works:

`self.add_etherbone(phy=self.ethphy, buffer_depth=255, data_width=32)`

I just wanted to highlight that current ECP5 test bench examples still report timing violations unless modified. There might be good reasons for that of course, but newcomers to LiteX might wonder what's going on...

TheZoq2 commented 2 months ago

> @ozel Which version and config are you using? I get pretty stable timing closure with the current liteeth with this config:
> ...
> The CDC parameters are important to get better timing

I'm having trouble reaching 125 MHz even with this config on a ButterStick (LFE5UM5G-85F); the best I can get is about 104 MHz.
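In clock-period terms (plain arithmetic, taking ~104 MHz as the achieved Fmax and 125 MHz as the target), the shortfall is roughly 1.6 ns per cycle:

```math
T_{\text{target}} = \frac{1}{125\ \text{MHz}} = 8.00\ \text{ns}, \qquad
T_{\text{achieved}} = \frac{1}{104\ \text{MHz}} \approx 9.62\ \text{ns}, \qquad
T_{\text{target}} - T_{\text{achieved}} \approx -1.62\ \text{ns}
```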

rowanG077 commented 2 months ago

Are you sure you have the most recent liteeth? What is your system clock frequency? I'm using liteeth on an ECP5 part with the lowest speed grade, while the ECP5 on the ButterStick has the highest speed grade, so it should have no trouble reaching 125 MHz. I just compared the logic timing of the two: your part is almost twice as fast. Are you using 125 MHz for your system clock?

TheZoq2 commented 2 months ago

I'm on a liteeth checkout from today, so that shouldn't be the problem, but I probably misunderstood the clocking. Since I set `clk_freq` to 125 MHz, I figured I should drive sys_clk at 125 MHz, but I guess that's not the case?

rowanG077 commented 2 months ago

sys_clk can be any clock you want. What is important is that you use the RGMII clock as the RX clock. I run liteeth at 50 MHz. That's the reason to go to a wider data width: you process multiple bytes per cycle, so you can lower your clock frequency.
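The bandwidth argument can be put in numbers. At 1 Gb/s, RGMII runs a 125 MHz clock carrying 4 bits on each edge, i.e. 8 bits per cycle, so an 8-bit core datapath needs at least 125 MHz while a 32-bit one needs only a quarter of that. The sketch below ignores preamble, inter-frame gap, and CDC/FIFO behavior, and the helper names are hypothetical:

```python
def min_clock_mhz(line_rate_mbps: float, data_width_bits: int) -> float:
    """Lowest clock (MHz) at which a datapath moving `data_width_bits`
    bits per cycle can keep up with `line_rate_mbps` (no overhead)."""
    return line_rate_mbps / data_width_bits


def throughput_mbps(clock_mhz: float, data_width_bits: int) -> float:
    """Peak rate (Mb/s) of a datapath moving `data_width_bits` per cycle."""
    return clock_mhz * data_width_bits
```

For 1000 Mb/s, `min_clock_mhz(1000, 8)` is 125.0 and `min_clock_mhz(1000, 32)` is 31.25, while a 32-bit datapath at 50 MHz peaks at 1600 Mb/s. That headroom is why a 50 MHz sys_clk works with `data_width: 32`, and why the ~104 MHz reached above would be ample for a 32-bit core.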

TheZoq2 commented 2 months ago

Excellent, thanks for the quick reply!