enjoy-digital / litex


Liteeth: Broken constraints #1917

Closed tristanitschner closed 2 months ago

tristanitschner commented 5 months ago

Hi there,

I stumbled upon a severe timing issue with LiteEth while trying to bring up a quad-core NaxRiscv SoC on the Nexys Video.

In the generated xdc there is:

create_clock -name eth_clocks_rx -period 8.0 [get_ports eth_clocks_rx]
create_clock -name eth_rx_clk -period 8.0 [get_nets eth_rx_clk]

This leads to an invalid clock redefinition on a clock tree. Although the first clock tree is only one net, this does not matter.
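To make the redefinition easier to spot, here is a minimal Python sketch that lists the create_clock constraints in an xdc and flags the ones placed on something other than a top-level port. The helper name list_clocks and the regular expression are my own and only cover the simple form of create_clock shown above; the xdc text is pasted inline rather than read from the build directory. As far as I understand Vivado, a primary clock defined on an internal net becomes a new time-zero point there, so the delay upstream of it is not carried into the analysis, which is why such entries are worth a second look.

```python
import re

# The two constraints quoted above, pasted inline for the example.
XDC = """\
create_clock -name eth_clocks_rx -period 8.0 [get_ports eth_clocks_rx]
create_clock -name eth_rx_clk -period 8.0 [get_nets eth_rx_clk]
"""

# Only matches the simple "-name X -period Y [get_* Z]" form (an assumption).
CREATE_CLOCK = re.compile(
    r"create_clock\s+-name\s+(?P<name>\S+)\s+-period\s+(?P<period>[\d.]+)\s+"
    r"\[(?P<kind>get_ports|get_nets|get_pins)\s+(?P<target>[^\]]+)\]"
)

def list_clocks(xdc_text):
    """Return (name, period_ns, kind, target) for every create_clock line."""
    clocks = []
    for line in xdc_text.splitlines():
        m = CREATE_CLOCK.search(line)
        if m:
            clocks.append(
                (m["name"], float(m["period"]), m["kind"], m["target"].strip())
            )
    return clocks

clocks = list_clocks(XDC)
for name, period, kind, target in clocks:
    print(f"{name}: {period} ns on {kind} {target}")

# Clocks that are not defined on a top-level port.
internal = [name for name, _, kind, _ in clocks if kind != "get_ports"]
print("defined on internal objects:", internal)
```

For the two lines above this reports eth_rx_clk as the only clock defined on an internal net.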

eth_rx_data is clocked with eth_rx_clk. I checked bus skew, and it looks good.

Propagation for eth_clocks_rx is eth_clocks_rx -- [through IBUF and BUFG] --> eth_rx_clk

Timing for this path is:

Timing Report

Slack:                    inf
  Source:                 eth_clocks_rx
                            (clock source 'eth_clocks_rx'  {rise@0.000ns fall@4.000ns period=8.000ns})
  Destination:            BUFG_7/I
  Path Group:             (none)
  Path Type:              Max at Slow Process Corner
  Data Path Delay:        3.825ns  (logic 1.369ns (35.782%)  route 2.457ns (64.218%))
  Logic Levels:           1  (IBUF=1)
  Clock Uncertainty:      0.025ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter     (TSJ):    0.050ns
    Total Input Jitter      (TIJ):    0.000ns
    Discrete Jitter          (DJ):    0.000ns
    Phase Error              (PE):    0.000ns

    Location             Delay type                Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
                         (clock eth_clocks_rx fall edge)
                                                      4.000     4.000 f  
    V13                                               0.000     4.000 f  eth_clocks_rx (IN)
                         net (fo=0)                   0.000     4.000    eth_clocks_rx
    V13                                                               f  IBUF/I
    V13                  IBUF (Prop_ibuf_I_O)         1.369     5.369 f  IBUF/O
                         net (fo=1, routed)           2.457     7.825    main_ethphy_eth_rx_clk_ibuf
    BUFGCTRL_X0Y3        BUFG                                         f  BUFG_7/I
  -------------------------------------------------------------------    -------------------
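
As a side note, the Clock Uncertainty line in the report can be checked by hand from the formula it prints. A small Python sketch, using only the jitter values shown in the report (the function name clock_uncertainty is mine):

```python
import math

def clock_uncertainty(tsj, tij, dj, pe):
    """((TSJ^2 + TIJ^2)^(1/2) + DJ) / 2 + PE, all values in ns."""
    return (math.sqrt(tsj**2 + tij**2) + dj) / 2 + pe

# Values taken from the report above.
u = clock_uncertainty(tsj=0.050, tij=0.000, dj=0.000, pe=0.000)
print(f"{u:.3f} ns")
```

With TSJ = 0.050 ns and everything else at zero this gives the 0.025 ns shown in the report.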

And after the redefinition:

Timing Report

Slack:                    inf
  Source:                 BUFG_7/O
                            (clock source 'eth_rx_clk'  {rise@0.000ns fall@4.000ns period=8.000ns})
  Destination:            PLLE2_ADV/CLKIN1
  Path Group:             (none)
  Path Type:              Max at Slow Process Corner
  Data Path Delay:        2.129ns  (logic 0.000ns (0.000%)  route 2.129ns (100.000%))
  Logic Levels:           0  
  Clock Uncertainty:      0.025ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter     (TSJ):    0.050ns
    Total Input Jitter      (TIJ):    0.000ns
    Discrete Jitter          (DJ):    0.000ns
    Phase Error              (PE):    0.000ns

    Location             Delay type                Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
                         (clock eth_rx_clk fall edge)
                                                      4.000     4.000 f  
    BUFGCTRL_X0Y3        BUFG                         0.000     4.000 f  BUFG_7/O
                         net (fo=122, routed)         2.129     6.129    eth_rx_clk
    PLLE2_ADV_X1Y0       PLLE2_ADV                                    f  PLLE2_ADV/CLKIN1
  -------------------------------------------------------------------    -------------------

So eth_clocks_rx is delayed by ~6 ns in total across the two paths (from the pin to the BUFG input, then from the BUFG output to the PLL input).
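
The ~6 ns figure is just the sum of the two Data Path Delay values reported above. A tiny Python sketch of that arithmetic (variable names are mine; the BUFG's own propagation delay appears in neither report, so it is not included here):

```python
# Data Path Delay values copied from the two timing reports above (ns).
port_to_bufg_ns = 3.825  # eth_clocks_rx (IN) -> BUFG_7/I
bufg_to_pll_ns = 2.129   # BUFG_7/O -> PLLE2_ADV/CLKIN1

# Sum of the two reported segments; excludes the BUFG cell delay itself.
total_ns = port_to_bufg_ns + bufg_to_pll_ns
print(f"{total_ns:.3f} ns")
```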

Timing for eth_rx_data looks as configured in IDELAYE2 for ~2 ns (although I don't understand why it is about 0.1 ns slower):

Timing Report

Slack:                    inf
  Source:                 eth_rx_data[2]
                            (input port)
  Destination:            IDDR_3/D
                            (rising edge-triggered cell IDDR clocked by eth_rx_clk  {rise@0.000ns fall@4.000ns period=8.000ns})
  Path Group:             (none)
  Path Type:              Setup (Max at Slow Process Corner)
  Data Path Delay:        4.694ns  (logic 4.694ns (100.000%)  route 0.000ns (0.000%))
  Logic Levels:           2  (IBUF=1 IDELAYE2=1)
  Clock Path Skew:        1.876ns (DCD - SCD + CPR)
    Destination Clock Delay (DCD):    1.876ns = ( 5.876 - 4.000 ) 
    Source Clock Delay      (SCD):    0.000ns
    Clock Pessimism Removal (CPR):    0.000ns
  Clock Uncertainty:      0.025ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter     (TSJ):    0.050ns
    Total Input Jitter      (TIJ):    0.000ns
    Discrete Jitter          (DJ):    0.000ns
    Phase Error              (PE):    0.000ns

    Location             Delay type                Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
    AB15                                              0.000     0.000 r  eth_rx_data[2] (IN)
                         net (fo=0)                   0.000     0.000    eth_rx_data[2]
    AB15                                                              r  IBUF_4/I
    AB15                 IBUF (Prop_ibuf_I_O)         1.385     1.385 r  IBUF_4/O
                         net (fo=1, routed)           0.000     1.385    main_ethphy_liteethphyrgmiirx_rx_data_ibuf_2
    IDELAY_X0Y91                                                      r  IDELAYE2_19/IDATAIN
    IDELAY_X0Y91         IDELAYE2 (Prop_idelaye2_IDATAIN_DATAOUT)
                                                      3.310     4.694 r  IDELAYE2_19/DATAOUT
                         net (fo=1, routed)           0.000     4.694    main_ethphy_liteethphyrgmiirx_rx_data_idelay_2
    ILOGIC_X0Y91         IDDR                                         r  IDDR_3/D
  -------------------------------------------------------------------    -------------------

                         (clock eth_rx_clk fall edge)
                                                      4.000     4.000 f  
    BUFGCTRL_X0Y3        BUFG                         0.000     4.000 f  BUFG_7/O
                         net (fo=122, routed)         1.876     5.876    eth_rx_clk
    ILOGIC_X0Y91         IDDR                                         f  IDDR_3/C

Though this doesn't matter due to clock redefinition. So total delay for data is ~ 4.7 ns.

According to the RTL8211E-VL datasheet, RXC is synchronous to RXD if RXDLY = 0, which is the case for the Nexys Video. So the FPGA must account for skew and setup. Skew is ~1.8 ns, setup is ~2 ns and hold is ~2 ns. So it is best to sample ~3.5 ns after the rising edge of RXC.

So eth_clocks_rx should be delayed by 3 to 4 ns.

Looking at the timing reports above, it is apparent that eth_clocks_rx is delayed by ~3.8 ns, then it becomes eth_rx_clk, which is again delayed by ~2.1 ns. eth_rx_data is delayed by ~4.7 ns.

So sampling happens after ~ 0.2 ns.

So it is really off. The result is that the ethmac_rx_datapath_crc_errors register is continually counting up. Transmission is working though, I just can't receive. (Although ethernet had been working in another VexRiscv design, so it's definitely not the phy.)

I might also add, that this effect is "by chance" due to clock redefinition. Given a smaller design, it is less likely for this issue to occur. Vivado just ignores that one path in timing.

Given the complicated clocking architecture of Litex and my inexperience with the framework, I thought it would be best to open an issue.

I suspect that there is a similar issue with LiteSDCard. I could only get it working with a 100 MHz sys clock frequency, but then I can't use NaxRiscv as the CPU due to timing. I will investigate further.

tristanitschner commented 5 months ago

Nvm, I'm wrong with what I said above. Exactly the same timing issues are present in the multicore VexRiscv image, and there Ethernet works. As far as the SD card is concerned, I found out that it only works at 100 MHz or 50 MHz, but not at 75 MHz. Is this intended? But regarding Ethernet, I'm still dumbfounded. Maybe it's a driver issue. I'll investigate further.

tristanitschner commented 5 months ago

Ok, it is definitely related to the recent CRC changes in LiteEth. I could reproduce the issue with the latest LiteEth on VexRiscv. LiteEth release tag 2023.12 works perfectly fine though, tested with VexRiscv and NaxRiscv SoCs. Finally I can connect to the internet via Debian on quad-core NaxRiscv! :)

enjoy-digital commented 4 months ago

Hi @tristanitschner,

would you mind sharing your build command? I'll have a look at the timing issue and compare upstream with 2023.12.

tristanitschner commented 4 months ago

The command was:

python3 -m litex_boards.targets.digilent_nexys_video --cpu-type=naxriscv --bus-standard axi-lite \
--with-video-framebuffer --with-coherent-dma --with-spi-sdcard --with-ethernet --xlen=64 \
--scala-args='rvc=true,rvf=true,rvd=true,alu-count=2,decode-count=2' --with-jtag-tap \
--sys-clk-freq 75000000 --cpu-count 4 --soc-json build/digilent_nexys_video/csr.json \
--update-repo no --build

I used Vivado version 2023.2. I also still have the implemented design check point on my disk, if you'd like that.

enjoy-digital commented 2 months ago

Hi @tristanitschner,

pretty sure the regression is the same as the one identified and fixed here: https://github.com/enjoy-digital/liteeth/commit/79ccffcfa748cc7d1fb683e43265bf0c6ebb1d3c, which was fixed a few days after your initial tests.

I'm closing this since I'm confident it is solved; if not, please comment or re-open.