enjoy-digital / litex

Build your hardware, easily!
Other
2.87k stars 552 forks source link

clocker_storage_reg -> ODDR timing issue #1814

Closed Dolu1990 closed 3 months ago

Dolu1990 commented 10 months ago

Hi,

Using python3 -m litex_boards.targets.digilent_nexys_video --cpu-type=naxriscv --bus-standard axi-lite --with-video-framebuffer --with-coherent-dma --with-sdcard --with-ethernet --xlen=64 --scala-args='rvc=true,rvf=true,rvd=true,alu-count=2,decode-count=2' --with-jtag-tap --sys-clk-freq 100000000 --cpu-count 2 --soc-json build/digilent_nexys_video/csr.json --update-repo no --load

(FPGA half full, 60K lut) One timing issue I often hit is the following : image image

(the schematic show the path going through nas, but it isn't, probably some hierarchy flattening hallucinations ^^)

I'm not sure what to do about that path, it seems related to the sdcard. (ODDR goes to the sdcard)

Note, when the FPGA is less full, the timing is ok.

Regards Dolu

Dolu1990 commented 10 months ago

It seems that the clocker_storage combinatorialy drive the litex main internal memory bus read data, through quite some logic layers aswell : image

This may be related to the timing issue, puting a lot of logic placement constraints on the clocker_storage to be "in the middle of them all, far from them all"

enjoy-digital commented 10 months ago

Thanks @Dolu1990, I'll try to look at it soon.

Dolu1990 commented 10 months ago

thanks ^^

Maybe the issue is more that the ODDR are combinatorialy connected to clocker_storage, and as the ODDR has a fixed location => long path

enjoy-digital commented 10 months ago

Hi @Dolu1990,

I tried to do some timings improvements to LiteSDCard in this branch: https://github.com/enjoy-digital/litesdcard/commits/timing_improvements. I tested the build and no longer seems to see this path in the critical paths. This could be worth testing on your end. If this improve things, I'll try to integrate this properly.

gsomlo commented 10 months ago

Hi @Dolu1990, @enjoy-digital,

On Tue, Nov 07, 2023 at 07:57:28AM -0800, enjoy-digital wrote:

I tried to do some timings improvements to LiteSDCard in this branch: https:// github.com/enjoy-digital/litesdcard/commits/timing_improvements. I tested the build and no longer seems to see this path in the critical paths. This could be worth testing on your end. If this improve things, I'll try to integrate this properly.

Nice -- I've been meaning to propose another iteration of changes to the linux driver based on @paulusmack's suggestions, but will hold off until the dust settles on the gateware (which works out great given my current $dayjob and class workload :)

enjoy-digital commented 10 months ago

Hi @gsomlo,

the changes done here do not affect the interface/use of the core, so this should not affect the linux driver changes.

Dolu1990 commented 10 months ago

Thanks :D I will test the changes ASAP (including the gbuf change)

Dolu1990 commented 10 months ago

3c9b2410 is fine, debian run happily 8f15cbef (phy: Add buffers on SDPHYDATAW/R datapaths to improve timings. ) seems to break stuff, producing some instability on the sdcard access, which start to show up once in the kernel : image

it isn't a 100% occurence bug.

wanna my to try parts of the commit 8f15cbef ?

enjoy-digital commented 10 months ago

Thanks for the test. IIRC, the initial timing issue was on the CRC Inserter, before the PHY. So you can try to only keep the additional buffer on SDPHYDATAW and if still not working, only keep pipe_ready=True on it. (Set pipe_valid to False).

Dolu1990 commented 9 months ago

Just tested. Even with SDPHYDATAW pipe_ready= true (and the 3 other off) it create the issue.

About timings, it seems all good (without SDPHYDATAW / SDPHYDATAR buffers) (note i'm testing with the gbuf of https://github.com/enjoy-digital/litex/issues/1813#issuecomment-1798399902)

So i guess we can revert "phy: Add buffers on SDPHYDATAW/R datapaths to improve timings." and that's good to go.

Thanks :)

enjoy-digital commented 9 months ago

@Dolu1990: Good, I'll do that yes then.

enjoy-digital commented 9 months ago

Done with https://github.com/enjoy-digital/litesdcard/commit/181c48b307f6e1fbafec46ea577ac9d04a6173b3, thanks!

Dolu1990 commented 4 months ago

Hi @enjoy-digital,

Apparently, with a different SoC (VexiiRiscv), the issue is back in buisness XD

image image

Dolu1990 commented 4 months ago

I updated the vexiiriscv PR (https://github.com/enjoy-digital/litex/pull/1923).

It should be reproducable via :

python3 -m litex_boards.targets.digilent_nexys_video --cpu-type=vexiiriscv  --with-jtag-tap  --bus-standard axi-lite --vexii-args=" \
--allow-bypass-from=0 --debug-privileged --with-mul --with-div --div-ipc --with-rva --with-supervisor --performance-counters 0 \
--regfile-async --xlen=64 --with-rvc --with-rvf --with-rvd --fma-reduced-accuracy \
--fetch-l1 --fetch-l1-ways=4 --fetch-l1-mem-data-width-min=64 \
--lsu-l1 --lsu-l1-ways=4  --lsu-l1-mem-data-width-min=64 --lsu-l1-store-buffer-ops=32 --lsu-l1-refill-count 2 --lsu-l1-writeback-count 2 --lsu-l1-store-buffer-slots=2  --with-lsu-bypass \
--with-btb --with-ras --with-gshare --relaxed-branch"  --cpu-count=2 --with-jtag-tap  --with-video-framebuffer --with-sdcard --with-ethernet --with-coherent-dma --l2-byte=131072 --update-repo=no  --sys-clk-freq 100000000 --build   --load
Dolu1990 commented 4 months ago

here are the vexii netlist from pythondata-cpu-vexiiriscv/pythondata_cpu_vexiiriscv/verilog verilog.zip

Here is one timeing violation :

Slack (VIOLATED) :        -0.974ns  (required time - arrival time)
  Source:                 clocker_storage_reg[3]/C
                            (rising edge-triggered cell FDRE clocked by crg_s7mmcm0_clkout0  {rise@0.000ns fall@5.000ns period=10.000ns})
  Destination:            ODDR_32/D2
                            (rising edge-triggered cell ODDR clocked by crg_s7mmcm0_clkout0  {rise@0.000ns fall@5.000ns period=10.000ns})
  Path Group:             crg_s7mmcm0_clkout0
  Path Type:              Setup (Max at Slow Process Corner)
  Requirement:            10.000ns  (crg_s7mmcm0_clkout0 rise@10.000ns - crg_s7mmcm0_clkout0 rise@0.000ns)
  Data Path Delay:        10.191ns  (logic 1.820ns (17.859%)  route 8.371ns (82.141%))
  Logic Levels:           11  (LUT2=2 LUT4=2 LUT5=1 LUT6=6)
  Clock Path Skew:        0.117ns (DCD - SCD + CPR)
    Destination Clock Delay (DCD):    6.171ns = ( 16.171 - 10.000 ) 
    Source Clock Delay      (SCD):    6.371ns
    Clock Pessimism Removal (CPR):    0.317ns
  Clock Uncertainty:      0.067ns  ((TSJ^2 + DJ^2)^1/2) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Discrete Jitter          (DJ):    0.114ns
    Phase Error              (PE):    0.000ns

    Location             Delay type                Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
                         (clock crg_s7mmcm0_clkout0 rise edge)
                                                      0.000     0.000 r  
    R4                                                0.000     0.000 r  clk100 (IN)
                         net (fo=0)                   0.000     0.000    clk100
    R4                   IBUF (Prop_ibuf_I_O)         1.475     1.475 r  clk100_IBUF_inst/O
                         net (fo=11, routed)          1.233     2.708    crg_s7mmcm0_clkin
    MMCME2_ADV_X1Y2      MMCME2_ADV (Prop_mmcme2_adv_CLKIN1_CLKOUT0)
                                                      0.088     2.796 r  MMCME2_ADV/CLKOUT0
                         net (fo=1, routed)           1.808     4.605    crg_s7mmcm0_clkout0
    BUFGCTRL_X0Y0        BUFG (Prop_bufg_I_O)         0.096     4.701 r  BUFG/O
                         net (fo=30973, routed)       1.670     6.371    sys_clk
    SLICE_X57Y146        FDRE                                         r  clocker_storage_reg[3]/C
  -------------------------------------------------------------------    -------------------
    SLICE_X57Y146        FDRE (Prop_fdre_C_Q)         0.456     6.827 f  clocker_storage_reg[3]/Q
                         net (fo=5, routed)           0.641     7.468    clocker_storage[3]
    SLICE_X57Y146        LUT4 (Prop_lut4_I3_O)        0.124     7.592 f  BUFG_10_i_14/O
                         net (fo=5, routed)           0.613     8.205    BUFG_10_i_14_n_0
    SLICE_X53Y146        LUT6 (Prop_lut6_I0_O)        0.124     8.329 r  BUFG_10_i_9/O
                         net (fo=2, routed)           0.446     8.774    BUFG_10_i_9_n_0
    SLICE_X53Y146        LUT6 (Prop_lut6_I0_O)        0.124     8.898 r  BUFG_10_i_6/O
                         net (fo=2, routed)           0.954     9.853    BUFG_10_i_6_n_0
    SLICE_X50Y146        LUT6 (Prop_lut6_I3_O)        0.124     9.977 r  BUFG_10_i_1_comp_1/O
                         net (fo=23, routed)          0.664    10.640    I0
    SLICE_X45Y144        LUT2 (Prop_lut2_I0_O)        0.124    10.764 f  sdcard_core_crc16_inserter_count[2]_i_5/O
                         net (fo=2, routed)           1.482    12.246    sdcard_core_crc16_inserter_count[2]_i_5_n_0
    SLICE_X17Y135        LUT6 (Prop_lut6_I4_O)        0.124    12.370 r  storage_27_reg_i_15_comp_1/O
                         net (fo=4, routed)           0.323    12.692    storage_27_reg_i_15_n_0
    SLICE_X17Y133        LUT2 (Prop_lut2_I0_O)        0.124    12.816 r  sdcard_mem2block_count[8]_i_1/O
                         net (fo=110, routed)         0.474    13.290    sdcard_mem2block_count[8]_i_1_n_0
    SLICE_X16Y132        LUT5 (Prop_lut5_I4_O)        0.124    13.414 f  ODDR_32_i_10/O
                         net (fo=1, routed)           1.695    15.109    ODDR_32_i_10_n_0
    SLICE_X0Y113         LUT6 (Prop_lut6_I1_O)        0.124    15.233 f  ODDR_32_i_5/O
                         net (fo=1, routed)           0.162    15.395    ODDR_32_i_5_n_0
    SLICE_X0Y113         LUT6 (Prop_lut6_I2_O)        0.124    15.519 f  ODDR_32_i_2/O
                         net (fo=1, routed)           0.403    15.922    ODDR_32_i_2_n_0
    SLICE_X1Y113         LUT4 (Prop_lut4_I1_O)        0.124    16.046 r  ODDR_32_i_1/O
                         net (fo=2, routed)           0.515    16.562    sdpads_data_o[3]
    OLOGIC_X0Y113        ODDR                                         r  ODDR_32/D2
  -------------------------------------------------------------------    -------------------

                         (clock crg_s7mmcm0_clkout0 rise edge)
                                                     10.000    10.000 r  
    R4                                                0.000    10.000 r  clk100 (IN)
                         net (fo=0)                   0.000    10.000    clk100
    R4                   IBUF (Prop_ibuf_I_O)         1.405    11.405 r  clk100_IBUF_inst/O
                         net (fo=11, routed)          1.162    12.567    crg_s7mmcm0_clkin
    MMCME2_ADV_X1Y2      MMCME2_ADV (Prop_mmcme2_adv_CLKIN1_CLKOUT0)
                                                      0.083    12.650 r  MMCME2_ADV/CLKOUT0
                         net (fo=1, routed)           1.723    14.373    crg_s7mmcm0_clkout0
    BUFGCTRL_X0Y0        BUFG (Prop_bufg_I_O)         0.091    14.464 r  BUFG/O
                         net (fo=30973, routed)       1.708    16.171    sys_clk
    OLOGIC_X0Y113        ODDR                                         r  ODDR_32/C
                         clock pessimism              0.317    16.488    
                         clock uncertainty           -0.067    16.421    
    OLOGIC_X0Y113        ODDR (Setup_oddr_C_D2)      -0.834    15.587    ODDR_32
  -------------------------------------------------------------------
                         required time                         15.587    
                         arrival time                         -16.562    
  -------------------------------------------------------------------
                         slack                                 -0.974    
enjoy-digital commented 4 months ago

Thanks for the incantation @Dolu1990 :), I'll try to fix that tomorrow.

enjoy-digital commented 3 months ago

@Dolu1990: Sorry for the delay, this should be better with https://github.com/enjoy-digital/litesdcard/commit/15bc2db36607c2cebcf21f37b8c3e496a0c8fb68, if not or not enough, please tell me, I'll improve.

enjoy-digital commented 3 months ago

The SDPHYClocker as also been simplified with https://github.com/enjoy-digital/litesdcard/commit/1a3d47466552169caecbd0c5757d7eae8c28e0cb.

Dolu1990 commented 3 months ago

I tested on VexiiRiscv + debian, all seems good. Also, the timing issue seems to have been pushed away.

Thanks :D

enjoy-digital commented 3 months ago

Great! Thanks for the feedback.