Closed Dolu1990 closed 3 months ago
It seems that the clocker_storage combinatorialy drive the litex main internal memory bus read data, through quite some logic layers aswell :
This may be related to the timing issue, puting a lot of logic placement constraints on the clocker_storage to be "in the middle of them all, far from them all"
Thanks @Dolu1990, I'll try to look at it soon.
thanks ^^
Maybe the issue is more that the ODDR are combinatorialy connected to clocker_storage, and as the ODDR has a fixed location => long path
Hi @Dolu1990,
I tried to do some timings improvements to LiteSDCard in this branch: https://github.com/enjoy-digital/litesdcard/commits/timing_improvements. I tested the build and no longer seems to see this path in the critical paths. This could be worth testing on your end. If this improve things, I'll try to integrate this properly.
Hi @Dolu1990, @enjoy-digital,
On Tue, Nov 07, 2023 at 07:57:28AM -0800, enjoy-digital wrote:
I tried to do some timings improvements to LiteSDCard in this branch: https:// github.com/enjoy-digital/litesdcard/commits/timing_improvements. I tested the build and no longer seems to see this path in the critical paths. This could be worth testing on your end. If this improve things, I'll try to integrate this properly.
Nice -- I've been meaning to propose another iteration of changes to the linux driver based on @paulusmack's suggestions, but will hold off until the dust settles on the gateware (which works out great given my current $dayjob and class workload :)
Hi @gsomlo,
the changes done here do not affect the interface/use of the core, so this should not affect the linux driver changes.
Thanks :D I will test the changes ASAP (including the gbuf change)
3c9b2410 is fine, debian run happily 8f15cbef (phy: Add buffers on SDPHYDATAW/R datapaths to improve timings. ) seems to break stuff, producing some instability on the sdcard access, which start to show up once in the kernel :
it isn't a 100% occurence bug.
wanna my to try parts of the commit 8f15cbef ?
Thanks for the test. IIRC, the initial timing issue was on the CRC Inserter, before the PHY. So you can try to only keep the additional buffer on SDPHYDATAW and if still not working, only keep pipe_ready=True
on it. (Set pipe_valid
to False).
Just tested. Even with SDPHYDATAW pipe_ready= true (and the 3 other off) it create the issue.
About timings, it seems all good (without SDPHYDATAW / SDPHYDATAR buffers) (note i'm testing with the gbuf of https://github.com/enjoy-digital/litex/issues/1813#issuecomment-1798399902)
So i guess we can revert "phy: Add buffers on SDPHYDATAW/R datapaths to improve timings." and that's good to go.
Thanks :)
@Dolu1990: Good, I'll do that yes then.
Hi @enjoy-digital,
Apparently, with a different SoC (VexiiRiscv), the issue is back in buisness XD
I updated the vexiiriscv PR (https://github.com/enjoy-digital/litex/pull/1923).
It should be reproducable via :
python3 -m litex_boards.targets.digilent_nexys_video --cpu-type=vexiiriscv --with-jtag-tap --bus-standard axi-lite --vexii-args=" \
--allow-bypass-from=0 --debug-privileged --with-mul --with-div --div-ipc --with-rva --with-supervisor --performance-counters 0 \
--regfile-async --xlen=64 --with-rvc --with-rvf --with-rvd --fma-reduced-accuracy \
--fetch-l1 --fetch-l1-ways=4 --fetch-l1-mem-data-width-min=64 \
--lsu-l1 --lsu-l1-ways=4 --lsu-l1-mem-data-width-min=64 --lsu-l1-store-buffer-ops=32 --lsu-l1-refill-count 2 --lsu-l1-writeback-count 2 --lsu-l1-store-buffer-slots=2 --with-lsu-bypass \
--with-btb --with-ras --with-gshare --relaxed-branch" --cpu-count=2 --with-jtag-tap --with-video-framebuffer --with-sdcard --with-ethernet --with-coherent-dma --l2-byte=131072 --update-repo=no --sys-clk-freq 100000000 --build --load
here are the vexii netlist from pythondata-cpu-vexiiriscv/pythondata_cpu_vexiiriscv/verilog verilog.zip
Here is one timeing violation :
Slack (VIOLATED) : -0.974ns (required time - arrival time)
Source: clocker_storage_reg[3]/C
(rising edge-triggered cell FDRE clocked by crg_s7mmcm0_clkout0 {rise@0.000ns fall@5.000ns period=10.000ns})
Destination: ODDR_32/D2
(rising edge-triggered cell ODDR clocked by crg_s7mmcm0_clkout0 {rise@0.000ns fall@5.000ns period=10.000ns})
Path Group: crg_s7mmcm0_clkout0
Path Type: Setup (Max at Slow Process Corner)
Requirement: 10.000ns (crg_s7mmcm0_clkout0 rise@10.000ns - crg_s7mmcm0_clkout0 rise@0.000ns)
Data Path Delay: 10.191ns (logic 1.820ns (17.859%) route 8.371ns (82.141%))
Logic Levels: 11 (LUT2=2 LUT4=2 LUT5=1 LUT6=6)
Clock Path Skew: 0.117ns (DCD - SCD + CPR)
Destination Clock Delay (DCD): 6.171ns = ( 16.171 - 10.000 )
Source Clock Delay (SCD): 6.371ns
Clock Pessimism Removal (CPR): 0.317ns
Clock Uncertainty: 0.067ns ((TSJ^2 + DJ^2)^1/2) / 2 + PE
Total System Jitter (TSJ): 0.071ns
Discrete Jitter (DJ): 0.114ns
Phase Error (PE): 0.000ns
Location Delay type Incr(ns) Path(ns) Netlist Resource(s)
------------------------------------------------------------------- -------------------
(clock crg_s7mmcm0_clkout0 rise edge)
0.000 0.000 r
R4 0.000 0.000 r clk100 (IN)
net (fo=0) 0.000 0.000 clk100
R4 IBUF (Prop_ibuf_I_O) 1.475 1.475 r clk100_IBUF_inst/O
net (fo=11, routed) 1.233 2.708 crg_s7mmcm0_clkin
MMCME2_ADV_X1Y2 MMCME2_ADV (Prop_mmcme2_adv_CLKIN1_CLKOUT0)
0.088 2.796 r MMCME2_ADV/CLKOUT0
net (fo=1, routed) 1.808 4.605 crg_s7mmcm0_clkout0
BUFGCTRL_X0Y0 BUFG (Prop_bufg_I_O) 0.096 4.701 r BUFG/O
net (fo=30973, routed) 1.670 6.371 sys_clk
SLICE_X57Y146 FDRE r clocker_storage_reg[3]/C
------------------------------------------------------------------- -------------------
SLICE_X57Y146 FDRE (Prop_fdre_C_Q) 0.456 6.827 f clocker_storage_reg[3]/Q
net (fo=5, routed) 0.641 7.468 clocker_storage[3]
SLICE_X57Y146 LUT4 (Prop_lut4_I3_O) 0.124 7.592 f BUFG_10_i_14/O
net (fo=5, routed) 0.613 8.205 BUFG_10_i_14_n_0
SLICE_X53Y146 LUT6 (Prop_lut6_I0_O) 0.124 8.329 r BUFG_10_i_9/O
net (fo=2, routed) 0.446 8.774 BUFG_10_i_9_n_0
SLICE_X53Y146 LUT6 (Prop_lut6_I0_O) 0.124 8.898 r BUFG_10_i_6/O
net (fo=2, routed) 0.954 9.853 BUFG_10_i_6_n_0
SLICE_X50Y146 LUT6 (Prop_lut6_I3_O) 0.124 9.977 r BUFG_10_i_1_comp_1/O
net (fo=23, routed) 0.664 10.640 I0
SLICE_X45Y144 LUT2 (Prop_lut2_I0_O) 0.124 10.764 f sdcard_core_crc16_inserter_count[2]_i_5/O
net (fo=2, routed) 1.482 12.246 sdcard_core_crc16_inserter_count[2]_i_5_n_0
SLICE_X17Y135 LUT6 (Prop_lut6_I4_O) 0.124 12.370 r storage_27_reg_i_15_comp_1/O
net (fo=4, routed) 0.323 12.692 storage_27_reg_i_15_n_0
SLICE_X17Y133 LUT2 (Prop_lut2_I0_O) 0.124 12.816 r sdcard_mem2block_count[8]_i_1/O
net (fo=110, routed) 0.474 13.290 sdcard_mem2block_count[8]_i_1_n_0
SLICE_X16Y132 LUT5 (Prop_lut5_I4_O) 0.124 13.414 f ODDR_32_i_10/O
net (fo=1, routed) 1.695 15.109 ODDR_32_i_10_n_0
SLICE_X0Y113 LUT6 (Prop_lut6_I1_O) 0.124 15.233 f ODDR_32_i_5/O
net (fo=1, routed) 0.162 15.395 ODDR_32_i_5_n_0
SLICE_X0Y113 LUT6 (Prop_lut6_I2_O) 0.124 15.519 f ODDR_32_i_2/O
net (fo=1, routed) 0.403 15.922 ODDR_32_i_2_n_0
SLICE_X1Y113 LUT4 (Prop_lut4_I1_O) 0.124 16.046 r ODDR_32_i_1/O
net (fo=2, routed) 0.515 16.562 sdpads_data_o[3]
OLOGIC_X0Y113 ODDR r ODDR_32/D2
------------------------------------------------------------------- -------------------
(clock crg_s7mmcm0_clkout0 rise edge)
10.000 10.000 r
R4 0.000 10.000 r clk100 (IN)
net (fo=0) 0.000 10.000 clk100
R4 IBUF (Prop_ibuf_I_O) 1.405 11.405 r clk100_IBUF_inst/O
net (fo=11, routed) 1.162 12.567 crg_s7mmcm0_clkin
MMCME2_ADV_X1Y2 MMCME2_ADV (Prop_mmcme2_adv_CLKIN1_CLKOUT0)
0.083 12.650 r MMCME2_ADV/CLKOUT0
net (fo=1, routed) 1.723 14.373 crg_s7mmcm0_clkout0
BUFGCTRL_X0Y0 BUFG (Prop_bufg_I_O) 0.091 14.464 r BUFG/O
net (fo=30973, routed) 1.708 16.171 sys_clk
OLOGIC_X0Y113 ODDR r ODDR_32/C
clock pessimism 0.317 16.488
clock uncertainty -0.067 16.421
OLOGIC_X0Y113 ODDR (Setup_oddr_C_D2) -0.834 15.587 ODDR_32
-------------------------------------------------------------------
required time 15.587
arrival time -16.562
-------------------------------------------------------------------
slack -0.974
Thanks for the incantation @Dolu1990 :), I'll try to fix that tomorrow.
@Dolu1990: Sorry for the delay, this should be better with https://github.com/enjoy-digital/litesdcard/commit/15bc2db36607c2cebcf21f37b8c3e496a0c8fb68, if not or not enough, please tell me, I'll improve.
The SDPHYClocker as also been simplified with https://github.com/enjoy-digital/litesdcard/commit/1a3d47466552169caecbd0c5757d7eae8c28e0cb.
I tested on VexiiRiscv + debian, all seems good. Also, the timing issue seems to have been pushed away.
Thanks :D
Great! Thanks for the feedback.
Hi,
Using python3 -m litex_boards.targets.digilent_nexys_video --cpu-type=naxriscv --bus-standard axi-lite --with-video-framebuffer --with-coherent-dma --with-sdcard --with-ethernet --xlen=64 --scala-args='rvc=true,rvf=true,rvd=true,alu-count=2,decode-count=2' --with-jtag-tap --sys-clk-freq 100000000 --cpu-count 2 --soc-json build/digilent_nexys_video/csr.json --update-repo no --load
(FPGA half full, 60K lut) One timing issue I often hit is the following :
(the schematic show the path going through nas, but it isn't, probably some hierarchy flattening hallucinations ^^)
I'm not sure what to do about that path, it seems related to the sdcard. (ODDR goes to the sdcard)
Note, when the FPGA is less full, the timing is ok.
Regards Dolu