ZipCPU / sdspi

SD-Card controller, using either SPI, SDIO, or eMMC interfaces

OPT_SERDES=1 fails implementation #11

Open Ttl opened 1 month ago

Ttl commented 1 month ago

Implementation with OPT_SERDES=1 fails with the error "multiple driver nets". The issue is that o_mine is assigned twice in xsdserdes8x.v. After removing the second assignment on line 188, implementation succeeds but timing fails.
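For readers unfamiliar with this class of error, here is a minimal sketch of what a doubly driven net looks like. It is not the actual xsdserdes8x.v source: the module name, the `i_a` port, and the `i_other` signal in the comment are hypothetical, and only the `o_mine` name is borrowed from the report above.

```verilog
// Hypothetical illustration only; not taken from xsdserdes8x.v.
module multi_driver_sketch (
	input	wire	i_a,
	output	wire	o_mine
);
	// The one and only driver of o_mine.
	assign	o_mine = i_a;

	// Adding a second continuous assignment to the same net, such as
	// the line below, gives the net two drivers, which implementation
	// tools reject as a multiple-driver error. It is left commented out
	// so that this sketch builds.
	//
	// assign	o_mine = i_other;
endmodule
```

Removing one of the two assignments, as described above, leaves a single driver. Which of the two assignments carries the intended logic is a separate question that this sketch does not address.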

Timing report

| Name | Slack | Levels | High Fanout | From | To | Total Delay | Logic Delay | Net Delay | Requirement | Source Clock | Destination Clock | Exception | Clock Uncertainty |
|------|-------|--------|-------------|------|----|-------------|-------------|-----------|-------------|--------------|-------------------|-----------|-------------------|
| Path 221 | -1.01 | 2 | 3 | ps/sdio_top_0/U0/u_sdfrontend/GEN_WIDE_IO.cmd_serdes/GEN_BIDIRECTIONAL.u_iserdes/CLKDIV | ps/sdio_top_0/U0/u_sdfrontend/GEN_WIDE_IO.cmd_serdes/u_oserdes/T1 | 2.23 | 0.74 | 1.49 | 2.50 | clk_out1_ps_block_clk_wiz_0_0 | clk_out2_ps_block_clk_wiz_0_0 | | 0.21 |
| Path 222 | -0.58 | 1 | 3 | ps/sdio_top_0/U0/u_sdio/u_txframe/ck_data_reg[13]/C | ps/sdio_top_0/U0/u_sdfrontend/GEN_WIDE_IO.GEN_WIDE_DATIO[5].io_serdes/u_oserdes/T1 | 1.81 | 0.48 | 1.33 | 2.50 | clk_out1_ps_block_clk_wiz_0_0 | clk_out2_ps_block_clk_wiz_0_0 | | 0.21 |
| Path 223 | -0.50 | 1 | 41 | ps/sdio_top_0/U0/u_sdio/u_txframe/ck_valid_reg/C | ps/sdio_top_0/U0/u_sdfrontend/GEN_WIDE_IO.GEN_WIDE_DATIO[1].io_serdes/u_oserdes/T1 | 1.73 | 0.63 | 1.10 | 2.50 | clk_out1_ps_block_clk_wiz_0_0 | clk_out2_ps_block_clk_wiz_0_0 | | 0.21 |
| Path 224 | -0.44 | 1 | 41 | ps/sdio_top_0/U0/u_sdio/u_txframe/ck_valid_reg/C | ps/sdio_top_0/U0/u_sdfrontend/GEN_WIDE_IO.GEN_WIDE_DATIO[2].io_serdes/u_oserdes/T1 | 1.67 | 0.63 | 1.04 | 2.50 | clk_out1_ps_block_clk_wiz_0_0 | clk_out2_ps_block_clk_wiz_0_0 | | 0.21 |
| Path 225 | -0.41 | 1 | 41 | ps/sdio_top_0/U0/u_sdio/u_txframe/ck_valid_reg/C | ps/sdio_top_0/U0/u_sdfrontend/GEN_WIDE_IO.GEN_WIDE_DATIO[4].io_serdes/u_oserdes/T1 | 1.64 | 0.63 | 1.01 | 2.50 | clk_out1_ps_block_clk_wiz_0_0 | clk_out2_ps_block_clk_wiz_0_0 | | 0.21 |
| Path 226 | -0.40 | 1 | 41 | ps/sdio_top_0/U0/u_sdio/u_txframe/ck_valid_reg/C | ps/sdio_top_0/U0/u_sdfrontend/GEN_WIDE_IO.GEN_WIDE_DATIO[6].io_serdes/u_oserdes/T1 | 1.63 | 0.63 | 1.00 | 2.50 | clk_out1_ps_block_clk_wiz_0_0 | clk_out2_ps_block_clk_wiz_0_0 | | 0.21 |
| Path 227 | -0.37 | 1 | 41 | ps/sdio_top_0/U0/u_sdio/u_txframe/ck_valid_reg/C | ps/sdio_top_0/U0/u_sdfrontend/GEN_WIDE_IO.GEN_WIDE_DATIO[7].io_serdes/u_oserdes/T1 | 1.60 | 0.63 | 0.97 | 2.50 | clk_out1_ps_block_clk_wiz_0_0 | clk_out2_ps_block_clk_wiz_0_0 | | 0.21 |
| Path 228 | -0.30 | 1 | 10 | ps/sdio_top_0/U0/u_sdio/u_control/o_pp_data_reg/C | ps/sdio_top_0/U0/u_sdfrontend/GEN_WIDE_IO.GEN_WIDE_DATIO[3].io_serdes/u_oserdes/T1 | 1.53 | 0.54 | 0.99 | 2.50 | clk_out1_ps_block_clk_wiz_0_0 | clk_out2_ps_block_clk_wiz_0_0 | | 0.21 |
| Path 229 | -0.29 | 1 | 10 | ps/sdio_top_0/U0/u_sdio/u_control/o_pp_data_reg/C | ps/sdio_top_0/U0/u_sdfrontend/GEN_WIDE_IO.GEN_WIDE_DATIO[0].io_serdes/u_oserdes/T1 | 1.53 | 0.54 | 0.99 | 2.50 | clk_out1_ps_block_clk_wiz_0_0 | clk_out2_ps_block_clk_wiz_0_0 | | 0.21 |

Testing the build with failed timing, initialization and CSD reading succeed, but bus width determination sometimes fails. When bus width determination succeeds, reading and writing single blocks works, provided there is enough time after each write. The busy workaround of reading o_debug doesn't work, since it's all zeroes when using the serdes option.

ZipCPU commented 1 month ago

This issue may need some more investigation, particularly since this line doesn't seem to be appropriately representative at all ... for now, let me ask:

When timing fails, at what (system) clock rate is it failing?

Also, if/when timing fails, does the design still work for some clock phases? (i.e. via bits [20:16] of the PHY register)

Dan

Ttl commented 1 month ago

System clock is 100 MHz and serdes clock is 400 MHz. Both are generated by the same MMCM. Device is Zynq 7020.

I'll do more testing with serdes when the busy bit is working, but it did seem to be working at least most of the time.

Here is the Vivado report for the first path in the table above, the one with the worst slack:

[Image: timing]

ZipCPU commented 1 month ago

That appears to be an entirely different path.

ZipCPU commented 1 month ago

I'm now getting similar timing errors here. I haven't yet determined a fix. For now, know that the OPT_DDR=1, OPT_SERDES=0 front end should be sufficient for any 3.3V uses. If I recall correctly, you need 1.8V support to go any faster.

Ttl commented 1 month ago

I tested OPT_SERDES with the latest changes and it doesn't seem to work. Occasionally bus width determination fails and it falls back to a 1-bit bus. When it succeeds, it hangs during write. I didn't try to debug it further.

Synthesis fails with errors:

[Synth 8-524] part-select [24:0] out of range of prefix 'ck_sreg' ["sdfrontend.v":819]
[Synth 8-524] part-select [24:0] out of range of prefix 'ck_psreg' ["sdfrontend.v":837]
[Synth 8-524] part-select [24:0] out of range of prefix 'pck_sreg' ["sdfrontend.v":1050]

I changed that to [23:0] in my tests.
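As a minimal sketch of why [24:0] is rejected while [23:0] is accepted, assume a 24-bit register. The real declarations of ck_sreg, ck_psreg, and pck_sreg in sdfrontend.v are not shown in this thread, so the width `W`, the module name, and the port names below are all assumptions made for illustration.

```verilog
// Hypothetical illustration only; not taken from sdfrontend.v.
module partsel_sketch #(
	parameter	W = 24	// assumed width, so the valid bits are [23:0]
) (
	input	wire		i_clk,
	input	wire [W-1:0]	i_word,
	output	reg  [W-1:0]	o_copy
);
	initial	o_copy = 0;

	// With W = 24, writing i_word[24:0] here would select a bit that
	// does not exist and trigger a part-select out-of-range error.
	// Deriving the bounds from W keeps the select inside the declared
	// range even if the width changes later.
	always @(posedge i_clk)
		o_copy <= i_word[W-1:0];
endmodule
```

Whether [23:0] is the intended range in the real design, or whether the registers were meant to be one bit wider, is something only the source can settle.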

ZipCPU commented 1 month ago

I'm working on a fix for this. It's going to involve a bit of front end redesign. Specifically, I'm going to remove all tristate logic from the front end, so the tristate signals are instead registered from within the design before arriving at the front end. This will remove all of the combinatorial tristate logic at the front end. It will also force you into push-pull mode any time you try to send data faster than 1b per clock (i.e. HS200 and HS400).
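A rough sketch of the idea of registering the tristate control, rather than computing it combinatorially just before the output primitive, might look like the following. This is not the planned implementation: the module name, the port names, and the polarity (1 = release the pin) are all assumptions made for illustration.

```verilog
// Hypothetical illustration only; not the sdspi front end.
module tristate_reg_sketch (
	input	wire	i_clk,
	input	wire	i_en,		// assumed: high when we want to drive the pin
	input	wire	i_data,
	output	reg	o_tristate,	// assumed polarity: 1 = release the pin
	output	reg	o_data
);
	// Start with the pin released and the data line idle high.
	initial	o_tristate = 1'b1;
	initial	o_data     = 1'b1;

	// Both the data and its tristate control come straight out of
	// flip-flops, so no logic sits between the register and whatever
	// I/O primitive consumes these two signals.
	always @(posedge i_clk)
	begin
		o_tristate <= !i_en;
		o_data     <= i_data;
	end
endmodule
```

Every failing path in the timing table above ends at a u_oserdes/T1 pin, which is why shortening the logic in front of the tristate input is the first thing to try.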

If this fails, we'll need to place a false path or other timing exception on the 8b tristate paths.

These changes aren't going to happen overnight, however. That said, if you have the patience to stick around for this, I'd love to hear the results you get when testing this, since I don't (currently) have any hardware that can move that fast.

Q1: Will you be able to wait a couple weeks for this feature if necessary?

I'm also staring at the data strobe support, and wondering if it needs updating as well--especially to meet timing in a more reliable manner. Let me ask a second question, then:

Q2: Do you intend to use the return data strobe at all? It's only present with eMMC devices in HS400.

Finally, since you are using an eMMC device, ...

Q3: Do you foresee a need for the collision option in your application?

Collision detection is designed for the GO_IRQ_STATE instruction, for the case where you wish to interrupt the device yourself.

Ttl commented 1 month ago

I can test it when it's ready even if it will take a while.

I have the return strobe wired to a clock-capable input pin, so I can test it if needed. However, the FPGA has adjustable input delay in hardware, so I'm not sure there is any benefit to using the strobe instead of tuning the input delay.

I don't see myself needing interrupt support.