Closed MJoergen closed 4 months ago
Here are some observations based on the Implemented Design of this (failing) commit 7ea592b.
First, the 16 nets `hr_dq_in[*]` (corresponding to the assignment of `ctrl_dq_ddr_in` in line 220 of `hyperram_io.vhd`) are shown here:
These 16 nets are all Clock Domain Crossings (from `hr_rwds` to `hr_clk_x1`). I find it troubling that the paths are long and uneven; this should be constrained better. Part of the problem, I believe, is the `set_clock_groups` constraint in `common.xdc` lines 30-33, where the clocks `hr_rwds` and `hr_clk_x1` are deemed asynchronous.
These long, uneven paths could probably explain the blinking pixels I see on my machine. However, this alone does not explain the lockup of the HyperRAM controller, because the lockup does not depend on the data path.
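One conceivable way to constrain these crossings more tightly, instead of (or in addition to) declaring the clocks asynchronous, is a datapath-only max-delay constraint. This is only a sketch: the 5.0 ns budget is an assumption and would have to be derived from the actual capture timing.

```tcl
# Hypothetical alternative to a blanket set_clock_groups: keep the CDC paths
# short by bounding the datapath delay between the two clocks.
# The 5.0 ns value is a placeholder, not derived from the real design.
set_max_delay -datapath_only 5.0 \
    -from [get_clocks hr_rwds] \
    -to   [get_clocks hr_clk_x1]
```

Unlike `set_clock_groups`, this keeps the router motivated to place the crossing paths short and even.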
Second, the net `hr_rwds_in_delay` is shown here:
This signal (`hr_rwds_in_delay`) is used as the clock for the 8 IDDR blocks. However, it is also used as the clock for the signal `hr_toggle`, and as data input (i.e. a Clock Domain Crossing) to the signal `ctrl_rwds_in`. One possible source of the HyperRAM lockup is that this signal is not sampled correctly.
Specifically, the flip-flop `ctrl_rwds_in_reg` is the endpoint of one of the long arrows in the picture above, and hence also has a long path.
The detailed path delay report from Vivado is shown below (note the propagation delay of 2.474 ns through the IDELAYE2, and the final net delay of 0.954 ns from the IDELAYE2 to the FDRE):
The corresponding path in device view:
@amb5l FYI - please read Michael's "diary" and please feel free to share any thoughts that come to mind.
Thoughts from @amb5l via Skype:
Michael’s reasoning - suspecting the read path - is sound IMO. Perhaps the HyperRAM timing simulation model is limited and doesn’t produce a range of HR_CK to HR_RWDS delays…?
Another random thought: have you looked closely at when the drive direction on RWDS changes? Is any clkx1 synchronous logic completely prevented from seeing RWDS transitions during writes?
To proceed, I first remove the `set_clock_groups` constraint (`common.xdc` lines 30-33). This gives one setup violation (from `hr_rwds` to `hr_clk_x1`) and one hold violation (from `hr_rwds` to `hr_ck`).
Since the latter hold violation is probably harmless (because the clocks are mostly unrelated), I then add the line:

```
set_clock_groups -name cg_async -asynchronous -group [get_clocks hr_ck] -group [get_clocks hr_rwds]
```
This leaves me (in commit 9a6a9a1) with a single setup violation (from `hr_rwds` to `hr_clk_x1`) of -0.239 ns for the input to `ctrl_rwds_in_reg`:
However, notice in the timing report above that the net delay from the IDELAYE2 to the FDRE is now 3.703 ns, while the propagation delay within the IDELAYE2 is reduced to 2.037 ns. I'm mostly concerned with the increased net delay.
The following picture shows that Vivado deliberately chooses a long, roundabout route.
This is not desired; I would like the path to be as short as possible.
I initially thought the long path was chosen to satisfy the hold time, but surprisingly, there is a lot of slack on the hold-time requirement:
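One possible way to stop Vivado from choosing such a roundabout route would be to pin the capture flip-flop near the IDELAYE2 with a placement constraint. The slice coordinate below is made up for illustration; the real location would have to be read off the implemented design.

```tcl
# Hypothetical placement constraint: lock ctrl_rwds_in_reg into a slice close
# to the I/O column so the IDELAYE2 -> FDRE net stays short.
# SLICE_X0Y100 is a placeholder coordinate, not taken from this design.
set_property LOC SLICE_X0Y100 [get_cells ctrl_rwds_in_reg]
```

This trades portability for routing determinism, much like the PBLOCK already in use, but at the granularity of a single cell.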
The 16 nets `hr_dq_in[*]` (corresponding to the assignment of `ctrl_dq_ddr_in` in line 220 of `hyperram_io.vhd`) are shown here:
So, in this build they appear to be placed somewhat closer to the IDDR buffers.
The timing report for one of these paths is:
From the simulation model (in https://github.com/MJoergen/HyperRAM) I generate the following images showing the signals at the HyperRAM device. In the testbench I assume a 1 ns propagation delay from the FPGA to the HyperRAM and back.
The first image shows a write:
The second image shows a read:
In the last image, the time from the rising edge of `hr_ck` to the rising edge of `hr_rwds` (here 4 ns) is a parameter that can be adjusted in the simulation model.
Looking at the latest commit (9a6a9a):

1) `hyperram_io.vhd` contains VHDL blocks. I must confess that I have never seen these before! Does Vivado synthesize them correctly?
2) It appears that the path from the FPGA HyperRAM RWDS pin to the HyperRAM controller system clock domain looks like this:

In `hyperram.vhd`: `hr_rwds_in_i` is wired to the corresponding port of `hyperram_io`.

In `hyperram_io.vhd`: `hr_rwds_in_i` => IDELAY => `hr_rwds_in_delay`; then, in process `p_async`: `hr_rwds_in_delay` => FD => `ctrl_rwds_in`.
A single flip-flop is not enough to remove the metastability risk here (and without an `ASYNC_REG` property, implementation will take no care to avoid it). If the HyperRAM read data passes through a FIFO, the FIFO status flags might serve the same purpose as `ctrl_rwds_in` with no metastability risk.
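For the `ctrl_rwds_in` level signal specifically, a standard two-stage synchronizer with the `ASYNC_REG` attribute would at least contain the metastability (it does not, by itself, solve the word-alignment problem that the FIFO addresses). A minimal sketch, with entity and port names invented for illustration:

```vhdl
-- Hypothetical two-stage synchronizer for bringing hr_rwds_in_delay into the
-- hr_clk_x1 domain. ASYNC_REG tells Vivado this is a deliberate CDC and keeps
-- the two registers placed close together.
library ieee;
use ieee.std_logic_1164.all;

entity rwds_cdc is
  port (
    clk_dst  : in  std_logic;  -- hr_clk_x1
    async_in : in  std_logic;  -- hr_rwds_in_delay (asynchronous to clk_dst)
    sync_out : out std_logic   -- safe to use in the clk_dst domain
  );
end entity rwds_cdc;

architecture rtl of rwds_cdc is
  signal meta : std_logic := '0';
  signal sync : std_logic := '0';
  attribute async_reg : string;
  attribute async_reg of meta : signal is "TRUE";
  attribute async_reg of sync : signal is "TRUE";
begin
  process (clk_dst)
  begin
    if rising_edge(clk_dst) then
      meta <= async_in;  -- may go metastable; resolves before the next stage
      sync <= meta;
    end if;
  end process;
  sync_out <= sync;
end architecture rtl;
```

Note the trade-off: the synchronized output lags by two `hr_clk_x1` cycles, so any control logic consuming it must tolerate that latency.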
3) A very minor point regarding readability of `common.xdc`: I think that all instances of `5.0` should be replaced by `$tCKHP`, with `tCKHP` defined to equal half the HyperRAM clock period (5.0).

@amb5l :
1. [...] that `hr_rwds_in_delay` into the flip-flop is stable around the rising edge of `hr_clk_x1`.
2. I tried with an XPM FIFO, but that did not work. I found the reason to be that the XPM FIFO requires a free-running clock, but the RWDS signal from the HyperRAM is not free-running. So I'll have to "roll my own" FIFO. However, I thought our timing analysis ensured that setup-and-hold requirements are satisfied on this path; specifically, that `hr_rwds_in_delay` into the flip-flop is stable around the rising edge of `hr_clk_x1`.
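For what it's worth, the `$tCKHP` readability suggestion (point 3 above) might look roughly like this; the constraint lines are illustrative, not copied from the actual `common.xdc`:

```tcl
# Define half the HyperRAM clock period (5.0 ns) once, then reuse it, so a
# future clock-frequency change only touches one line.
set tCKHP 5.0

# Illustrative uses; the real constraints in common.xdc will differ.
set_input_delay  -clock hr_rwds -max $tCKHP [get_ports {hr_dq[*]}]
set_output_delay -clock hr_ck   -max $tCKHP [get_ports {hr_dq[*]}]
```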
There are 2 primary drivers of variability in the arrival time of RWDS edges during the read data phase:

1) `clk_x1_del` (phase shifted `clk_x1`) => `hr_ck` (ODDR clock to output)
2) `hr_ck` => `hr_rwds` (clock to RWDS inside the HyperRAM)

I don't think timing analysis can model this; IIRC this is why we decided to pretend that `hr_rwds` is asynchronous, up to and including adding a FIFO for the data it is clocking.
I don't see any way for static timing analysis to look at the relationship between `hr_rwds_in_delay` and `hr_clk_x1`.
I believe that you will need to create a small, shallow FIFO for read data, and an improved CDC circuit to generate `ctrl_rwds_in`.
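A shallow dual-clock FIFO along these lines could be structured roughly as below. Everything here (entity name, ports, depth of 4, gray-coded pointer CDC) is an assumption-laden sketch of the general technique, not the implementation that eventually landed; a real version also needs a reset to clear the pointers between bursts.

```vhdl
-- Hypothetical sketch of the suggested small read-data FIFO: data is written
-- on the bursty RWDS-derived clock and read out in the hr_clk_x1 domain.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity rwds_fifo is
  port (
    wr_clk   : in  std_logic;                     -- hr_rwds_in_delay
    wr_data  : in  std_logic_vector(15 downto 0); -- IDDR output
    rd_clk   : in  std_logic;                     -- hr_clk_x1
    rd_en    : in  std_logic;
    rd_data  : out std_logic_vector(15 downto 0);
    rd_valid : out std_logic                      -- could replace ctrl_rwds_in
  );
end entity rwds_fifo;

architecture rtl of rwds_fifo is
  type ram_t is array (0 to 3) of std_logic_vector(15 downto 0);
  signal ram : ram_t;

  -- Gray-coded write pointer: only one bit changes per increment, so the
  -- read side can never sample an inconsistent multi-bit value.
  signal wr_ptr, rd_ptr          : unsigned(1 downto 0) := (others => '0');
  signal wr_gray                 : std_logic_vector(1 downto 0) := "00";
  signal wr_gray_meta, wr_gray_s : std_logic_vector(1 downto 0) := "00";
  attribute async_reg : string;
  attribute async_reg of wr_gray_meta : signal is "TRUE";
  attribute async_reg of wr_gray_s    : signal is "TRUE";

  function to_gray(v : unsigned) return std_logic_vector is
  begin
    return std_logic_vector(v xor ('0' & v(v'high downto 1)));
  end function;
begin
  -- Write side: each RWDS-derived clock edge stores one word.
  process (wr_clk)
  begin
    if rising_edge(wr_clk) then
      ram(to_integer(wr_ptr)) <= wr_data;
      wr_ptr  <= wr_ptr + 1;
      wr_gray <= to_gray(wr_ptr + 1);
    end if;
  end process;

  -- Read side: synchronize the gray write pointer, then compare against the
  -- local read pointer; not-empty doubles as the data-valid indication.
  process (rd_clk)
  begin
    if rising_edge(rd_clk) then
      wr_gray_meta <= wr_gray;
      wr_gray_s    <= wr_gray_meta;
      if rd_en = '1' and wr_gray_s /= to_gray(rd_ptr) then
        rd_data  <= ram(to_integer(rd_ptr));
        rd_ptr   <= rd_ptr + 1;
        rd_valid <= '1';
      else
        rd_valid <= '0';
      end if;
    end if;
  end process;
end architecture rtl;
```

The key idea is that only the gray-coded write pointer crosses into the `hr_clk_x1` domain (through two `ASYNC_REG` flip-flops); the RAM contents and the binary pointers are never sampled asynchronously. Because `wr_gray` holds its last value when RWDS stops toggling between bursts, the read side still drains the final words of a burst.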
@MJoergen Can we close this one?
Yes. Indeed, I have implemented exactly what was suggested in the previous comment:

> I believe that you will need to create a small, shallow FIFO for read data, and an improved CDC circuit to generate ctrl_rwds_in.
@amb5l As you can see above - you again nailed it! Thank you for that :-)
Commit 7ea592b shows no critical warnings and no timing errors. However, the image fails for the GitHub user @Schwefelholz . The symptom is no image / vertical stripes shown on HDMI output, which indicates a lock-up in the HyperRAM controller.
Furthermore, on my own R3 machine I see a different symptom: Blinking pixels on the HDMI output, which indicates random bit errors in the data path.
Additionally, some builds (where unrelated parts are changed) seem to work, and others don’t.
Note: I’m still using PBLOCK for the HyperRAM. When I build without PBLOCK the design sometimes fails with the same symptoms as above.
So, all-in-all I suspect the HyperRAM controller is under-constrained.
My hypothesis is that the problem is in the receive path (from HyperRAM to FPGA) because this is the most complicated (i.e. least understood) part of the HyperRAM controller.
The two main files involved are `hyperram_io.vhd` and `common.xdc`. These two files combined handle all the Clock Domain Crossings. Outside the file `hyperram_io.vhd`, everything is synchronous to the 100 MHz `hr_clk_x1`.