Closed cklarhorst closed 3 years ago
@cklarhorst: would you mind sharing a minimal design to have a look at it? Just to be sure to analyze things on a similar design.
Sure, I will try to create a minimal example
Modified minispartan6 target: minispartan6.py.txt
I changed:
Results:
trce -e 5 minispartan6.ncd
:
Slack: -2.842ns (requirement - (data path - clock path skew + uncertainty))
Source: Mram_storage_313/DP (RAM)
Destination: trigger_mem_graycounter1_q_3 (FF)
Requirement: 1.562ns
Data Path Delay: 3.833ns (Levels of Logic = 4)(Component delays alone exceeds constraint)
Clock Path Skew: -0.339ns (1.489 - 1.828)
Source Clock: clkout_buf0 rising at 0.000ns
Destination Clock: clkout_buf2 rising at 1.562ns
Clock Uncertainty: 0.232ns
Clock Uncertainty: 0.232ns ((TSJ^2 + DJ^2)^1/2) / 2 + PE
Total System Jitter (TSJ): 0.070ns
Discrete Jitter (DJ): 0.211ns
Phase Error (PE): 0.120ns
Maximum Data Path at Slow Process Corner: Mram_storage_313/DP to trigger_mem_graycounter1_q_3
Location Delay type Delay(ns) Physical Resource
Logical Resource(s)
------------------------------------------------- -------------------
SLICE_X34Y59.B Tshcko 0.885 trigger_mem_asyncfifo_dout<12>
Mram_storage_313/DP
SLICE_X39Y61.B6 net (fanout=4) 0.548 trigger_mem_asyncfifo_dout<12>
SLICE_X39Y61.B Tilo 0.259 xilinxmultiregimpl4_regs0<3>
trigger_mem_graycounter1_ce1131
SLICE_X38Y60.A4 net (fanout=1) 0.436 trigger_mem_graycounter1_ce113
SLICE_X38Y60.A Tilo 0.203 trigger_mem_graycounter1_ce111
trigger_mem_graycounter1_ce7
SLICE_X38Y59.B4 net (fanout=1) 0.430 trigger_mem_graycounter1_ce7
SLICE_X38Y59.B Tilo 0.203 trigger_mem_graycounter1_q<4>
trigger_mem_graycounter1_ce8
SLICE_X37Y60.C4 net (fanout=9) 0.547 trigger_mem_graycounter1_ce8
SLICE_X37Y60.CLK Tas 0.322 trigger_mem_graycounter1_q<3>
Mxor_trigger_mem_graycounter1_q_next_3_xo<0>1
trigger_mem_graycounter1_q_3
------------------------------------------------- ---------------------------
Total 3.833ns (1.872ns logic, 1.961ns route)
(48.8% logic, 51.2% route)
So this time it is the other AsyncFifo (the one in the trigger). For me it seems that both AsyncFifos have that problem. I would expect that the dout of the fifo should already be clocked by clkout_buf2
in this case.
Synthesis log:
INFO:Xst:3231 - The small RAM <Mram_storage_3> will be implemented on LUTs in order to maximize performance and save block RAM resources. If you want to force its implementation on block, use option/constraint ram_style.
-----------------------------------------------------------------------
| ram_type | Distributed | |
-----------------------------------------------------------------------
| Port A |
| aspect ratio | 16-word x 22-bit | |
| clkA | connected to signal <clkout_buf0> | rise |
| weA | connected to signal <trigger_mem_graycounter0_ce> | high |
| addrA | connected to signal <trigger_mem_graycounter0_q_binary<3:0>> | |
| diA | connected to signal <("00",trigger_mem_value_storage,trigger_mem_mask_storage)> | |
-----------------------------------------------------------------------
| Port B |
| aspect ratio | 16-word x 22-bit | |
| addrB | connected to signal <memadr_3> | |
| doB | connected to signal <trigger_mem_asyncfifo_dout> | |
So no clk on Port B.
The relevant verilog part for the asyncfifo looks like:
reg [21:0] storage_3[0:15];
reg [3:0] memadr_2;
reg [3:0] memadr_3;
always @(posedge sys_clk) begin
if (trigger_mem_wrport_we)
storage_3[trigger_mem_wrport_adr] <= trigger_mem_wrport_dat_w;
memadr_2 <= trigger_mem_wrport_adr;
end
always @(posedge scope_clk) begin
memadr_3 <= trigger_mem_rdport_adr;
end
assign trigger_mem_wrport_dat_r = storage_3[memadr_2];
assign trigger_mem_rdport_dat_r = storage_3[memadr_3];
so maybe the assign trigger_mem_rdport_dat_r = storage_3[memadr_3];
made the synthesis tool think that this equals a Dual-Port RAM with Asynchronous Read?
I just found out that changing the read & write port mode to READ_FIRST
yield the expected behavior (migen/genlib/fifo.py:229):
rdport = storage.get_port(clock_domain="read", mode=READ_FIRST)
and
wrport = storage.get_port(write_capable=True, clock_domain="write", mode=READ_FIRST)
results in:
reg [21:0] storage_3[0:15];
reg [21:0] memdat_5;
reg [21:0] memdat_6;
always @(posedge sys_clk) begin
if (trigger_mem_wrport_we)
storage_3[trigger_mem_wrport_adr] <= trigger_mem_wrport_dat_w;
memdat_5 <= storage_3[trigger_mem_wrport_adr];
end
always @(posedge scope_clk) begin
memdat_6 <= storage_3[trigger_mem_rdport_adr];
end
assign trigger_mem_wrport_dat_r = memdat_5;
assign trigger_mem_rdport_dat_r = memdat_6;
But it would increase latency by one.
Update: I have created a smaller example: minispartan6.py.txt
It contains a small module:
class TestModule(Module, AutoCSR):
def __init__(self):
self.counter = counter = Signal(10)
self.sync.sys2x_ps += counter.eq(counter + 1)
self.counterCSR = CSRStatus(10)
mem = stream.AsyncFIFO([("data", 10)], 4)
mem = ClockDomainsRenamer({"write": "sys2x_ps", "read": "sys"})(mem)
self.submodules += mem
self.comb += [
mem.source.ready.eq(self.counterCSR.we),self.counterCSR.status.eq(mem.source.data),
mem.sink.valid.eq(1),
mem.sink.data.eq(counter)
]
Architecture: Fast Counter -> AsyncFIFO -> CSR And this already creates a timing problem with ISE.
Now I think that there might be a false path in the AsyncFIFO. I found out that the vivado backend would add an attribute to one of the involved FF (mr_ff) and the ISE backend doesn't do that. In fact the ISE backend doesn't create any false path constraints and also no async_reg
attributes.
I'm not sure if I'm now really on the right track here btw.
@cklarhorst: a false path should indeed be applied on the gray counters of the AsyncFIFO. The false paths are not applied automatically in LiteScope and you have to use Platform.add_false_path_constraint
in your design (or Platform.add_platform_command
).
Regarding the automatic false paths:
Vivado: I think they are automatically created because all MultiRegs get a special annotation here and afterwards here there is a special xdc line added that just marks all those paths as false paths.
Altera: This backend uses a more global method, it just marks all paths interacting between clocks as false paths here.
Others: I didn't found anything about those automatic false paths for the lattice, microsemi and ise backends.
So if I understood everything correctly I would say that I like that vivado approach the most because I would expect that it breaks early if someone misses a MultiReg.
Sadly, I think I'm not able to backport that solution for ISE because it uses get_nets
to find all nets but in ISE the UCF file doesn't allow filtering nets by attributes.
For testing, I just added false path constraints to the minispartan6 by hand:
platform.add_platform_command("TIMESPEC TS_A1=TO FFS(xilinxmultiregimpl0_regs0) TIG;")
platform.add_platform_command("TIMESPEC TS_A2=TO FFS(xilinxmultiregimpl1_regs0*) TIG;")
platform.add_platform_command("TIMESPEC TS_A3=TO FFS(xilinxmultiregimpl2_regs0*) TIG;")
and it was working.
@enjoy-digital Isn't it possible to somehow scan through the design (at the time when the ucf file is generated here) and find all MultiRegs? Then I would only need to figure out how they get named in the verilog file and I would be able to call add_platform_command and create a false path for each MultiReg.
Thanks for your help and sorry for the long post.
Hi again and sorry for the long delay.
I wrote a short function for the ISE backend that scans through the generated verilog code in order to call add_platform_command
(to create a false path constraint) for each found mr_ff
annotation.
Additionaly, I found that post which says
ISE and Diamond do not consider all clocks related by default. So, there is no need to automatically define false path constraints in these toolchains; although one might need to manually add false path constraints for clocks that go through PLLs, etc.
I know think that I stumbled upon that special case because I used LiteScope (and the AsyncFifo) class to analyze signals in different clocking regions (possible created through PLLs).
Function:
def gen_constraints_from_verilog(self, verilog, platform):
import re
mr_ff_pattern = re.compile('.*mr_ff.*reg (?:\[.*\] )?(\S+)')
unique_id = 0
for line in str(verilog).splitlines():
try:
reg_name = mr_ff_pattern.search(line).group(1)
platform.add_platform_command("TIMESPEC TS_FALSE%d=TO FFS(%s*) TIG;" % (unique_id, reg_name))
unique_id += 1
except AttributeError:
pass
@enjoy-digital Do you want to have a PR for this? (I think scanning through the generated source might not be the most elegant solution. Additionally, there doesn't seem to be many people affected by this?) Otherwise, I would like to close this issue.
There is an AsyncFifo in the storage class (core.py:176):
cdc = stream.AsyncFIFO([("data", data_width)], 4)
which I think is responsible to hand over the data from the scope domain into the sys clock domain.As expected the code includes a ClockDomainRenamer (core:177):
cdc = ClockDomainsRenamer({"write": "scope", "read": "sys"})(cdc)
but to me it looks like the output of the fifo is still wrongly clocked (in the final design).Why?
I think the cdc_asyncfifo_dout often appears in the critical path (for a very fast scope clk)
The synthesis log shows:
==> I would expect that Port B should have clkB = [sys clk]
The generated verilog code shows:
always @(posedge sys_clk) begin memadr_6 <= storage_cdc_rdport_adr; end
assign storage_cdc_wrport_dat_r = storage_8[memadr_5]; assign storage_cdc_rdport_dat_r = storage_8[memadr_6]; [cut] assign storage_cdc_asyncfifo_dout = storage_cdc_rdport_dat_r; [cut] storage_mem_data_status <= storage_cdc_source_payload_data;
self.comb += [ self.mem_valid.status.eq(cdc.source.valid), cdc.source.ready.eq(self.mem_data.we | ~self.enable.storage), self.mem_data.status.eq(cdc.source.data) ]
self.sync += [ self.mem_valid.status.eq(cdc.source.valid), cdc.source.ready.eq(self.mem_data.we | ~self.enable.storage), self.mem_data.status.eq(cdc.source.data) ]