enjoy-digital / litescope

Small footprint and configurable embedded FPGA logic analyzer
Other
160 stars 38 forks source link

Question about the Asyncfifo in the storage class #26

Closed cklarhorst closed 3 years ago

cklarhorst commented 3 years ago

There is an AsyncFifo in the storage class (core.py:176): cdc = stream.AsyncFIFO([("data", data_width)], 4) which I think is responsible to hand over the data from the scope domain into the sys clock domain.

As expected the code includes a ClockDomainRenamer (core:177): cdc = ClockDomainsRenamer({"write": "scope", "read": "sys"})(cdc) but to me it looks like the output of the fifo is still wrongly clocked (in the final design).

Why?

always @(posedge sys_clk) begin memadr_6 <= storage_cdc_rdport_adr; end

assign storage_cdc_wrport_dat_r = storage_8[memadr_5]; assign storage_cdc_rdport_dat_r = storage_8[memadr_6]; [cut] assign storage_cdc_asyncfifo_dout = storage_cdc_rdport_dat_r; [cut] storage_mem_data_status <= storage_cdc_source_payload_data;

==> I would expect to see a "<=" in the sys_clk block.

I then tried to change core.py:220 from 

self.comb += [ self.mem_valid.status.eq(cdc.source.valid), cdc.source.ready.eq(self.mem_data.we | ~self.enable.storage), self.mem_data.status.eq(cdc.source.data) ]

into

self.sync += [ self.mem_valid.status.eq(cdc.source.valid), cdc.source.ready.eq(self.mem_data.we | ~self.enable.storage), self.mem_data.status.eq(cdc.source.data) ]


in hope to generate such a sys_clk block around the "<=" but it didn't change anything.

So my questions:
1. Shouldn't self.sync always generate `always @(posedge sys_clk) begin` blocks around the statements?
2. How do I get the output of the fifo clocked from the sys domain?

I hope I provided enough and helpful information.
Thanks in advance for your helpful comments.
enjoy-digital commented 3 years ago

@cklarhorst: would you mind sharing a minimal design to have a look at it? Just to be sure to analyze things on a similar design.

cklarhorst commented 3 years ago

Sure, I will try to create a minimal example

cklarhorst commented 3 years ago

Modified minispartan6 target: minispartan6.py.txt

I changed:

Results:

trce -e 5 minispartan6.ncd:

Slack:                  -2.842ns (requirement - (data path - clock path skew + uncertainty))
  Source:               Mram_storage_313/DP (RAM)
  Destination:          trigger_mem_graycounter1_q_3 (FF)
  Requirement:          1.562ns
  Data Path Delay:      3.833ns (Levels of Logic = 4)(Component delays alone exceeds constraint)
  Clock Path Skew:      -0.339ns (1.489 - 1.828)
  Source Clock:         clkout_buf0 rising at 0.000ns
  Destination Clock:    clkout_buf2 rising at 1.562ns
  Clock Uncertainty:    0.232ns

  Clock Uncertainty:          0.232ns  ((TSJ^2 + DJ^2)^1/2) / 2 + PE
    Total System Jitter (TSJ):  0.070ns
    Discrete Jitter (DJ):       0.211ns
    Phase Error (PE):           0.120ns

  Maximum Data Path at Slow Process Corner: Mram_storage_313/DP to trigger_mem_graycounter1_q_3
    Location             Delay type         Delay(ns)  Physical Resource
                                                       Logical Resource(s)
    -------------------------------------------------  -------------------
    SLICE_X34Y59.B       Tshcko                0.885   trigger_mem_asyncfifo_dout<12>
                                                       Mram_storage_313/DP
    SLICE_X39Y61.B6      net (fanout=4)        0.548   trigger_mem_asyncfifo_dout<12>
    SLICE_X39Y61.B       Tilo                  0.259   xilinxmultiregimpl4_regs0<3>
                                                       trigger_mem_graycounter1_ce1131
    SLICE_X38Y60.A4      net (fanout=1)        0.436   trigger_mem_graycounter1_ce113
    SLICE_X38Y60.A       Tilo                  0.203   trigger_mem_graycounter1_ce111
                                                       trigger_mem_graycounter1_ce7
    SLICE_X38Y59.B4      net (fanout=1)        0.430   trigger_mem_graycounter1_ce7
    SLICE_X38Y59.B       Tilo                  0.203   trigger_mem_graycounter1_q<4>
                                                       trigger_mem_graycounter1_ce8
    SLICE_X37Y60.C4      net (fanout=9)        0.547   trigger_mem_graycounter1_ce8
    SLICE_X37Y60.CLK     Tas                   0.322   trigger_mem_graycounter1_q<3>
                                                       Mxor_trigger_mem_graycounter1_q_next_3_xo<0>1
                                                       trigger_mem_graycounter1_q_3
    -------------------------------------------------  ---------------------------
    Total                                      3.833ns (1.872ns logic, 1.961ns route)
                                                       (48.8% logic, 51.2% route)

So this time it is the other AsyncFifo (the one in the trigger). For me it seems that both AsyncFifos have that problem. I would expect that the dout of the fifo should already be clocked by clkout_buf2 in this case.

Synthesis log:

INFO:Xst:3231 - The small RAM <Mram_storage_3> will be implemented on LUTs in order to maximize performance and save block RAM resources. If you want to force its implementation on block, use option/constraint ram_style.
    -----------------------------------------------------------------------
    | ram_type           | Distributed                         |          |
    -----------------------------------------------------------------------
    | Port A                                                              |
    |     aspect ratio   | 16-word x 22-bit                    |          |
    |     clkA           | connected to signal <clkout_buf0>   | rise     |
    |     weA            | connected to signal <trigger_mem_graycounter0_ce> | high     |
    |     addrA          | connected to signal <trigger_mem_graycounter0_q_binary<3:0>> |          |
    |     diA            | connected to signal <("00",trigger_mem_value_storage,trigger_mem_mask_storage)> |          |
    -----------------------------------------------------------------------
    | Port B                                                              |
    |     aspect ratio   | 16-word x 22-bit                    |          |
    |     addrB          | connected to signal <memadr_3>      |          |
    |     doB            | connected to signal <trigger_mem_asyncfifo_dout> |          |

So no clk on Port B.

The relevant verilog part for the asyncfifo looks like:

reg [21:0] storage_3[0:15];
reg [3:0] memadr_2;
reg [3:0] memadr_3;
always @(posedge sys_clk) begin
    if (trigger_mem_wrport_we)
        storage_3[trigger_mem_wrport_adr] <= trigger_mem_wrport_dat_w;
    memadr_2 <= trigger_mem_wrport_adr;
end

always @(posedge scope_clk) begin
    memadr_3 <= trigger_mem_rdport_adr;
end

assign trigger_mem_wrport_dat_r = storage_3[memadr_2];
assign trigger_mem_rdport_dat_r = storage_3[memadr_3];

so maybe the assign trigger_mem_rdport_dat_r = storage_3[memadr_3]; made the synthesis tool think that this equals a Dual-Port RAM with Asynchronous Read?

cklarhorst commented 3 years ago

I just found out that changing the read & write port mode to READ_FIRST yield the expected behavior (migen/genlib/fifo.py:229): rdport = storage.get_port(clock_domain="read", mode=READ_FIRST) and wrport = storage.get_port(write_capable=True, clock_domain="write", mode=READ_FIRST)

results in:

reg [21:0] storage_3[0:15];
reg [21:0] memdat_5;
reg [21:0] memdat_6;
always @(posedge sys_clk) begin
    if (trigger_mem_wrport_we)
        storage_3[trigger_mem_wrport_adr] <= trigger_mem_wrport_dat_w;
    memdat_5 <= storage_3[trigger_mem_wrport_adr];
end

always @(posedge scope_clk) begin
    memdat_6 <= storage_3[trigger_mem_rdport_adr];
end

assign trigger_mem_wrport_dat_r = memdat_5;
assign trigger_mem_rdport_dat_r = memdat_6;

But it would increase latency by one.

cklarhorst commented 3 years ago

Update: I have created a smaller example: minispartan6.py.txt

It contains a small module:

class TestModule(Module, AutoCSR):
    def __init__(self):
        self.counter = counter = Signal(10)
        self.sync.sys2x_ps += counter.eq(counter + 1)

        self.counterCSR = CSRStatus(10)

        mem = stream.AsyncFIFO([("data", 10)], 4)
        mem = ClockDomainsRenamer({"write": "sys2x_ps", "read": "sys"})(mem)
        self.submodules += mem
        self.comb += [
            mem.source.ready.eq(self.counterCSR.we),self.counterCSR.status.eq(mem.source.data),
            mem.sink.valid.eq(1),
            mem.sink.data.eq(counter)
        ]

Architecture: Fast Counter -> AsyncFIFO -> CSR And this already creates a timing problem with ISE.

Now I think that there might be a false path in the AsyncFIFO. I found out that the vivado backend would add an attribute to one of the involved FF (mr_ff) and the ISE backend doesn't do that. In fact the ISE backend doesn't create any false path constraints and also no async_reg attributes.

I'm not sure if I'm now really on the right track here btw.

enjoy-digital commented 3 years ago

@cklarhorst: a false path should indeed be applied on the gray counters of the AsyncFIFO. The false paths are not applied automatically in LiteScope and you have to use Platform.add_false_path_constraint in your design (or Platform.add_platform_command).

cklarhorst commented 3 years ago

Regarding the automatic false paths:

Vivado: I think they are automatically created because all MultiRegs get a special annotation here and afterwards here there is a special xdc line added that just marks all those paths as false paths.

Altera: This backend uses a more global method, it just marks all paths interacting between clocks as false paths here.

Others: I didn't found anything about those automatic false paths for the lattice, microsemi and ise backends.

So if I understood everything correctly I would say that I like that vivado approach the most because I would expect that it breaks early if someone misses a MultiReg. Sadly, I think I'm not able to backport that solution for ISE because it uses get_nets to find all nets but in ISE the UCF file doesn't allow filtering nets by attributes.

For testing, I just added false path constraints to the minispartan6 by hand:

platform.add_platform_command("TIMESPEC TS_A1=TO FFS(xilinxmultiregimpl0_regs0) TIG;")
platform.add_platform_command("TIMESPEC TS_A2=TO FFS(xilinxmultiregimpl1_regs0*) TIG;")
platform.add_platform_command("TIMESPEC TS_A3=TO FFS(xilinxmultiregimpl2_regs0*) TIG;")

and it was working.

@enjoy-digital Isn't it possible to somehow scan through the design (at the time when the ucf file is generated here) and find all MultiRegs? Then I would only need to figure out how they get named in the verilog file and I would be able to call add_platform_command and create a false path for each MultiReg.

Thanks for your help and sorry for the long post.

cklarhorst commented 3 years ago

Hi again and sorry for the long delay.

I wrote a short function for the ISE backend that scans through the generated verilog code in order to call add_platform_command (to create a false path constraint) for each found mr_ff annotation. Additionaly, I found that post which says

ISE and Diamond do not consider all clocks related by default. So, there is no need to automatically define false path constraints in these toolchains; although one might need to manually add false path constraints for clocks that go through PLLs, etc.

I know think that I stumbled upon that special case because I used LiteScope (and the AsyncFifo) class to analyze signals in different clocking regions (possible created through PLLs).

Function:

def gen_constraints_from_verilog(self, verilog, platform):
  import re
  mr_ff_pattern = re.compile('.*mr_ff.*reg (?:\[.*\] )?(\S+)')
  unique_id = 0
  for line in str(verilog).splitlines():
    try:
      reg_name = mr_ff_pattern.search(line).group(1)
      platform.add_platform_command("TIMESPEC TS_FALSE%d=TO FFS(%s*) TIG;" % (unique_id, reg_name))
      unique_id += 1
    except AttributeError:
      pass

@enjoy-digital Do you want to have a PR for this? (I think scanning through the generated source might not be the most elegant solution. Additionally, there doesn't seem to be many people affected by this?) Otherwise, I would like to close this issue.