YosysHQ / prjtrellis

Documenting the Lattice ECP5 bit-stream format.

Incorrect LUT allocation on ECP5 #189

Closed: podhrmic closed this issue 2 years ago

podhrmic commented 2 years ago

Hello!

I have a simple SoC design (a nerv CPU with some memory and IO), and I am targeting the ECP5 development board. The problem is that the reported device utilization is over 100%, even though the design should comfortably fit on the FPGA.

The SoC uses 36kB of RAM, which should comfortably fit on the LFE5UM-85 with its 468kB of RAM (see the screenshot below):

[Screenshot from 2022-04-20 11-48-43]

However, after synthesis (yosys synth_ecp5...) and place and route (nextpnr-ecp5 --um5g-85k --package CABGA381 ...), I end up with the following:

Info: Logic utilisation before packing:
Info:     Total LUT4s:     100595/83640   120%
Info:         logic LUTs:  62865/83640    75%
Info:         carry LUTs:    626/83640     0%
Info:           RAM LUTs:  24736/41820    59%
Info:          RAMW LUTs:  12368/20910    59%

and

Info: Device utilisation:
Info:          TRELLIS_SLICE: 51637/41820   123%
Info:             TRELLIS_IO:    91/  365    24%
Info:                   DCCA:     1/   56     1%
Info:                 DP16KD:    57/  208    27%
Info:             MULT18X18D:     0/  156     0%

Naively, I would expect more DP16KD elements to be used, given that they are dual-port RAM blocks.

Is there some setting I am missing? Is this potentially a bug in prjtrellis? I haven't tried synthesis with the Lattice Diamond tool yet, so I cannot compare.

gatecat commented 2 years ago

I can't provide any more help without the design. It is likely that some of the RAM isn't being mapped correctly to EBR, but that could be either a design issue or a prjtrellis issue.

podhrmic commented 2 years ago

@gatecat Understood. I am using RegFileLoad.v, for the instruction memory as:

RegFileLoad #(.file("dmem_contents.memhex32"),
        .addr_width(32'd30),
        .data_width(32'd32),
        .lo(30'd0),
        .hi(30'h00003000),
...

and for the data memory as:

RegFileLoad #(.file("imem_contents.memhex32"),
        .addr_width(32'd30),
        .data_width(32'd32),
        .lo(30'd0),
        .hi(30'h00007000),
        .binary(1'd0)) 
...

Is this information helpful?

gatecat commented 2 years ago

Those are very big, multi-port memories with async reads, so in general they won't map to EBR. You should use a synchronous (clocked) read instead.
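
For readers who want to see what a synchronous read looks like, here is a minimal sketch of a memory with a registered read port, written in a style that synthesis tools commonly infer as block RAM. The module name, port names, and parameter defaults are hypothetical and are not the RegFileLoad interface. Whether a given tool version maps it to DP16KD should be checked in the utilisation report.

```verilog
// Minimal sketch, assuming one write port and one read port.
// All names and defaults here are illustrative, not taken from RegFileLoad.v.
module sync_read_ram #(
    parameter ADDR_WIDTH = 12,   // assumed depth: 2**ADDR_WIDTH words
    parameter DATA_WIDTH = 32,
    parameter INIT_FILE  = ""    // optional hex image loaded with $readmemh
) (
    input  wire                  clk,
    input  wire                  we,     // write enable
    input  wire [ADDR_WIDTH-1:0] waddr,
    input  wire [DATA_WIDTH-1:0] wdata,
    input  wire [ADDR_WIDTH-1:0] raddr,
    output reg  [DATA_WIDTH-1:0] rdata   // registered (clocked) read data
);
    reg [DATA_WIDTH-1:0] mem [0:(1 << ADDR_WIDTH) - 1];

    // Load initial contents only when a file name is supplied.
    initial begin
        if (INIT_FILE != "")
            $readmemh(INIT_FILE, mem);
    end

    always @(posedge clk) begin
        if (we)
            mem[waddr] <= wdata;
        // The read result is captured on the clock edge, so it appears
        // one cycle after raddr is presented (no combinational read path).
        rdata <= mem[raddr];
    end
endmodule
```

Note that the read data now arrives one clock cycle after the address, so the logic consuming it has to tolerate that extra cycle of latency.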

podhrmic commented 2 years ago

@gatecat Adding an output register, as suggested in "Figure 7: RAM with Registered Output in Verilog" of the Lattice Synthesis Engine for Diamond User Guide, did the trick, even without adding explicit Verilog attributes. Thank you!