YosysHQ / yosys

Yosys Open SYnthesis Suite
https://yosyshq.net/yosys/
ISC License

ROM memories initialized with $readmemh not using $memrd cells #2020

Status: Closed (tilk closed this issue 4 years ago)

tilk commented 4 years ago

I have a question about memory handling. The following SystemVerilog is interpreted by the frontend as a memory - the $memrd, $memwr and $meminit cells are generated:

module ram(input clk, wr, input [7:0] idata, input [3:0] addr, output [7:0] data);
    logic [7:0] mem[0:15];
    assign data = mem[addr];
    initial $readmemh("image.hex", mem);
    always_ff @(posedge clk)
        if (wr) mem[addr] <= idata;
endmodule

But if I remove the write port, this gets converted to separate values:

module rom(input [3:0] addr, output [7:0] data);
    logic [7:0] mem[0:15];
    assign data = mem[addr];
    initial $readmemh("image.hex", mem);
endmodule

I can prevent this from happening by using -nomem2reg, but it is said that this is dangerous. Why is it happening? Is there something wrong with the second code, which prevents the use of memories?

For some reason, if I initialize the memory with a for loop, memory cells are generated:

module test(input [3:0] addr, output [7:0] data);
    logic [7:0] mem[0:15];
    assign data = mem[addr];
    integer i;
    initial for(i = 0; i < 16; i = i + 1) mem[i] = i;
endmodule

I would very much like the second code to be interpreted as a memory in DigitalJS. When it is not, I get poor performance in DigitalJS and can't use the memory inspection GUI.

dh73 commented 4 years ago

Hello,

Small ROM memories are generally implemented in FPGAs with combinational elements such as look-up tables, due to quality of results (QoR) goals. Synchronous (clocked) ROMs are implemented in the same way, but with flip-flops connected to either the address or the data-out ports. These guidelines follow from FPGA architectures and design goals. Users can still guide the tools with certain attributes when another option is really needed.

Why is your ROM implemented as a chain of case statements by the Verilog frontend, and not as $memrd cells? It may be because:

  1. Cell definitions: The $memrd cells are intended to be transformed into dff cells, or into a read port of a technology-specific BRAM (if the flop array meets the synthesis requirements). These cells are clocked (i.e., sequential constructs). Your ROM is a purely combinational model.
  2. Synthesis optimisations: The implementation of your ROM depends on the dimensions of the ROM array and on the contents of the initialisation file. This may also answer your other question.
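
To make the "chain of case statements" concrete, here is a minimal hand-written sketch of what a small read-only array roughly behaves like once it has been split into separate values. The module name rom_case and the data values are hypothetical placeholders; they are not the contents of image.hex and this is not the exact structure Yosys emits.

```verilog
// Hypothetical illustration only: a small ROM expressed as a case lookup.
// The values below are made up and do not come from image.hex.
module rom_case(input [3:0] addr, output reg [7:0] data);
    always @* begin
        case (addr)
            4'h0: data = 8'h00;
            4'h1: data = 8'h11;
            4'h2: data = 8'h22;
            4'h3: data = 8'h33;
            // Remaining addresses are elided for brevity and read as zero.
            default: data = 8'h00;
        endcase
    end
endmodule
```

A lookup like this maps naturally onto LUTs, but it no longer appears as a memory to downstream tools, which is the effect described in the original question.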

> I can prevent this from happening by using -nomem2reg, but it is said that this is dangerous.

Consider an initialisation image of zeros. Since there is no write port in the model, and the contents of the ROM are zero for all read addresses, it makes more sense either to optimise that hardware out of the design (constant propagation will end up removing the circuit if it is a top-level module) or to connect the consumers of that circuit to a GND cell. In both scenarios, the result is equivalent to the design intention and is optimised (forcing a number of LUTs/BRAMs to implement that design would consume unnecessary power and area).

The same optimisation applies if the initialisation data does not use certain bits of the whole address dimension. Those unused ports are optimised away, and that might be reflected in a reduced number of cells implementing the design. nomem2reg adds an attribute to the array that prevents these very useful optimisations.

> For some reason, if I initialize the memory with a for loop, memory cells are generated:

Unfortunately, this circuit does not seem to infer what you think. Have you tried synthesizing it instead of just reading it?

> I would very like the second code to be interpreted as memory in DigitalJS. When it's not, I get poor performance in DigitalJS and can't use the memory inspection GUI.

I personally don't know DigitalJS, but this sounds like a problem with how that tool works, not something related to Yosys.

I am sorry if the answer is very long, and if I explained things you already know. This is just for the sake of clarity.

tilk commented 4 years ago

Thank you for your answer. I have two questions now:

  1. Can I instruct my students to use the (* nomem2reg *) attribute to get memory cells for the rom module, saying it's just for the needs of simulation?
  2. I do not understand your comment regarding the code using the initial block with a for loop for memory initialization. Yes, I did try to synthesize it, and yes, Yosys synthesizes it to memory cells. I do not understand why this example is treated differently by Yosys than the one with $readmemh.

dh73 commented 4 years ago

Gotcha, I understand now.

DigitalJS is a circuit simulator, therefore you get a memory using initial for(i = 0; i < 16; i = i + 1) mem[i] = i;.

This construct is not synthesizable. I was speaking of synthesis tasks all the time, since that is the primary use of Yosys (and that is the goal of Yosys: to process and generate synthesizable models).

What you see in this issue is one of those famous synthesis/simulation deltas. You expect the hardware you implement in synthesis to behave the way it does in simulation, but the results do not match. Here, the design is transformed due to a synthesis optimisation opportunity. A simulator will not show that behavior. I think this is a good lesson for future digital design engineers. The DigitalJS tool does not show you this (empty module):

=== test ===

   Number of wires:                 19
   Number of wire bits:            172
   Number of public wires:          19
   Number of public wire bits:     172
   Number of memories:               0
   Number of memory bits:            0
   Number of processes:              0
   Number of cells:                  0   <--

In other words, the design with $readmemh gets optimised at a very early stage of the conversion from Verilog to the internal representation, whereas the design using the for loop does not. That's why you see different results. I would stick with the design with the for loop. If you are more comfortable with $readmem, then it is fine to use nomem2reg, as long as you keep in mind why these things are happening.
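
For reference, a minimal sketch of the rom module from the question with the (* nomem2reg *) attribute placed on the array. The module and file names are taken from the question. Whether this yields the desired memory cells depends on the Yosys version, and the Yosys documentation describes nomem2reg as potentially dangerous, so treat it as an experiment rather than a recommended pattern.

```verilog
// Sketch only: the rom module from the question, with the nomem2reg
// attribute applied to the array so the frontend keeps it as a memory.
// Assumes the file is read in SystemVerilog mode (read_verilog -sv).
module rom(input [3:0] addr, output [7:0] data);
    (* nomem2reg *) logic [7:0] mem[0:15];
    assign data = mem[addr];
    initial $readmemh("image.hex", mem);
endmodule
```

Placing the attribute on the single array limits its effect to that declaration, unlike the -nomem2reg option to read_verilog mentioned in the question, which applies to everything read by that command.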

For instance, this is not synthesizable, yet DigitalJS shows a model of it:

module foo (output logic [2:0] t);
   integer     idx; 
   initial begin
      t = 0;
      for (idx = 0; idx < 3'h7; idx++)
        t = t + idx;
   end
endmodule

You will probably find more cases where Yosys performs synthesis transformations that are not very simulation friendly. Limitations will be present when these optimisations cannot be disabled.

whitequark commented 4 years ago

@dh73 I understand where you're coming from with this answer, but I respectfully disagree with your position here. Let me explain why.

If I understand correctly, DigitalJS is a simulator of, specifically, synthesizable logic. It even says this on the main button:

[Image: Screenshot_20200505_034318, the DigitalJS main button]

Therefore, although it is consuming Verilog code, and it is simulating that code, conceptually it is not like a Verilog event-driven simulator (using the simulation semantics), but a Verilog synthesizer that happens to feed the result into a netlist simulator (using the synthesis semantics) instead of a place-and-route tool.

> DigitalJS is a circuit simulator, therefore you get a memory using initial for(i = 0; i < 16; i = i + 1) mem[i] = i;. This construct is not synthesizable. I was speaking of synthesis tasks all the time, since that is the primary use of Yosys (and that is Yosys goal, process/generate synthesizable models).

This is not correct. Although IEEE 1364.1 does not mention initial for loops as an example, it does permit for loops with statically computable bounds in §7.7.6:

[Image: Screenshot_20200505_034832, excerpt from IEEE 1364.1 §7.7.6]

Both vendor tools and Yosys explicitly accept this construct for memory initialization. In fact I have recently improved the handling of this construct in #1607.

Moreover, you can see that @tilk encounters the same undesirable behavior when using $readmemh, which is definitely permitted for synthesis. If you use read_verilog -debug you can see that the problematic behavior is exactly the same here.

> Here, the design is transformed due a synthesis optimisation opportunity.

Based on a preliminary investigation I suspect this is an artifact of how ast/simplify detects memory ports and not an optimization. I can actually imagine a few cases where this makes synthesis results worse by preventing conversion to LUTROM.

Because of all of the above, I believe this is a legitimate issue with Yosys that we should look into.

whitequark commented 4 years ago

@tilk @dh73 This is actually an issue with how Yosys handles the SystemVerilog logic type. If you replace logic with reg in the last example:

module test(input [3:0] addr, output [7:0] data);
    reg [7:0] mem[0:15];
    assign data = mem[addr];
    integer i;
    initial for(i = 0; i < 16; i = i + 1) mem[i] = i;
endmodule

then the expected $memrd cells are produced.
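
As a hedged sketch, the same substitution can be tried on the $readmemh variant from the original question. The thread only confirms the workaround for the for-loop example, so whether it carries over is an assumption to verify on your own Yosys version. The file name rom.sv in the comment is hypothetical.

```verilog
// Sketch only: the rom module from the question with "logic" replaced
// by "reg". That this also restores memory inference for $readmemh is an
// assumption; the thread confirms it only for the for-loop example.
//
// One way to check (rom.sv is a hypothetical file name):
//   yosys -p "read_verilog -sv rom.sv; proc; stat"
// and look for memory read cells in the stat output. Cell names may
// differ between Yosys versions.
module rom(input [3:0] addr, output [7:0] data);
    reg [7:0] mem[0:15];
    assign data = mem[addr];
    initial $readmemh("image.hex", mem);
endmodule
```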

whitequark commented 4 years ago

Fixed in #2029.

dh73 commented 4 years ago

Thank you for the detailed explanation!