amaranth-lang / amaranth

A modern hardware definition language and toolchain based on Python
https://amaranth-lang.org/docs/amaranth/
BSD 2-Clause "Simplified" License

Creating Memory Arrays without creating another submodule #1523

Closed epkRichi closed 1 day ago

epkRichi commented 1 day ago

In Verilog, you can create an array like this: reg[31:0] my_array[1023:0]. You can then directly read from or write to the array. The array can also be synthesized as BRAM if you don't access it incorrectly.

In Amaranth, the only way I've found to create an array that can be synthesized as BRAM is by using lib.memory.Memory. But my problem with that is that this Memory has to be instantiated as a submodule and you have to communicate with it through read and write ports. If you want the memory to be synthesized as BRAM, the read port has to be synchronous. So if you want to access some data from the memory, you've got to initiate a read transfer by setting the signals on the read port accordingly, but the read data will only be available two cycles later.

So reading from lib.memory.Memory takes two additional cycles compared to what I would do in Verilog, plus you have to enable and disable the read port.

I've also tried using data.ArrayLayout and hdl.Array, but they serve other purposes: in the generated Verilog they become either one very large register or one reg per element, which can't be synthesized to BRAM as far as I know.

Is this a legitimate problem or am I just trying to do something you're not supposed to do? (I don't have a ton of experience designing hardware, so that might very well be the case)

whitequark commented 1 day ago

> The array can also be synthesized as BRAM if you don't access it incorrectly.

Correct. This is an important part of why Amaranth memories are designed the way they are: the access pattern shouldn't cause "spooky action at a distance", influencing the synthesized logic for a memory array.

Or if you think of it from a different perspective: Verilog arrays are meant to be just arrays, same as in any other programming language, while Amaranth arrays are meant to be specifically memory arrays, whose interface and representation are optimized for memory-like access patterns.

> If you want the memory to be synthesized as BRAM, the read port has to be synchronous.

This is also true for Verilog. The only difference is that the access pattern is implicit in Verilog (a part of the process that retrieves values from a memory), rather than explicit (a parameter to .read_port()).

> So if you want to access some data from the memory, you've got to initiate a read transfer by setting the signals on the read port accordingly, but the read data will only be available two cycles later.

Yes, but it should be one cycle later. I'm not sure how you end up with two; can you provide an example, maybe in the Amaranth Playground?

This will also show you a Verilog equivalent for the code you're writing, which might make it a bit easier to correlate the two.

> So reading from lib.memory.Memory takes two additional cycles compared to what I would do in Verilog,

You will never see a Verilog array with asynchronous read access be (correctly) synthesized as BRAM, because BRAMs require reads to be clocked on every FPGA family I'm aware of. This is a property of the underlying primitive, not a limitation of the language. (It seems that you're aware of it, but I will also note that Amaranth supports asynchronous read ports, which synthesize to LUTRAM or FFRAM).

> plus you have to enable and disable the read port.

The initial value for the en signal of the read port is 1, so you can leave it out. (There are some reasons to access BRAMs in a read-xor-write pattern, but it seemed fine to ignore them for the default.)
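The read-port behaviour described in this reply can be sketched with a small cycle-level model. This is plain Python, not Amaranth code: the class `SyncReadPort`, its `tick()` method, and the helper `async_read` are hypothetical names made up for illustration, and the "disabled port holds its last output" behaviour is an assumption of the model rather than something stated in the thread.

```python
class SyncReadPort:
    """Behavioral model of a clocked (BRAM-style) read port. Not Amaranth's API."""

    def __init__(self, contents):
        self.contents = list(contents)
        self.addr = 0     # input: address to read
        self.en = True    # input: read enable, enabled by default
        self.data = 0     # output: registered read data

    def tick(self):
        """One clock edge: sample addr and update data, but only if enabled."""
        if self.en:
            self.data = self.contents[self.addr]


def async_read(contents, addr):
    """Combinational read: the result is valid in the same cycle (LUTRAM/FF style)."""
    return contents[addr]


contents = [10, 11, 12, 13, 14, 55, 16, 17]
port = SyncReadPort(contents)

port.addr = 5              # cycle 0: present the address
before_edge = port.data    # output still shows the old value
port.tick()                # clock edge between cycle 0 and cycle 1
after_edge = port.data     # cycle 1: data for address 5 is available

port.en = False            # disable the port and change the address
port.addr = 2
port.tick()
held = port.data           # assumed behaviour: output is held while disabled

print(before_edge, after_edge, held, async_read(contents, 5))
```

In this model the synchronous port delivers its data exactly one clock edge after the address is presented, and leaving `en` at its default while only changing `addr` is enough to read a new value.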

epkRichi commented 1 day ago

Thanks for the quick reply. I've created a small example in the playground here

The example contains a small state machine where in state 0, I decide that I want to do some computation or maybe even branch based on the value of the memory at some address (which is always 5 in this example). So in state 0, I send a request via the read port. Since the read port is synchronous, the memory will only "start working" on that request in the next cycle, meaning that I can only access the read data two cycles later - in state 2. But what I'd really like to do is to already use the read data in state 0. In Verilog, I could do that:

```verilog
reg [31:0] mymemory [7:0];
reg [1:0] state;
reg [31:0] computation;

always @(posedge clk) begin
    if (state == 0) begin
        computation <= mymemory[5] * 3 + 2;
        if (mymemory[5] == ...) begin ...
    end
end
```

(this is only an incomplete snippet)

> The initial value for the en signal of the read port is 1, so you can leave it out. (There are some reasons to access BRAMs in a read-xor-write pattern, but it seemed fine to ignore them for the default.)

So it is fine to always leave the read port enabled and only change the address when I want to access a new value, even for the synthesized design?

By the way, the playground example I shared throws an error if I record the waveform (line 40). I'm not sure whether that is a problem with my code or with the playground.

whitequark commented 1 day ago

> So it is fine to always leave the read port enabled and only change the address when I want to access a new value, even for the synthesized design?

Yes.

> Since the read port is synchronous, the memory will only "start working" on that request in the next cycle, meaning that I can only access the read data two cycles later - in state 2.

This is an artifact of how your FSM is written. Generally, in order to "command" something to process data from an FSM, you would use m.d.comb += command.eq(...), and in order to capture the outputs and store them in the module whose behavior is defined by the FSM, you would use m.d.sync += data.eq(...). Playground example.
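The difference between the two FSM styles can be sketched with a plain-Python cycle model (not Amaranth code). The function `first_state_with_data`, the `MEM` contents, and the flag `command_is_registered` are hypothetical names for illustration. The "registered" style corresponds to driving the read address with a clocked assignment in state 0, and the "combinational" style corresponds to driving it with `m.d.comb`, as suggested above.

```python
MEM = [0, 0, 0, 0, 0, 42, 0, 0]  # the value of interest lives at address 5


def first_state_with_data(command_is_registered, cycles=4):
    """Return the first FSM state in which the data for address 5 is visible."""
    state = 0      # FSM state register
    addr_reg = 0   # address register, used only by the registered style
    rd_data = 0    # registered output of the synchronous read port

    for _ in range(cycles):
        # Combinational phase: decide which address the read port sees this cycle.
        if command_is_registered:
            rd_addr = addr_reg                   # port sees last cycle's command
        else:
            rd_addr = 5 if state == 0 else 0     # port sees the command immediately

        if rd_data == MEM[5]:
            return state

        # Clock edge: every register updates from values computed before the edge.
        rd_data = MEM[rd_addr]
        if command_is_registered and state == 0:
            addr_reg = 5
        state += 1
    return None


print(first_state_with_data(command_is_registered=True))
print(first_state_with_data(command_is_registered=False))
```

In this model, registering the command adds one cycle on top of the read port's own one-cycle latency, so the data shows up in state 2. Driving the command combinationally removes that extra cycle, and the data is visible in state 1.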

whitequark commented 1 day ago

> But what I'd really like to do is to already use the read data in state 0. In Verilog, I could do that: [snip]

This would usually create an asynchronous port, and wouldn't synthesize to a BRAM.

epkRichi commented 1 day ago

Thanks for the updated example. I never even thought of using an FSM with combinational logic like that. Seems like I can keep on using Amaranth then :)

> This would usually create an asynchronous port, and wouldn't synthesize to a BRAM.

So in Verilog I would also need to wait one cycle before using the read data if I want the array to be synthesized as BRAM, meaning that using a memory as a submodule (like Amaranth's lib.memory.Memory) actually doesn't slow anything down - right?

whitequark commented 1 day ago

> meaning that using a memory as a submodule (like Amaranth's lib.memory.Memory) actually doesn't slow anything down - right?

Yes.

epkRichi commented 1 day ago

Alright, thanks a lot for your help, you're awesome!

whitequark commented 1 day ago

Happy to help!