Open rakeshm75 opened 1 year ago
@nakengelhardt let's use this issue for further discussion on memory_libmap support
I had a look at the examples in https://github.com/QuickLogic-Corp/yosys-f4pga-plugins/tree/main/ql-qlf-plugin/tests/qlf_k6n10f, especially bram_tdp and bram_tdp_split, and I am confused by your definition of TDP mode. It seems that the input in bram_tdp.v describes a memory with two independent read ports and two independent write ports, so four ports total. But the output of the current synth_quicklogic
implementation is mapped to a memory with two read-write ports, and it looks like that will write to the read address if rce and wce are enabled simultaneously. This seems incompatible with the input code to me, am I misunderstanding something?
The basic TDP36K BRAM IP has only one addr port (no separate read address and write address): .CLK_A1_i .ADDR_A1_i .WEN_A1_i .BE_A1_i .REN_A1_i .WDATA_A1_i .RDATA_A1_o
In the mapping the read address & write address are multiplexed to port address based on read or write: (in brams_map.v file) assign PORT_A_ADDR = A1EN ? A1ADDR_TOTAL : (B1EN ? B1ADDR_TOTAL : 15'd0); assign PORT_B_ADDR = C1EN ? C1ADDR_TOTAL : (D1EN ? D1ADDR_TOTAL : 15'd0);
Yes, you are correct it would write to the read address when rce and wce are enabled simultaneously of the same port (A or B).
In that case, we won't be able to reuse the existing ql_bram_split
pass, I'll need to write a replacement. Can you confirm the exact limitations on the allowed combinations of setting the 8 mode parameters:
RMODE_A1_i
WMODE_A1_i
RMODE_A2_i
WMODE_A2_i
RMODE_B1_i
WMODE_B1_i
RMODE_B2_i
WMODE_B2_i
Which ones need to always be equal and which are allowed to be set to different values?
With 36K, RMODE_A1_i = WMODE_A1_i = RMODE_A2_i = WMODE_A2_i RMODE_B1_i = WMODE_B1_i = RMODE_B2_i = WMODE_B2_i With18K , RMODE_A1_i = WMODE_A1_i RMODE_B1_i = WMODE_B1_i RMODE_A2_i = WMODE_A2_i RMODE_B2_i = WMODE_B2_i
Thanks! In that case, should I also allow asymmetric ports in split mode, or will that be a problem for the downstream flow?
How should I deal with the attributes "rd_data_width" and "wr_data_width"? The names don't really make sense since the widths are per-port and each port can do read and write simultaneously. Are the timing differences only influenced by the port modes? Should it be something like port_a_width/port_b_width for 36k mode and port_a1_width/port_a2_width/port_b1_width/port_b2_width for split mode?
Yes, we should allow asymmetric ports in split mode as well (18K bram).
For SPRAM, one of the port was tied as write port and other port as read port, that the reason the attributes were named "rd_data_width" & "wr_data_width". But as you said each port can do a read and write, it would be appropriate have the attributes named based on ports, as you mentioned above port_a_width/port_b_width for 36k mode and port_a1_width/ port_a2_width/port_b1_width/port_b2_width for split mode.
Only for the FIFO's there is a restriction that the write port (Push) will always be tied to Port A and the read port (pop) will always be tied to Port B, as the FIFO flags are tied to the Port A read data port. Anyways, inferencing is not supported for FIFO's only FIFO instantiation is supported.
Timing difference is influenced by port widths, split/nonsplit and fifo modes.
One question, earlier you mentioned that we won't be able to reuse the existing ql_bram_split pass and you would write a replacement, so if a designer writes a dual port ram like the one in bram_tdp and bram_tdp_split (with separate read and write address each port) would it be inferred ?
No, I think those descriptions would require a primitive with at least 3 ports (2 write + 1 read) to map the semantics. An additional read port could be created with duplication, but there's no way to create more write ports than the primitive has, and if both are used to write then there is no port left over to read...
Ok, understood.
I've gone through the existing tests and rewritten the modules that infer memory with the patterns that memory_libmap expects. However the manually instantiated modules still use the old code path (bram_map.v + bram_final_map.v) which now has some issues as a) they are not considered as candidates for merging two 18k blocks in split mode, and b) they still use the rd_data_width/wr_data_width attributes vs the port_a_width/port_b_width. How would you like to handle this? Are all of the modules like BRAM2x18_TDP considered user-facing primitives that should continue to be supported?
(FYI I've also had some simulation issues with the included sim tests where sdffsre cells would always output X, so I've had to rewrite the bram_tdp examples in a way that changes the behavior of the module to not infer FF cells, that's why it has those intermediate wires for ce now.)
With regards to the manually instantiated modules, I will make the changes to use the port_a_width/port_b_width instead of rd_data_width/wr_data_width attributes. Also, I would look into using the libmap_brams_map.v or integrate manually instantiated modules into this file.
The user facing primitives would be the following:
I have been running tests with latest PR (#2), few observations:
I see that the Address and the Write data inputs to the RAMs are registered. Why is it registered?
I have attached the design here with the synthesis output files.
If you run memory_libmap with debug it will tell you: - emulate transparency with write port 0
The pattern in this memory is for read-first behavior but the TDP36K block has undefined collision behavior, so it will insert some extra logic to handle this. The pattern for undefined behavior that can map directly would be:
always @(posedge clk)
begin
if (wce)
memory[wa] <= wd;
if (rce)
rq <= memory[ra];
if (wce && wa == ra)
rq <= 'x;
end
Alternatively you can add the no_rw_check attribute to suppress collision handling:
(* no_rw_check = 1 *) reg [DWIDTH-1:0] memory[0:(2**AWIDTH)-1];
Were you able to test the PR sufficiently? Is there something else you need me to do on it, or should I get started on reviewing the other parts of synth_quicklogic
with a view to getting it merged upstream?
Yes, we have tested the PR, you can proceed on reviewing the other parts of synth_quicklogic.
@nakengelhardt When I run synthesis on the PICOSOC design, I get the following error:
7.36. Executing MEMORY_LIBMAP pass (mapping memories to cells). mapping memory top.soc.cpu.cpuregs via $__QLF_TDP36K Memory top.soc.memory.mem mapping candidates (post-geometry): Memory top.soc.memory.mem mapping candidates (after post-geometry prune): ERROR: no valid mapping found for memory top.soc.memory.mem
I have attached the design and the synthesis report.
Can you please help us understand this issue?
Earlier with MEMORY_BRAM pass, the ram inferring was happening without any issue on this design.
That memory is explicitly designated with (* ram_style = "distributed" *)
in the code at picosoc_noflash.v:225
. Since k6n10f doesn't have any distributed memory (LUTRAM), this ram style is impossible to implement. If you remove the attribute or change it to a supported ram style, it will happily infer it.
I've updated memory_libmap
to print a message about forced/disabled RAM styles when such an attribute is encountered, this should make it at least a little easier for the user to understand what is happening: https://github.com/YosysHQ/yosys/pull/3826
Thank you!
@nakengelhardt What information is required to support RAM initialization? What format should the initialization data be provided by the user?
On the user side it would be assignments in initial
blocks, either directly with a for loop or from a file with $readmemh
functions.
For implementing the support, the information required is how the ram primitive expects the data to be provided. In $mem_v2
the initialization data is saved as one constant the size of the entire array, and this has to be translated to the format that the primitive wants during the techmap step.
The RAM primitive expects the data in the following format:
Split=1 (18K RAM)
x1: RAM_Data[17:0] = {2'b00,data15,....data2,data1,data0}; x2: RAM_Data[17:0] = {2'b00,data7[1:0],....data1[1:0],data0[1:0]}; x4: RAM_Data[17:0] = {2'b00,data3[3:0],....data1[3:0],data0[3:0]}; x8: RAM_Data[17:0] = {2'b00,data1[7:0],data0[7:0]}; x9: RAM_Data[17:0] = {data1[8],data0[8],data1[7:0],data0[7:0]}; x16: RAM_Data[17:0] = {2'b00,data0[15:0]}; x18: RAM_Data[17:0] = {data0[17],data0[16],data0[15:0]};
NonSplit=1 (36K RAM)
x1: RAM_Data1[17:0] = {2'b00,data15,....data2,data1,data0}; RAM_Data2[17:0] = {2'b00,data31,....data18,data17,data16}; x2: RAM_Data1[17:0] = {2'b00,data7[1:0],....data1[1:0],data0[1:0]}; RAM_Data2[17:0] = {2'b00,data15[1:0],....data9[1:0],data8[1:0]}; x4: RAM_Data1[17:0] = {2'b00,data3[3:0],data2[3:0],data1[3:0],data0[3:0]}; RAM_Data2[17:0] = {2'b00,data7[3:0],data6[3:0],data5[3:0],data4[3:0]}; x8: RAM_Data1[17:0] = {2'b00,data1[7:0],data0[7:0]}; RAM_Data2[17:0] = {2'b00,data3[7:0],data2[7:0]}; x9: RAM_Data1[17:0] = {data1[8],data0[8],data1[7:0],data0[7:0]}; RAM_Data2[17:0] = {data3[8],data2[8],data3[7:0],data2[7:0]}; x16: RAM_Data1[17:0] = {2'b00,data0[15:0]}; RAM_Data2[17:0] = {2'b00,data1[15:0]}; x18: RAM_Data1[17:0] = {data0[17],data0[16],data0[15:0]}; RAM_Data2[17:0] = {data1[17],data1[16],data1[15:0]}; x32: RAM_Data1[17:0] = {2'b00,data0[15:0]}; RAM_Data2[17:0] = {2'b00,data0[31:16]}; x36: RAM_Data1[17:0] = {data0[17],data0[16],data0[15:0]}; RAM_Data2[17:0] = {data0[35],data0[34],data0[33:18]};
@nakengelhardt I have shared the above information regarding the ram primitive data format. Is the above information sufficient or Do you require more information to support ram initialization?
@rakeshm75 Thanks for the information! I just got back to the office yesterday, this is my next priority to work on.
In the module definition of TDP36K
in brams_sim.v
there isn't any parameter RAM_Data
. I believe you mentioned that the INIT_XX
parameters that are currently there are not the actual parameters expected by the P&R tool. Could you provide an updated simulation model for the TDP36K primitive with the parameters that should be assigned when mapping the memory?
Also, in addition to the order of the bits for one line in the RAM, could you specify the order of the memory lines in the parameter value?
@nakengelhardt I have attached the RAM initialization document here. Please review and we could discuss the details.
Hi @rakeshm75, I am following up on @nakengelhardt's work on this.
I opened a PR to add the RAM_INIT
property per your description to the simulation model. Feel free to check the model for correctness.
I will be working on adding synthesis support for this next.
In the document, you ask
Apart from the parameter in the synthesis output Verilog netlist and the blif output file, can synthesis file output the initialization data as hex file in the above-mentioned data format for each RAM block?
In principle it can. If implemented, it should probably be enabled by passing a separate option to synth_quicklogic
. Could you please tell more about the intended use-case of this? Is it perhaps for user inspection, or to be consumed by a loader later?
Hi @povik
Yes, it would be consumed by the loader later. The purpose of the initialization data as hex file is for the loading/programming the initialization data through the pre-load bus.
Regards, Rakesh
Thanks Rakesh. I opened a new PR #9 which adds synthesis support for initial values on top of the extension of the sim models from PR #8, which I now closed. I will get back to you on the feature of emitting separate RAM data files from synthesis.
@nakengelhardt I have attached a VHDL DPRAM (4096x12) design which is run through Yosys (using Verific) synthesis. The MEMORY_LIBMAP pass does not infer a BRAM. The same design on the MEMORY_BRAM pass infers the BRAM properly.
I have attached the design and synthesis report for both the runs (MEMORY_LIBMAP & MEMORY_BRAM).
@rakeshm75 I looked into what happens there. The Verific frontend parses out the design to have read-first behavior in case of an address collision between a read on one of the A/B ports and a write on the other. That collision behavior is specified as undefined for TDP36K in the libmap definition, and since it can't be emulated when there are shared R/W ports, the block memory doesn't get inferred.
You can define TDP36K to have read-first behavior across the dual ports by adding wtrans all old;
below the port definition in the library, then it does get inferred, but I don't know if the property is true of the underlying hardware. @nakengelhardt's earlier comments suggest that it's not.
Looking into the frontend code, Verific always takes memories to be read-first, no matter the input code.
I'm not well versed in VHDL, but I think this is a collision handling problem. By analogy with the verilog pattern, I've found these options for having it infer the block ram:
disable the check via the no_rw_check attribute:
architecture rtl of xip_dpmem is
type mem_type is array(0 to 2**AWIDTH_G-1) of std_logic_vector(DWIDTH_G-1 downto 0);
shared variable mem : mem_type;
attribute no_rw_check : string;
attribute no_rw_check of mem : variable is "true";
begin
explicitly handle the collision:
-- Port A
process(clk_i)
begin
if rising_edge(clk_i) then
if wren_a_i = '1' then
mem(to_integer(unsigned(addr_a_i))) := din_a_i;
end if;
dout_a_o <= mem(to_integer(unsigned(addr_a_i)));
if ((wren_b_i = '1') and (addr_a_i = addr_b_i)) then
dout_a_o <= (others => 'X');
end if;
end if;
end process;
-- Port B
process(clk_i)
begin
if rising_edge(clk_i) then
if wren_b_i = '1' then
mem(to_integer(unsigned(addr_b_i))) := din_b_i;
end if;
dout_b_o <= mem(to_integer(unsigned(addr_b_i)));
if ((wren_a_i = '1') and (addr_a_i = addr_b_i)) then
dout_b_o <= (others => 'X');
end if;
end if;
end process;
There's a separate issue with distinguishing between read first and write first patterns with the verific frontend that I need to look into...
@nakengelhardt One question here, with Verilog for read-first behavior, Yosys inserts extra logic to handle the undefined collision behavior and infers the BRAM. But here in VHDL, Yosys is not inferring the BRAM at all as default (without attribute), What is the reason?
Yes, with the above attribute (no_rw_check ) or with the code to explicitly handle the collision, Yosys infers the BRAM correctly with VHDL as well.
@rakeshm75 I checked and it doesn't infer the equivalent verilog code either. IIRC the previous example had two different clocks, whereas in this case both ports are using the same clock, so it also needs to make sure that address collisions between addr_a_i
and addr_b_i
are handled correctly, and this is not possible to emulate if more than one write port exists. For the case where the two ports are in different clock domains, this part of the collision checks is omitted as there's no presumption of consistent order between the operations on the different ports anyway, so there is only the behavior of the read data signal on the writing port to consider.
Regarding the issue of verific only inferring read first ports even for other patterns, we have merged a fix today: https://github.com/YosysHQ/yosys/pull/3927 With this fix, it now also infers correctly the two patterns listed as "not yet supported" in our docs when using verific.
To try to give more detail on when emulation is possible: Both the emulation of read-first and write-first collision behavior requires to delay the writes on the underlying hardware primitive by one cycle. In general that's not possible to implement on a shared R/W port since that can't be fitted into the limitation to operate on a single address per cycle.
With two synchronous R/W ports that share a clock domain and require emulation of read-first colllision behavior with respect to each other, the inference code just can't find a way to map it onto TDP36K. The inference explores many options, so if there's e.g. one R/W port and one R port with the read-first requirement, then it finds it can implement that by duplicating the memory, performing writes simultanously on both instances in parallel, and using one instance for one read port and the other for the other port.
@nakengelhardt , @povik Got it... thanks for the explanation!
I will get back to you on the feature of emitting separate RAM data files from synthesis.
@rakeshm75 I sketched something in PR #10, let me know how well that would work for you.
@povik @nakengelhardt
I have attached ROM design (VHDL) here. How do we force Yosys to infer BRAM? I have used the "ram_style" attribute but it does not infer the BRAM.
[Uploading rom_des.zip…]()
@rakeshm75 I think you might have submitted the comment before the file upload completed? I am only seeing a link that goes back to this issue...
@nakengelhardt Oh sorry, I am uploading the design here.
Thanks, I have investigated the example. The constant is not its own element in the netlist so we can't see attributes from it, but we don't currently look at attributes from the output register, which we should. I will look into modifying memory_libmap
so that this coding style also works.
@nakengelhardt @povik We want to disable True Dual Port RAM inferring for our new device architecture. How do we disable Yosys from inferring tdp ram inferring? Memory libmap should infer only the SDP ram not the TDP ram.
The description in the libmap_brams.txt file represents only the TDP mode. Since SDP is just a subset of TDP where some signals are unused, it's not considered a separate mode. If you want to infer SDP only, that requires writing new libmap_brams.txt
, and changing ql_bram_merge
and libmap_brams_map.v
to work on the new port layout.
@nakengelhardt, we are still going to support use of 36k BRAM as two 18K BRAM in split mode. Nothing changes on that front. In fact, the external pin interface of TDP36K remains unchanged. We'll still have 4 ports on 36K BRAM. The only change is that we don't support True Dual Port mode.
To confirm, before I start working on modifying ql_bram_merge
and libmap_brams_map.v
, in SDP mode the restrictions are that one port can be used for reading only and the other for writing only, but otherwise all the configuration options are still the same (each port can pick a different width, and the two ports can be using different clocks)?
So the corresponding libmap description would be:
ram block $__QLF_SDP36K {
init any;
byte 9;
option "SPLIT" 0 {
abits 15;
widths 1 2 4 9 18 36 per_port;
}
option "SPLIT" 1 {
abits 14;
widths 1 2 4 9 18 per_port;
}
cost 65;
port sr "RD" {
clock posedge;
clken;
}
port sw "WR" {
clock posedge;
wrbe_separate;
}
}
@nakengelhardt Yes, you are correct. Port A is used for write and PORT B is used for read. Yes, each port can pick a different width, and the two ports can be using different clocks.
Here when we want to define ports separately for write and read, the names should be "W" or "WR" and "R" or "RD" correct? we cannot use "A" and "B" ??
I made the changes as follows and also modified the port names in libmap_brams_map.v accordingly (ex... PORT_A_WR_DATA -> PORT_W_WR_DATA, PORT_B_RD_DATA -> PORT_R_RD_DATA ...)
ram block $__QLF_SDP36K {
init any;
byte 9;
option "SPLIT" 0 {
abits 15;
widths 1 2 4 9 18 36 per_port;
}
option "SPLIT" 1 {
abits 14;
widths 1 2 4 9 18 per_port;
}
cost 65;
port sr "R" {
clock posedge;
clken;
}
port sw "W" {
clock posedge;
clken;
wrbe_separate;
}
}
@nakengelhardt What would be the best way to support both the architectures, one device that supports TDP (TDP & SDP) and other that supports only SDP?
@rakeshm75 You can choose the names freely, if you prefer "A" and "B" you can name them that (I was just thinking it would be easier to read the code in the techmap file if the name also implies the capability).
If you want to support two devices with different memory blocks, I would recommend having two libmap files (e.g. bram_tdp.txt and bram_sdp.txt), and then passing one or the other when calling memory_libmap
in synth_quicklogic
. The techmap file and the ql_bram_merge
pass probably don't need to be separated, as they should always transform the input cells into output cells of the same kind.
@rakeshm75 I see that you still have clken;
on port W. Is there actually a clock enable port separate from the write enable port on the hardware primitive?