YosysHQ / yosys

Yosys Open SYnthesis Suite
https://yosyshq.net/yosys/
ISC License
3.49k stars 893 forks source link

Gowin BSRAM inference -- We need B suffix (such as DPB instead of current DP) #4098

Closed chili-chips closed 10 months ago

chili-chips commented 10 months ago

Version

35

On which OS did this happen?

Linux

Reproduction Steps

https://github.com/YosysHQ/apicula/issues/208#issuecomment-1872635263

Expected Behavior

It's been a while since Gowin has added B suffix to their BSRAM primitives, to differentiate them from S(hadow) SRAM. nextpnr-gowin (project Apicula) has updated the naming. However, Yosys seems to still be using the original nomenclature, creating a disconnect.

We need BSRAM to be inferred as DPB, not DP.

Actual Behavior

BSRAM is inferred as DP instead of DPB.

yrabbit commented 10 months ago

A patch that purely mechanically corrects memory synthesis. Now there will be less swearing when creating images, but this should not make them work. b.path.txt

A good image is with direct use of the DPB primitive.

https://github.com/YosysHQ/yosys/assets/6075465/6746b0c4-dba3-40c1-9d1a-c1f99a81ebbe

Not a very good image - when we infer.

https://github.com/YosysHQ/yosys/assets/6075465/200afeaa-22e3-4b53-a685-7002d40971cb

And now a little of my stupid reasoning as a newbie. When we use a primitive directly, we set its parameters (data width, address width, write mode, read mode, etc.), and also explicitly connect signals to the inputs of the primitive - that is, we decide what to connect to ClockEnable and what to WriteEnable.

    DPB mem(
        .DOB({dummy, read_data}),
        .DIA({{8{gnd}}, write_data}),
        .DIB({16{gnd}}),
        .ADA({write_ad, gnd, gnd, gnd}),
        .ADB({ read_ad, gnd, gnd, gnd}),
        .CLKA(write_clk),
        .CLKB(clk),
        .OCEA(vcc),
        .OCEB(vcc),
        .CEA(write_ce),
        .CEB(vcc),
        .WREA(vcc),
        .WREB(read_wre),
        .BLKSELA(3'b000),
        .BLKSELB(3'b000),
        .RESETA(write_reset),
        .RESETB(reset)
    );
    defparam mem.READ_MODE0 = 1'b1;
    defparam mem.READ_MODE1 = 1'b1;
    defparam mem.WRITE_MODE0 = 2'b01;
    defparam mem.WRITE_MODE1 = 2'b01;
    defparam mem.BIT_WIDTH_0 = 8;
    defparam mem.BIT_WIDTH_1 = 8;
    defparam mem.BLK_SEL_0 = 3'b000;
    defparam mem.BLK_SEL_1 = 3'b000;
    defparam mem.RESET_MODE = "SYNC";

In the case of inferring, yosys selects the primitive, parameters and used ports based on an analysis of memory usage. The Gowin documentation has a list of boilerplate pieces of code that lead to the generation of certain primitives. This is SUG550-1.6E_GowinSynthesis User Guide.pdf

Here's a typical snippet from there: shot-0

Instructions on how to select a primitive during inferring are in the files: techlibs/gowin/brams.txt and techlibs/gowin/brams_map.v

You can see what yosys chose for each case in the logs (with the -g switch) and in the output .json file after synthesis.

This is how I tried to get yosys to generate DPB for the goblin faces above.

    (* ram_style = "block" *) reg [7:0] mem[10:0];

    always @(posedge clk or posedge reset) begin
        if (reset) begin
            read_data <= 0;
        end else begin
            read_data <= mem[read_ad];
        end
    end

    always @(posedge write_clk) begin
        if (write_ce) begin
            mem[write_ad] <= write_data;
        end
    end

That's what yosys did:

        "vmem.mem.0.0": {
          "hide_name": 0,
          "type": "DPX9B",
          "parameters": {
            "BIT_WIDTH_0": "00000000000000000000000000001001",
            "BIT_WIDTH_1": "00000000000000000000000000001001",
            "BLK_SEL_0": "000",
            "BLK_SEL_1": "000",
            "READ_MODE0": "0",
            "READ_MODE1": "0",
            "RESET_MODE": "ASYNC",
            "WRITE_MODE0": "00000000000000000000000000000010",
            "WRITE_MODE1": "00000000000000000000000000000010"
          },
          "attributes": {
            "module_not_derived": "00000000000000000000000000000001",
            "src": "/usr/local/bin/../share/yosys/gowin/brams_map.v:282.4-303.3"
          },
          "port_directions": {
            "ADA": "input",
            "ADB": "input",
            "BLKSELA": "input",
            "BLKSELB": "input",
            "CEA": "input",
            "CEB": "input",
            "CLKA": "input",
            "CLKB": "input",
            "DIA": "input",
            "DIB": "input",
            "DOA": "output",
            "DOB": "output",
            "OCEA": "input",
            "OCEB": "input",
            "RESETA": "input",
            "RESETB": "input",
            "WREA": "input",
            "WREB": "input"
          },
          "connections": {
            "ADA": [ 45, 45, 45, 804, 805, 806, 807, 45, 45, 45, 45, 45, 45, 45 ],
            "ADB": [ 45, 45, 45, 885, 884, 883, 882, 45, 45, 45, 45, 45, 45, 45 ],
            "BLKSELA": [ 45, 45, 45 ],
            "BLKSELB": [ 45, 45, 45 ],
            "CEA": [ 66 ],
            "CEB": [ 66 ],
            "CLKA": [ 880 ],
            "CLKB": [ 30 ],
            "DIA": [ 886, 887, 888, 889, 890, 891, 892, 893, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45 ],
            "DIB": [ 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45 ],
            "DOA": [ 894, 895, 896, 897, 898, 899, 900, 901, 902, 903, 904, 905, 906, 907, 908, 909, 910, 911 ],
            "DOB": [ 761, 728, 694, 660, 425, 794, 912, 913, 914, 915, 916, 917, 918, 919, 920, 921, 922, 923 ],
            "OCEA": [ 45 ],
            "OCEB": [ 45 ],
            "RESETA": [ 45 ],
            "RESETB": [ 196 ],
            "WREA": [ 66 ],
            "WREB": [ 45 ]
          }
        },

Considering that wires 45 are GND, and 66 are VCC, it is already clear why there is garbage on the screen (partially): the WREA and CEA signals are always VCC, which means writing is always going on, although I clearly indicated that writing needs to be done only at certain moments.

And that’s not all that is not entirely correct with the parameters and ports.

To summarize: if for a guru it’s immediately obvious what little things need to be corrected for proper synthesis, then for me it’s a dark forest. Dual Port is too complex for inferring to understand and I plan to start experimenting with ROM and gradually move towards more complex options. And it doesn’t help matters that BSRAM in Tangnano9k may behave differently than in the documentation.

Seyviour commented 10 months ago

The bit order (reading from the "connection" entry) seems to be reversed in the inferred version. Could this be part of the problem?

yrabbit commented 10 months ago

Thank you, at this stage any advice is useful to me. But I think that in JSON, after synthesis, the low-order bits of the buses are located on the left. Well, at least when using the primitive explicitly, I get:

            "BLKSELA": [ 39, 39, 39 ],
            "BLKSELB": [ 39, 39, 39 ],
            "CEA": [ 66 ],
            "CEB": [ 66 ],
            "CLKA": [ 843 ],
            "CLKB": [ 30 ],
            "DIA": [ 904, 905, 906, 907, 908, 909, 910, 911, 39, 39, 39, 39, 39, 39, 39, 39 ],
            "DIB": [ 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39 ],
            "DOB": [ 724, 691, 657, 623, 392, 757, 912, 913, 914, 915, 916, 917, 918, 919, 920, 921 ],
            "OCEA": [ 66 ],
            "OCEB": [ 66 ],

However, your observation made me reconsider my examples and I found a couple of errors. Thanks! :)

Seyviour commented 10 months ago

I'm happy to hear that, and glad to contribute however I can. I'll keep looking to see if I observe any other differences. Is there any chance you could share your examples?

yrabbit commented 10 months ago

This is just a simplified example from the apicula distribution: https://github.com/YosysHQ/apicula/tree/master/examples/himbaechel

bsram.ZIP

Here are two directories that are copies except for the DPB-video-ram.v file. If you applied the yosys patch (see previous posts) and have himbaechel nextpnr installed (see https://github.com/YosysHQ/apicula/issues/220#issuecomment-1876682997) then everything should compile.

The build goes like this:

make -f Makefile.himbaechel bsram-DPB-tangnano9k.fs

And yes, I know that I use an 8-bit primitive, and yosys makes 9-bit, but at the moment this is unimportant: as long as inferring ignores CE and WRE for writing, there will be garbage on the screen.

yrabbit commented 10 months ago

Corrected version (I had to regenerate the JSON) now infer the WRE port as connected to the network. This is not CE, but it will also do. inerr-1

The picture, however, is still garbage, so we think further about what’s going on here: did I screw it up in the top-level code or is it still not good to use a 9-bit primitive for 8-bit memory. Or maybe force yosys to use registers on the memory output first.

https://github.com/YosysHQ/yosys/assets/6075465/440e219a-b166-4fb7-8f2e-719b90b8c0b1

yrabbit commented 10 months ago

Something interesting. I wonder what I wrote incorrectly in the top-level code that yosys decided that 4 bits for the address is enough? The upper JSON is inferring, the lower one is manual use of the DPB primitive.

@Seyviour You have all the source codes, can you tell me where I managed to turn the address [10:0] into [3:0]?

addr

Seyviour commented 10 months ago

This is just a simplified example from the apicula distribution: https://github.com/YosysHQ/apicula/tree/master/examples/himbaechel

bsram.ZIP

Here are two directories that are copies except for the DPB-video-ram.v file. If you applied the yosys patch (see previous posts) and have himbaechel nextpnr installed (see https://github.com/YosysHQ/apicula/issues/220#issuecomment-1876682997) then everything should compile.

The build goes like this:

make -f Makefile.himbaechel bsram-DPB-tangnano9k.fs

And yes, I know that I use an 8-bit primitive, and yosys makes 9-bit, but at the moment this is unimportant: as long as inferring ignores CE and WRE for writing, there will be garbage on the screen.

Thank you for sharing! Would a Yosys build from source (without the patch) work as well?

Seyviour commented 10 months ago

@yrabbit

As I understand it, since the array has 11 elements ([10:0]), only 4 bits are required to represent any index into the array (log2(11)).

For a memory with a 11-bit address, the declaration would be something like: reg [7:0] mem [2**11-1 :0]

yrabbit commented 10 months ago

Thank you for sharing! Would a Yosys build from source (without the patch) work as well?

No. but you can avoid compiling yosys if after installation you find where the brams.txt and brams_map.v files are located in your system and correct it right there. installed-failes

yrabbit commented 10 months ago

@yrabbit

As I understand it, since the array has 11 elements ([10:0]), only 4 bits are required to represent any index into the array (log2(11)).

For a memory with a 11-bit address, the declaration would be something like: reg [7:0] mem [2**11-1 :0]

I told that I'm very bad at verilog :)

And thanks to @Seyviour we have working memory!!!

https://github.com/YosysHQ/yosys/assets/6075465/5747448e-229c-4150-ba4f-32fa8606e79c

It works like crap, but still! Here is a reference video from a manual primitive

https://github.com/YosysHQ/yosys/assets/6075465/368e60eb-9ff1-424b-b2cf-939c1d9c4173

Seyviour commented 10 months ago

This is amazing, @yrabbit. Thank you very much!

I'm currently trying to figure out SDRAM for the GW2A chips; do you have any tips on how I could go about getting that to work?

yrabbit commented 10 months ago

Well, the happy ending is still far away - the picture should be the same as that of a hand-made primitive.

As for SDRAM, sorry, I've never done this. Is this something external to the FPGA? Not primitive?

chili-chips-ba commented 10 months ago

... this is a unique trait of Gowin devices which have SDRAM chiplet connected to FPGA die within package (SIP).

This SDRAM is essentially external to FPGA and not a primitive. However, it is also "loosely internal", because it is consuming FPGA pins which are therefore not externally visible.

Those pins in Gowin proprietary tools don't have pin numbers. They have specific pin names which tool recognizes as "internal connections" to then, under the hood, translate to the actual FPGA pin number. RTL tapping into this in-the-package SDRAM has to use the exact pin names that Gowin tool recognizes. For back-and-forth RTL porting, opensource tools should adhere to the same naming convention.

For more, see: https://github.com/nand2mario/nestang/blob/master/src/sdram.v

chili-chips-ba commented 10 months ago

For the preassigned SDRAM pin names, see: https://github.com/nand2mario/nestang/blob/master/src/nestang_top.sv

// SDRAM // For Primer 25K: https://github.com/MiSTer-devel/Hardware_MiSTer/blob/master/releases/sdram_xsds_3.0.pdf // For Nano 20K: 8MB 32-bit SDRAM output O_sdram_clk, output O_sdram_cke, output O_sdram_cs_n, // chip select output O_sdram_cas_n, // columns address select output O_sdram_ras_n, // row address select output O_sdram_wen_n, // write enable inout [SDRAM_DATA_WIDTH-1:0] IO_sdram_dq, // bidirectional data bus output [SDRAM_ROW_WIDTH-1:0] O_sdram_addr, // multiplexed address bus output [1:0] O_sdram_ba, // two banks output [SDRAM_DATA_WIDTH/8-1:0] O_sdram_dqm,

yrabbit commented 10 months ago

If we slightly adjust the JSON that is obtained at the output of the inferring version, namely, turn on reading mode 1 and enable the output of the output register built into the memory by applying VCC to the OCEB input, then we will get a perfect match of the pictures. Which, naturally, suggests an idea, I don’t know what yet. Something in the code generated by yosys stretches the pixels by half horizontally. It is logical to assume that it is the address, but no - this would not be cured by turning on the register at the output. Hmm... unclear...

shot-2

https://github.com/YosysHQ/yosys/assets/6075465/43600b1a-bd2e-499b-b51c-76fd29a22d8d

yrabbit commented 10 months ago

I noticed one more thing - the infer and primitive versions start differently - the primitive normally handles the initial memory filling, while the infer version has a black screen. Let's deal with this first.

yrabbit commented 10 months ago

Infer version start:

https://github.com/YosysHQ/yosys/assets/6075465/deb3b1e0-a198-4310-9555-1c5ab40fd4b1

Infer version with READ_MODE=1 start:

https://github.com/YosysHQ/yosys/assets/6075465/d2241891-6c30-41ce-9c2a-8760bc5a624d

The lower video especially shows how the horizontal resolution improves dramatically when the version with READ_MODE=1 is loaded and launched into the FPGA.

It seems that with READ_MODE=0 we somehow manage to use one byte from memory to output two adjacent points, while with READ_MODE=1 each point uses its own byte from memory. But this is not a property of this parameter - it just shifts the byte output by one clock cycle. No ideas yet :(

Seyviour commented 10 months ago

Thank you, @chili-chips for your comments on the SDRAM. My apologies to you, @yrabbit for taking this long to respond; @chili-chips comments pretty much sum it up. I'd like to find the location and (hidden) pins that map to the SDRAM and figure out routing to those pins. It's probably more something to be done in Nextpnr rather than in Yosys.

Seyviour commented 10 months ago

Ah. I also noticed that the pixels were getting repeated along the horizontal but I thought it was the writes that were failing because the instantiated version used the clock enable of the BRAM, unlike the version inferred by Yosys. Evidently, I was wrong about that :) I'll look at the timing diagram from the Gowin manual to see if I find anything that might be useful w.r.t the read modes.

@yrabbit, it seems we don't have an equivalent of the include "img-video-ram.vh" statement that initializes the ram for the primitive version. I'll try to initialize via readmemh and report what I find.

yrabbit commented 10 months ago

@yrabbit, it seems we don't have an equivalent of the include "img-video-ram.vh" statement that initializes the ram for the primitive version. I'll try to initialize via readmemh and report what I find.

Well, don’t bother with that - here’s the readmemh version. It’s more interesting to understand what’s happening with the output signal :) bsram-infer.ZIP

chili-chips-ba commented 10 months ago

I'd like to find the location and (hidden) pins that map to the SDRAM and figure out routing to those pins. It's probably more something to be done in Nextpnr rather than in Yosys.

That's right. We recommend diving into the depths of the final bit-file for this example: https://wiki.sipeed.com/hardware/en/tang/tang-nano-20k/example/unbox.html

yrabbit commented 10 months ago

I was in a hurry. Apparently I'm still doing something wrong. This is how I initialize the memory:

    (* ram_style = "block" *) reg [7:0] mem[2**11-1:0];
    always @(posedge clk or posedge reset) begin
        if (reset) begin
            read_data <= 0;
        end else begin
            read_data <= mem[read_ad];
        end
    end

    always @(posedge write_clk) begin
        if (write_ce) begin
            mem[write_ad] <= write_data;
        end
    end
    initial begin
        $readmemh("img0.hex", mem);
    end

Here is the very beginning of the data file:

04
14
54
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00

Here is the entire data file: img0.hex.txt

And this is what happens in JSON after synthesis:

        "vmem.mem.0.0": {
          "hide_name": 0,
          "type": "SPX9",
          "parameters": {
            "BIT_WIDTH": "00000000000000000000000000001001",
            "BLK_SEL": "000",
            "INIT_RAM_00": "000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000110100000010100000000100",
            "INIT_RAM_01": "000001001000001001000001001000001010000001010000001011000001100000001100000001101000001101000001110000001110000001110000001110000001111000001111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000",

Of course, yosys chose a 9-bit primitive, but even the 9-bit initialization line looks strange: already from the third byte there was a discrepancy with the data file:

...000110100000010100000000100 -> 0 0011 0100 0 0001 0100 0 0000 0100 = 34 14 04

How can 34 come out of 54? I messed something up.

yrabbit commented 10 months ago

Changed the initial memory initialization bytes to

00
04
08
0C
10
14
18
1C
20
24
28
2C
30
34
38
3C
00
00
00
00
00

Result on the address inputs of the primitive and display pins (! this is important) in the READ_MODE=1 option read-mode-1

The Clock, the least significant three bits of the address, and all the green channel bits are shown. An enlarged piece, where you can see after how many clock cycles and on which edge the data is changed after the address change:

read-mode-1-0

Why is it important that the data is taken from the display pins - between the memory outputs and the display pins there are several LUTs and MUXes (which, for now, we will assume that they do not introduce delay) and one DFF, which can have some effect.

And I made a repository so as not to constantly pack these files: https://github.com/yrabbit/bram-infer

yrabbit commented 10 months ago

And here’s what happens with READ_MODE_0, again on the display pins: read-mode-0 Some thought is floating in the air, something opaque... maybe it’s all about this DFF that we have in front of the display? What if we get rid of it and just say that we already have DFF inside the primitive? I do not know yet...

Seyviour commented 10 months ago

Given the impact of the register (as dictated by the read_mode), perhaps we should consider that the design somehow doesn't meet timing when READ_MODE is 0.

Maybe you can check how the instantiated version performs when it has READ_MODE=0 :thinking:

Seyviour commented 10 months ago

I'll try to rework the address generation so that it happens on the preceding posedge rather than a negedge to see if that helps.

Seyviour commented 10 months ago

This is what I propose. Forgive the weird naming and formatting here :)

        reg         [15:0] __pixel_count;
    reg         [15:0] __line_count;
    reg         [7:0] __vmem_start_col;
    reg         [7:0] __vmem_start_row; 

    always @(*) begin
        if (pixel_count == PixelForHS) begin
                        __pixel_count      =  16'b0;
                       __line_count       =  line_count + 1'b1;
                end
                else if (line_count == LineForVS) begin
                       __line_count       =  16'b0;
                       __pixel_count      =  16'b0;
                end
                else begin
                       __pixel_count      =  pixel_count + 1'b1;
                end

              __vmem_start_col = __pixel_count - `START_X;
              __vmem_start_row = __line_count - `START_Y;
    end

    reg           [7:0] vmem_start_col;
    reg           [7:0] vmem_start_row;

    always @(posedge pixel_clk or negedge rst)begin
        if (!rst) begin
                         line_count           <=  16'b0;    
                        pixel_count           <=  16'b0;
            vmem_start_col   <=  0 - `START_X;
            vmem_start_row   <=  0 - `START_Y; 
            end
        else begin
            pixel_count              <= __pixel_count;
            line_count       <= __line_count;
            vmem_start_col   <= __vmem_start_col;
            vmem_start_row   <= __vmem_start_col; 
        end
    end

I'll create a PR on the repo you created

yrabbit commented 10 months ago

I'll create a PR on the repo you created

I'm waiting:)

yrabbit commented 10 months ago

Maybe you can check how the instantiated version performs when it has READ_MODE=0 🤔

Well, I play with it exclusively - I don’t touch the version with the primitive. The last pictures with both READ_MODE=1 and READ_MODE=0 refer to it.

Seyviour commented 10 months ago

I'll create a PR on the repo you created

I'm waiting:)

Done :)

Seyviour commented 10 months ago

Maybe you can check how the instantiated version performs when it has READ_MODE=0 🤔

Well, I play with it exclusively - I don’t touch the version with the primitive. The last pictures with both READ_MODE=1 and READ_MODE=0 refer to it.

Ohhh. I thought it would be a good idea to change the version with the primitive to READ_MODE=0 to see how it performs then.

yrabbit commented 10 months ago

I'll create a PR on the repo you created

I'm waiting:)

Done :) ![Uploading PR.jpeg…]()

mmm... this is not good - I can't orient the logic analyzer because there are so many overlays on the screen. and the goblin disappeared :(

I'll try with READ_MODE=1

Seyviour commented 10 months ago

I must have got something wrong. I'll double check and send an update.

yrabbit commented 10 months ago

nope, still PR-1

Seyviour commented 10 months ago

I think I found my mistake:

vmem_start_row <= __vmem_start_col;

Seyviour commented 10 months ago

I've submitted a new PR

yrabbit commented 10 months ago

READ_MODE=0

https://github.com/YosysHQ/yosys/assets/6075465/1beab2b9-c76a-4312-8b31-e6dcb8ac73d8

yrabbit commented 10 months ago

READ_MODE=1

https://github.com/YosysHQ/yosys/assets/6075465/96ef1fe2-3d3d-4317-8149-91a84858064c

yrabbit commented 10 months ago

I recognize by the eyes when there are byte gaps and when there are not: if the eyes consist of three horizontal dots, then there is no gap :)

And it looks like you removed the check that prohibited drawing the goblin outside the central square, now his ghosts are everywhere :)

Seyviour commented 10 months ago

Oh my. Did I affect the color of the goblins too? :)

I'll keep looking to see what I got wrong.

yrabbit commented 10 months ago

So, I can say with absolute confidence that the green color is connected directly to the memory output, there are no DFFs, LUTs or MUXs between the memory and the pin. G-wire

In light of this we have: Let me remind you that the values 0h, 4h, 8h, Ch, 10h, 14h, 18h and so on are written from address 0. The option with READ_MODE_1 is a good picture, bytes are not thrown out, edges are shown when address 1 is set and when the number 4h stored there is output: 856-read-mode-1

And now the option with READ_MODE=0 Let me remind you that I only change the line with the READ_MODE parameter in JSON, so no routes or anything else changes. The addresses are set identically along the same edges, maintained for the same number of clock cycles, but the output is who knows what. Still a mystery. 858-read-mode-0

Seyviour commented 10 months ago

So, I can say with absolute confidence that the green color is connected directly to the memory output, there are no DFFs, LUTs or MUXs between the memory and the pin. G-wire

In light of this we have: Let me remind you that the values 0h, 4h, 8h, Ch, 10h, 14h, 18h and so on are written from address 0. The option with READ_MODE_1 is a good picture, bytes are not thrown out, edges are shown when address 1 is set and when the number 4h stored there is output: 856-read-mode-1

And now the option with READ_MODE=0 Let me remind you that I only change the line with the READ_MODE parameter in JSON, so no routes or anything else changes. The addresses are set identically along the same edges, maintained for the same number of clock cycles, but the output is who knows what. Still a mystery. 858-read-mode-0

Is this connection the same for READ_MODE=0 and READ_MODE=1? I guess that might make sense since/if the output register is internal to the RAM?

Also, I'm trying to find any code that references the READ_MODE and I did not find anything particularly useful across yosys, apicula, and nextpnr. From brams.txt, I can see WRITE_MODE, but no READ_MODE.

Do you have any pointers on where I might find the code that decides READ_MODE or handles it? Maybe the answer is not in yosys :tired_face:

Seyviour commented 10 months ago

@yrabbit. This might be worth trying: With READ_MODE=0, left shift or right shift the address by 1-bit to see what happens. Probably, things get worse, but there's a small chance we observe something interesting.

yrabbit commented 10 months ago

Is this connection the same for READ_MODE=0 and READ_MODE=1? I guess that might make sense since/if the output register is internal to the RAM?

Of course it’s the same: I change READ_MODE directly in JSON after synthesis, so routing and so on are exactly the same ;)

Also, I'm trying to find any code that references the READ_MODE and I did not find anything particularly useful across yosys, apicula, and nextpnr. From brams.txt, I can see WRITE_MODE, but no READ_MODE.

Do you have any pointers on where I might find the code that decides READ_MODE or handles it? Maybe the answer is not in yosys 😫

Yeah, I think that it is not processed anywhere, but is simply set as 0 once and for all. And this is just funny - if mode 1 had been set there, then we would not have noticed this problem in real life - everything would have worked (well, with a delay of one clock, but it would have looked correct) :wink:

https://github.com/YosysHQ/yosys/blob/242ae4ef017a60c0e9231ed466e0431fc43dcfc9/techlibs/gowin/brams_map.v#L349

yrabbit commented 10 months ago

@yrabbit. This might be worth trying: With READ_MODE=0, left shift or right shift the address by 1-bit to see what happens. Probably, things get worse, but there's a small chance we observe something interesting.

Yes, it may be something related to the input lines, today I will read how this primitive is encoded into bits. I couldn’t make a cardinal mistake - in this case, not a single primitive would work at all - but I could have missed some rare feature.

yrabbit commented 10 months ago

I found it :smile: When READ_MODE=0, the OCEx inputs must be set to 1. Now the data is spit out on the next clock cycle after the address is set and no bytes are skipped. Time to brush up on PR.

1509-read-mode-0

Seyviour commented 10 months ago

Congratulations!!! Now no more half-eyed goblins :) And thanks for the shout-out on the PR :pray:

I think there might be one more thing to consider:

mmicko commented 10 months ago

Fixed by https://github.com/YosysHQ/yosys/pull/4137

yrabbit commented 10 months ago

I think there might be one more thing to consider:

  • As I understand things, the idea for the 'bypass' mode is that the read operation should be combinational, in the sense that the output should be available before the next rising edge, and the timing diagram from the manual seems to support this. How do we test that this is the case?

Well, it doesn't work like that. I made a repository with an example for Gowin IDE (https://github.com/yrabbit/gw-bsram-infer). It uses SDPB with READ_MODE=0, the outputs of the primitive are connected directly to pins without intermediate pieces. Launch result: gw

You can compile the project, but I think it’s enough to look at the intermediate verilog: https://github.com/yrabbit/gw-bsram-infer/blob/master/impl/gwsynthesis/bram-infer-gw.vg

There you can see that the outs go straight to OBUFs.

So I think we won't dig any further if their own IDE doesn't do what the documentation says :wink: