YosysHQ / yosys

Yosys Open SYnthesis Suite
https://yosyshq.net/yosys/
ISC License
3.49k stars 892 forks source link

Change in behavior results in failed BRAM inference #2473

Closed sylefeb closed 3 years ago

sylefeb commented 3 years ago

Summary

I observe a change in behavior in Yosys that results in failed BRAM inference, between an older and the most recent version (version details below). I include at the end a repro code, which I tried to make small.

The issue seems to come from how constant values applied to a port enable are treated. What this example does is to either use a constant wire (assigned to 0) or a flip-flop that cannot be removed by synthesis (due to lack of initialization).

In older versions using a constant wire has the desirable effect of simplifying the BRAM. In the latest version the constant wire leads to either a port in async or a non-recognized BRAM as below. Note that if the flip-flop is given an initial value of 0, yosys will remove it which also leads to a failed inference.

Steps to reproduce the issue

Using the attached example, commenting out USE_FLIPFLOP_ON_WENABLE at the top triggers the issue in the latest version (0.9+3743 git sha1 13a27055).

Expected behavior

Ideally as with version Yosys 0.9+3537 (git sha1 c75d8c74), where both cases would lead to an inferred BRAM, and using a constant wire on a port would lead to the BRAM being simplified (dual => simple dual => single => rom).

Actual behavior

When assigning a constant wire to the wenable port, the BRAM inference fails.

Details

Disabling USE_FLIPFLOP_ON_WENABLE the BRAM is accepted:

Processing top.__main.__mem__mem.buffer:
  Properties: ports=1 bits=2048 rports=1 wports=0 dbits=8 abits=8 words=256
...
 Checking rule #6 for bram type $__ECP5_DP16KD (variant 5):
    Bram geometry: abits=14 dbits=1 wports=0 rports=0
    Estimated number of duplicates for more read ports: dups=1
    Metrics for $__ECP5_DP16KD: awaste=16128 dwaste=0 bwaste=16128 waste=16128 efficiency=1
    Rule for bram type $__ECP5_DP16KD (variant 5) rejected: requirement 'attribute syn_romstyle="ebr" ...' not met.
  Selecting best of 4 rules:
    Efficiency for rule 4.3: efficiency=6, cells=2, acells=1
    Efficiency for rule 4.2: efficiency=11, cells=1, acells=1
    Efficiency for rule 4.1: efficiency=11, cells=1, acells=1
    Efficiency for rule 1.1: efficiency=11, cells=1, acells=1
    Selected rule 4.2 with efficiency 11.
    Mapping to bram type $__ECP5_DP16KD (variant 2):
      Read port #0 is in clock domain \clk_25mhz.
        Mapped to bram port B1.1.
      Creating $__ECP5_DP16KD cell at grid position <0 0 0>: __main.__mem__mem.buffer.0.0.0

Enabling USE_FLIPFLOP_ON_WENABLE the BRAM is accepted:

Processing top.__main.__mem__mem.buffer:
  Properties: ports=2 bits=2048 rports=1 wports=1 dbits=8 abits=8 words=256
...
 Checking rule #6 for bram type $__ECP5_DP16KD (variant 5):
    Bram geometry: abits=14 dbits=1 wports=0 rports=0
    Estimated number of duplicates for more read ports: dups=1
    Metrics for $__ECP5_DP16KD: awaste=16128 dwaste=0 bwaste=16128 waste=16128 efficiency=1
    Rule #6 for bram type $__ECP5_DP16KD (variant 5) rejected: requirement 'max wports 0' not met.
  Selecting best of 4 rules:
    Efficiency for rule 4.3: efficiency=6, cells=2, acells=1
    Efficiency for rule 4.2: efficiency=11, cells=1, acells=1
    Efficiency for rule 4.1: efficiency=11, cells=1, acells=1
    Efficiency for rule 1.1: efficiency=11, cells=1, acells=1
    Selected rule 4.2 with efficiency 11.
    Mapping to bram type $__ECP5_DP16KD (variant 2):
      Shuffle bit order to accommodate enable buckets of size 9..
      Results of bit order shuffling: 0 1 2 3 4 5 6 7 -1
      Write port #0 is in clock domain \clk_25mhz.
        Mapped to bram port A1.
      Read port #0 is in clock domain \clk_25mhz.
        Mapped to bram port B1.1.
      Creating $__ECP5_DP16KD cell at grid position <0 0 0>: __main.__mem__mem.buffer.0.0.0

Disabling USE_FLIPFLOP_ON_WENABLE the BRAM is rejected:

Processing top.__main.__mem__mem.buffer:
  Properties: ports=2 bits=2048 rports=1 wports=1 dbits=8 abits=8 words=256
...
  Rule #6 for bram type $__ECP5_DP16KD (variant 4) rejected: requirement 'max wports 0' not met.
  Checking rule #6 for bram type $__ECP5_DP16KD (variant 5):
    Bram geometry: abits=14 dbits=1 wports=0 rports=0
    Estimated number of duplicates for more read ports: dups=1
    Metrics for $__ECP5_DP16KD: awaste=16128 dwaste=0 bwaste=16128 waste=16128 efficiency=1
    Rule #6 for bram type $__ECP5_DP16KD (variant 5) rejected: requirement 'max wports 0' not met.
  No acceptable bram resources found.

Enabling USE_FLIPFLOP_ON_WENABLE the BRAM is accepted:

Processing top.__main.__mem__mem.buffer:
  Properties: ports=2 bits=2048 rports=1 wports=1 dbits=8 abits=8 words=256
...
 Checking rule #6 for bram type $__ECP5_DP16KD (variant 5):
    Bram geometry: abits=14 dbits=1 wports=0 rports=0
    Estimated number of duplicates for more read ports: dups=1
    Metrics for $__ECP5_DP16KD: awaste=16128 dwaste=0 bwaste=16128 waste=16128 efficiency=1
    Rule #6 for bram type $__ECP5_DP16KD (variant 5) rejected: requirement 'max wports 0' not met.
  Selecting best of 4 rules:
    Efficiency for rule 4.3: efficiency=6, cells=2, acells=1
    Efficiency for rule 4.2: efficiency=11, cells=1, acells=1
    Efficiency for rule 4.1: efficiency=11, cells=1, acells=1
    Efficiency for rule 1.1: efficiency=11, cells=1, acells=1
    Selected rule 4.2 with efficiency 11.
    Mapping to bram type $__ECP5_DP16KD (variant 2):
      Shuffle bit order to accommodate enable buckets of size 9..
      Results of bit order shuffling: 0 1 2 3 4 5 6 7 -1
      Write port #0 is in clock domain \clk_25mhz.
        Mapped to bram port A1.
      Read port #0 is in clock domain \clk_25mhz.
        Mapped to bram port B1.1.
      Creating $__ECP5_DP16KD cell at grid position <0 0 0>: __main.__mem__mem.buffer.0.0.0

Code

// comment / uncomment to see how BRAM synthesis fails or works
// `define USE_FLIPFLOP_ON_WENABLE

// Yosys 0.9+3537 : both cases lead to an inferred BRAM
// Yosys 0.9+3743 : not defining USE_FLIPFLOP_ON_WENABLE leads to the BRAM being rejected

// BRAM

module M_main_mem_mem(
input                  [0:0] in_mem_wenable,
input      signed [7:0]      in_mem_wdata,
input                  [7:0] in_mem_addr,
output reg signed [7:0]      out_mem_rdata,
input                        clock
);

  reg signed [7:0] buffer[255:0];

  always @(posedge clock) begin
    if (in_mem_wenable) begin
      buffer[in_mem_addr] <= in_mem_wdata;
    end
    out_mem_rdata <= buffer[in_mem_addr];
  end

endmodule

// Main

module M_main (out_leds,clock,reset);

  output  [7:0] out_leds;
  input clock;
  input reset;

  wire signed [7:0] _w_mem_mem_rdata;

  `ifdef USE_FLIPFLOP_ON_WENABLE
  reg    _d_mem_wenable,_q_mem_wenable;
  `else
  wire   _c_mem_wenable;
  assign _c_mem_wenable = 0;
  `endif

  wire signed [7:0] _c_mem_wdata;
  assign _c_mem_wdata = 0;

  reg  [7:0] _d_mem_addr;
  reg  [7:0] _q_mem_addr;
  reg  [7:0] _d_leds,_q_leds;
  reg  _d_index,_q_index;
  assign out_leds = _q_leds;

  always @(posedge clock) begin
    if (reset) begin
      _q_mem_addr <= 0;
      _q_index <= 0;
    end else begin
  `ifdef USE_FLIPFLOP_ON_WENABLE
      _q_mem_wenable <= _d_mem_wenable;
  `endif
      _q_mem_addr <= _d_mem_addr;
      _q_leds <= _d_leds;
      _q_index <= _d_index;
    end
  end

  M_main_mem_mem __mem__mem(
  .clock(clock),
  `ifdef USE_FLIPFLOP_ON_WENABLE
    .in_mem_wenable(_d_mem_wenable),
  `else
    .in_mem_wenable(_c_mem_wenable),
  `endif
  .in_mem_wdata(_c_mem_wdata),
  .in_mem_addr(_d_mem_addr),
  .out_mem_rdata(_w_mem_mem_rdata)
  );

  always @* begin

  `ifdef USE_FLIPFLOP_ON_WENABLE
  _d_mem_wenable = _q_mem_wenable;
  `endif
  _d_mem_addr    = _q_mem_addr;
  _d_leds        = _q_leds;
  _d_index       = _q_index;

  (* full_case *)
  case (_q_index)
    0: begin
      // var inits
      _d_mem_addr = 0;
      `ifdef USE_FLIPFLOP_ON_WENABLE
      _d_mem_wenable = 0;
      `endif
      // --
      _d_mem_addr = 0;
      _d_index = 1;
    end
    1: begin
      _d_leds = _w_mem_mem_rdata;
      _d_mem_addr = _q_mem_addr+1;
      _d_index = 1;
    end
  endcase
  end

endmodule

// TOP

module top(
  output [7:0] leds,
  input  clk_25mhz
  );

  reg reset = 1;
  always @(posedge clk_25mhz) begin
    reset <= 0;
  end

  M_main __main(
    .out_leds      (leds),
    .reset         (reset),
    .clock         (clk_25mhz)
  );

endmodule

(made minor edits for typos)

Ravenslofty commented 3 years ago

This sounds like it might be a dfflegalize interaction, although I haven't looked in any detail. I'll ping @mwkmwkmwk anyway.

mwkmwkmwk commented 3 years ago

This is a memory_dff bug uncovered by an interaction with opt_dffopt_dff basically optimizes away the address register on the write port into a const driver, and then memory_dff is unable to recognize a synchronous write port. It goes downhill from there.

mwkmwkmwk commented 3 years ago

This should be fixed now that #2605 has landed — can you check it?

sylefeb commented 3 years ago

I confirm this is fixed in Yosys 0.9+4008 (git sha1 26e01a67, gcc 10.2.0 -fPIC -Os) - thanks!