YosysHQ / yosys

Yosys Open SYnthesis Suite
https://yosyshq.net/yosys/
ISC License

Variable Part Selects #4476

Open psiddire opened 4 days ago

psiddire commented 4 days ago

Feature Description

We have found that Yosys produces very inefficient results when the input RTL uses variable part selects.

Example

Here is an example:

module part_select3 (
    input logic clk,
    input logic rst,
    input logic [1:0] data_in,
    input logic [1:0] select,
    output logic [10:0] data_out
);

  always_ff @(posedge clk or posedge rst) begin
    if (rst) begin
      data_out <= '0;
    end else begin
      data_out[select * 3 + 3 +: 2] <= data_in;
    end
  end
endmodule

On a Xilinx part, this should ideally synthesize to 6 LUT4s (or fewer LUT5s or LUT6s) and 6 flip-flops.
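The figure of 6 flip-flops can be checked by enumerating which bits of data_out the indexed part select can actually reach. The Python sketch below is only an illustration of that arithmetic, not anything Yosys runs. It assumes that a write to a bit position outside [10:0] is silently dropped, and the names WIDTH, PART_WIDTH and written_bits are made up for this example.

```python
# Enumerate the bits of data_out[10:0] that
#   data_out[select * 3 + 3 +: 2] <= data_in;
# can write, for every value of the 2-bit select input.
# Assumption: writes to out-of-range bit positions are ignored.

WIDTH = 11        # data_out is [10:0]
SELECT_BITS = 2   # select is [1:0]
PART_WIDTH = 2    # the "+: 2" in the part select


def written_bits():
    """Return the sorted list of data_out bit indices that can be written."""
    bits = set()
    for select in range(2 ** SELECT_BITS):
        base = select * 3 + 3
        for bit in range(base, base + PART_WIDTH):
            if 0 <= bit < WIDTH:  # select == 3 targets bits 12 and 13: dropped
                bits.add(bit)
    return sorted(bits)


written = written_bits()
never_written = [b for b in range(WIDTH) if b not in written]

print("written bits:      ", written)        # [3, 4, 6, 7, 9, 10]
print("never-written bits:", never_written)  # [0, 1, 2, 5, 8]
```

Under that assumption only six bits are ever written, which is where the 6 flip-flops come from. The other five bits only ever hold their reset value.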

Resource Utilization with Yosys version 0.40

However, the result from Yosys uses far more resources. Synthesizing the RTL above yields:

   Number of cells:                 44
     CARRY4                          2
     FDCE                           11
     INV                             2
     LUT2                            4
     LUT3                            9
     LUT5                           15
     LUT6                            1
   Estimated number of LCs:         25

Feature Request

Could Yosys add a feature to improve the synthesis of such variable part select logic? This becomes especially important when the part select is much wider and is used extensively in the input RTL.

Looking forward to hearing from you.

Thanks

povik commented 3 days ago

Please attach a Yosys version to the synthesis results you are seeing.

georgerennie commented 3 days ago

Adding -abc9 to synth_xilinx seems to improve results and gives me the following, although I can't replicate the results reported above. I also don't see how this can be synthesised with only 6 flip-flops, given that data_out has 11 bits and needs flip-flops to store its values. Edit: I hadn't seen that not all bits get used.

Number of cells:                 35
  BUFG                            1
  FDCE                           11
  IBUF                            6
  LUT4                            6
  OBUF                           11

Estimated number of LCs:          6

-abc9 is now the default for some other synthesis flows, so maybe it should be for Xilinx too (@Ravenslofty)

Ravenslofty commented 3 days ago

So, we have -abc9 to activate the timing-driven mapping mode, which fixes the LUT side of things. The flop side of things, I think, comes down to transforming a flop that is only ever reset and never written into a constant driver of its reset value.
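To illustrate why replacing such a flop with a constant would be safe for this design, here is a small behavioural model of the register from the issue, written in Python. It is a sketch under stated assumptions, not a description of any existing Yosys pass. It assumes out-of-range writes are dropped and that reset loads all zeros, and the helper names step and reachable_states are invented for this example.

```python
# Behavioural model of the part_select3 register: explore every state that is
# reachable from reset and check which bits never leave their reset value.
# Assumptions: reset loads 0, and writes to bits outside [10:0] are ignored.

WIDTH = 11


def step(state, select, data_in):
    """Apply one clock edge (rst deasserted) to the 11-bit register value."""
    base = select * 3 + 3
    for offset in range(2):  # the "+: 2" part select
        bit = base + offset
        if bit < WIDTH:      # drop out-of-range writes
            value = (data_in >> offset) & 1
            state = (state & ~(1 << bit)) | (value << bit)
    return state


def reachable_states():
    """Return every register value reachable from the reset value 0."""
    seen = {0}
    frontier = [0]
    while frontier:
        current = frontier.pop()
        for select in range(4):
            for data_in in range(4):
                nxt = step(current, select, data_in)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return seen


states = reachable_states()

# A bit that is 0 in every reachable state behaves like a constant 0 driver.
constant_zero_bits = [
    b for b in range(WIDTH) if all(((s >> b) & 1) == 0 for s in states)
]

print(len(states))         # 64 reachable states, i.e. 6 free bits
print(constant_zero_bits)  # [0, 1, 2, 5, 8]
```

In this model, bits 0, 1, 2, 5 and 8 hold their reset value in every reachable state, so driving them with a constant 0 would leave the observable behaviour unchanged.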

psiddire commented 3 days ago

@povik I have added the version to my question. In particular, I am using 0.40.

@georgerennie -abc9 does help in reducing the LUTs as you suggested, so maybe it can be made default. However, I realize that -abc9 possibly doesn't support UltraScale fully?

@Ravenslofty For the flops, is there a fix? Any possible workaround?

Ravenslofty commented 3 days ago

I think the flops might get optimised out if this code was inside other RTL. But presently, I'm not sure Yosys knows about this transformation.