YosysHQ / yosys

Yosys Open SYnthesis Suite
https://yosyshq.net/yosys/
ISC License
3.37k stars 871 forks source link

Better inference of the ICE40 SB_MUL16 blocks #882

Closed xobs closed 4 years ago

xobs commented 5 years ago

Background

Yosys has recently grown the ability to partially infer ICE40 DSP blocks. This works for some Verilog code, but it is not yet at the same level as the proprietary tools when it comes to inferring DSP blocks (see https://github.com/SpinalHDL/VexRiscv/issues/40#issuecomment-438609381)

Steps to reproduce the issue

The repository at https://github.com/im-tomu/fomu-tests/tree/master/mul-infer contains a VexRiscv MulPlugin that has been grafted from a VexRiscv synthesis into a standalone testbench. Two targets are provided:

Expected behavior

Yosys should infer four multiplication blocks, based on the four 16-bit multiplication lines:

  wire [15:0] execute_MulPlugin_aULow;
  wire [15:0] execute_MulPlugin_bULow;
  wire [16:0] execute_MulPlugin_aSLow;
  wire [16:0] execute_MulPlugin_bSLow;
  wire [16:0] execute_MulPlugin_aHigh;
  wire [16:0] execute_MulPlugin_bHigh;
  wire [33:0] _zz_26_;
  wire [33:0] _zz_27_;
  wire [33:0] _zz_28_;
  wire [31:0] _zz_29_;
  assign _zz_29_ = (execute_MulPlugin_aULow * execute_MulPlugin_bULow);
  assign _zz_28_ = ($signed(execute_MulPlugin_aSLow) * $signed(execute_MulPlugin_bHigh));
  assign _zz_27_ = ($signed(execute_MulPlugin_aHigh) * $signed(execute_MulPlugin_bSLow));
  assign _zz_26_ = ($signed(execute_MulPlugin_aHigh) * $signed(execute_MulPlugin_bHigh));

Actual behavior

Yosys only infers one DSP block, probably the one assigned to _zz_29_:

Info: Device utilisation:
Info:            ICESTORM_LC:  6945/ 5280   131%
Info:           ICESTORM_RAM:    30/   30   100%
Info:                  SB_IO:    12/   96    12%
Info:                  SB_GB:     8/    8   100%
Info:           ICESTORM_PLL:     1/    1   100%
Info:            SB_WARMBOOT:     0/    1     0%
Info:           ICESTORM_DSP:     1/    8    12%
Info:         ICESTORM_HFOSC:     0/    1     0%
Info:         ICESTORM_LFOSC:     0/    1     0%
Info:                 SB_I2C:     0/    2     0%
Info:                 SB_SPI:     0/    2     0%
Info:                 IO_I3C:     0/    2     0%
Info:            SB_LEDDA_IP:     0/    1     0%
Info:            SB_RGBA_DRV:     0/    1     0%
Info:         ICESTORM_SPRAM:     4/    4   100%
janrinze commented 5 years ago

Perhaps because the _zz29 is the only [15:0] * [15:0] to [31:0] and thus that's the only one that can be computed by SB_MAC16? I might be wrong here though. Currently the only way for me to get the DSP slice working is to use the SB_MAC16 in the design and not to trust automatic generation yet.

eddiehung commented 5 years ago

Really appreciate the report, thanks!

DSP inference is on my cards, but won't be able to look at it until >= April, unless a Commander Keen wants to beat me to it.

xobs commented 5 years ago

I've been looking at what iCECube does, and it uses four SB_MAC16 blocks, though one has absolutely no outputs wired up. I have something similar here, though it's unclear what its purpose is.

I've come up with the following, which works in simulation both with the Lattice SB_MAC16 block as well as the Yosys cells_sim.v model. I've thrown hundreds of thousands of random number pairs at it and compared it to an inferred multiply, and the result is the same between the two.

    reg [0:31] srca;
    reg [0:31] srcb;
    wire [0:31] out;
    wire [0:31] int_lo_lo;
    wire [0:31] int_lo_hi;
    wire [0:31] int_hi_lo;
    wire [0:31] int_hi_hi;

    SB_MAC16 #(
        .NEG_TRIGGER(1'b0),
        .C_REG(1'b0),
        .A_REG(1'b0),
        .B_REG(1'b0),
        .D_REG(1'b0),
        .TOP_8x8_MULT_REG(1'b0),
        .BOT_8x8_MULT_REG(1'b0),
        .PIPELINE_16x16_MULT_REG1(1'b0),
        .PIPELINE_16x16_MULT_REG2(1'b0),
        .TOPOUTPUT_SELECT(2'b11),
        .TOPADDSUB_LOWERINPUT(2'b00),
        .TOPADDSUB_UPPERINPUT(1'b0),
        .TOPADDSUB_CARRYSELECT(2'b00),
        .BOTOUTPUT_SELECT(2'b11),
        .BOTADDSUB_LOWERINPUT(2'b00),
        .BOTADDSUB_UPPERINPUT(1'b0),
        .BOTADDSUB_CARRYSELECT(2'b00),
        .MODE_8x8(1'b0),
        .A_SIGNED(1'b0),
        .B_SIGNED(1'b0)
    ) mul16_lo_lo (
        .A(srca[16:31]),
        .AHOLD(1'b1),
        .B(srcb[16:31]),
        .BHOLD(1'b1),
        .C(16'h0000),
        .D(16'h0000),
        .CHOLD(1'b0),
        .DHOLD(1'b0),

        .IRSTTOP                    (1'b0), //keep hold register in reset
        .IRSTBOT                    (1'b0), //keep hold register in reset
        .ORSTTOP                    (1'b0), //keep hold register in reset
        .ORSTBOT                    (1'b0), //keep hold register in reset
        .OLOADTOP                   (1'b0), //keep unused signals quiet
        .OLOADBOT                   (1'b0), //keep unused signals quiet
        .ADDSUBTOP                  (1'b0), //unused
        .ADDSUBBOT                  (1'b0), //unused
        .OHOLDTOP                   (1'b0), //keep hold register stable
        .OHOLDBOT                   (1'b0), //keep hold register stable

        .CE(1'b1),
        .CLK(1'b0),

        // .CO(1'b0), .ACCUMCO(1'b0), .SIGNEXTOUT(1'b0),
        .CI(1'b0), .ACCUMCI(1'b0), .SIGNEXTIN(1'b0),

        .O(int_lo_lo[0:31])
    );

    SB_MAC16 #(
        .NEG_TRIGGER(1'b0),
        .C_REG(1'b0),
        .A_REG(1'b0),
        .B_REG(1'b0),
        .D_REG(1'b0),
        .TOP_8x8_MULT_REG(1'b0),
        .BOT_8x8_MULT_REG(1'b0),
        .PIPELINE_16x16_MULT_REG1(1'b0),
        .PIPELINE_16x16_MULT_REG2(1'b0),
        .TOPOUTPUT_SELECT(2'b11),
        .TOPADDSUB_LOWERINPUT(2'b00),
        .TOPADDSUB_UPPERINPUT(1'b0),
        .TOPADDSUB_CARRYSELECT(2'b00),
        .BOTOUTPUT_SELECT(2'b11),
        .BOTADDSUB_LOWERINPUT(2'b00),
        .BOTADDSUB_UPPERINPUT(1'b0),
        .BOTADDSUB_CARRYSELECT(2'b00),
        .MODE_8x8(1'b0),
        .A_SIGNED(1'b0),
        .B_SIGNED(1'b0)
    ) mul16_lo_hi (
        .A(srca[16:31]),
        .AHOLD(1'b1),
        .B(srcb[0:15]),
        .BHOLD(1'b1),
        .C(16'h0000),
        .D(16'h0000),
        .CHOLD(1'b0),
        .DHOLD(1'b0),

        .IRSTTOP                    (1'b0), //keep hold register in reset
        .IRSTBOT                    (1'b0), //keep hold register in reset
        .ORSTTOP                    (1'b0), //keep hold register in reset
        .ORSTBOT                    (1'b0), //keep hold register in reset
        .OLOADTOP                   (1'b0), //keep unused signals quiet
        .OLOADBOT                   (1'b0), //keep unused signals quiet
        .ADDSUBTOP                  (1'b0), //unused
        .ADDSUBBOT                  (1'b0), //unused
        .OHOLDTOP                   (1'b0), //keep hold register stable
        .OHOLDBOT                   (1'b0), //keep hold register stable

        .CE(1'b1),
        .CLK(1'b0),

        // .CO(1'b0), .ACCUMCO(1'b0), .SIGNEXTOUT(1'b0),
        .CI(1'b0), .ACCUMCI(1'b0), .SIGNEXTIN(1'b0),
        .O(int_lo_hi[0:31])
    );

    SB_MAC16 #(
        .NEG_TRIGGER(1'b0),
        .C_REG(1'b0),
        .A_REG(1'b0),
        .B_REG(1'b0),
        .D_REG(1'b0),
        .TOP_8x8_MULT_REG(1'b0),
        .BOT_8x8_MULT_REG(1'b0),
        .PIPELINE_16x16_MULT_REG1(1'b0),
        .PIPELINE_16x16_MULT_REG2(1'b0),
        .TOPOUTPUT_SELECT(2'b11),
        .TOPADDSUB_LOWERINPUT(2'b00),
        .TOPADDSUB_UPPERINPUT(1'b0),
        .TOPADDSUB_CARRYSELECT(2'b00),
        .BOTOUTPUT_SELECT(2'b11),
        .BOTADDSUB_LOWERINPUT(2'b00),
        .BOTADDSUB_UPPERINPUT(1'b0),
        .BOTADDSUB_CARRYSELECT(2'b00),
        .MODE_8x8(1'b0),
        .A_SIGNED(1'b0),
        .B_SIGNED(1'b0)
    ) mul16_hi_lo (
        .A(srca[0:15]),
        .AHOLD(1'b1),
        .B(srcb[16:31]),
        .BHOLD(1'b1),
        .C(16'h0000),
        .D(16'h0000),
        .CHOLD(1'b0),
        .DHOLD(1'b0),
        .IRSTTOP                    (1'b0), //keep hold register in reset
        .IRSTBOT                    (1'b0), //keep hold register in reset
        .ORSTTOP                    (1'b0), //keep hold register in reset
        .ORSTBOT                    (1'b0), //keep hold register in reset
        .OLOADTOP                   (1'b0), //keep unused signals quiet
        .OLOADBOT                   (1'b0), //keep unused signals quiet
        .ADDSUBTOP                  (1'b0), //unused
        .ADDSUBBOT                  (1'b0), //unused
        .OHOLDTOP                   (1'b0), //keep hold register stable
        .OHOLDBOT                   (1'b0), //keep hold register stable

        .CE(1'b1),
        .CLK(1'b0),

        // .CO(1'b0), .ACCUMCO(1'b0), .SIGNEXTOUT(1'b0),
        .CI(1'b0), .ACCUMCI(1'b0), .SIGNEXTIN(1'b0),
        .O(int_hi_lo[0:31])
    );

    SB_MAC16 #(
        .NEG_TRIGGER(1'b0),
        .C_REG(1'b0),
        .A_REG(1'b0),
        .B_REG(1'b0),
        .D_REG(1'b0),
        .TOP_8x8_MULT_REG(1'b0),
        .BOT_8x8_MULT_REG(1'b0),
        .PIPELINE_16x16_MULT_REG1(1'b0),
        .PIPELINE_16x16_MULT_REG2(1'b0),
        .TOPOUTPUT_SELECT(2'b11),
        .TOPADDSUB_LOWERINPUT(2'b00),
        .TOPADDSUB_UPPERINPUT(1'b0),
        .TOPADDSUB_CARRYSELECT(2'b00),
        .BOTOUTPUT_SELECT(2'b11),
        .BOTADDSUB_LOWERINPUT(2'b00),
        .BOTADDSUB_UPPERINPUT(1'b0),
        .BOTADDSUB_CARRYSELECT(2'b00),
        .MODE_8x8(1'b0),
        .A_SIGNED(1'b0),
        .B_SIGNED(1'b0)
    ) mul16_hi_hi (
        .A(srca[0:15]),
        .AHOLD(1'b1),
        .B(srcb[0:15]),
        .BHOLD(1'b1),
        .C(16'h0000),
        .D(16'h0000),
        .CHOLD(1'b0),
        .DHOLD(1'b0),

        .IRSTTOP                    (1'b0), //keep hold register in reset
        .IRSTBOT                    (1'b0), //keep hold register in reset
        .ORSTTOP                    (1'b0), //keep hold register in reset
        .ORSTBOT                    (1'b0), //keep hold register in reset
        .OLOADTOP                   (1'b0), //keep unused signals quiet
        .OLOADBOT                   (1'b0), //keep unused signals quiet
        .ADDSUBTOP                  (1'b0), //unused
        .ADDSUBBOT                  (1'b0), //unused
        .OHOLDTOP                   (1'b0), //keep hold register stable
        .OHOLDBOT                   (1'b0), //keep hold register stable

        .CE(1'b1),
        .CLK(1'b0),

        // .CO(1'b0), .ACCUMCO(1'b0), .SIGNEXTOUT(1'b0),
        .CI(1'b0), .ACCUMCI(1'b0), .SIGNEXTIN(1'b0),
        .O(int_hi_hi[0:31])
    );

    reg [0:31] result;
    assign out = result;
    always @(posedge clk)
    begin
        result <= int_lo_lo[0:31] + (int_lo_hi[0:31] << 16) + (int_hi_lo[0:31] << 16);// + int_hi_hi[0:15];
    end
xobs commented 5 years ago

I don't know if this is the right approach, if it works, or if there's a better way to do it. Or if there are edge cases I'm missing.

But I've put together an example of this at https://github.com/xobs/yosys/blob/ice40-mul-infer-experiments/passes/pmgen/ice40_dsp.cc#L107

eddiehung commented 4 years ago

Should be resolved by #1359. With d963e8c2 on im-tomu/fomu-tests@c2a9dc67ea454adfea1918f24bc687763ccbc5a2's MulInfer.v:

yosys -p "synth_ice40 -dsp" MulInfer.v returns

   Number of cells:                664
     SB_CARRY                      142
     SB_DFF                        170
     SB_DFFE                         1
     SB_DFFSR                        1
     SB_LUT4                       346
     SB_MAC16                        4