f4pga / f4pga-arch-defs

FOSS architecture definitions of FPGA hardware useful for doing PnR device generation.
https://f4pga.org
ISC License
271 stars 113 forks source link

Improve xc7 fasm2verilog output #578

Open litghost opened 5 years ago

litghost commented 5 years ago

With https://github.com/SymbiFlow/symbiflow-arch-defs/pull/577, fasm2v is good enough to be useful, however it could be more useful when debugging correctness issues with some more features.

Ideas to improve output, along with difficulty:

So for example:

    (* KEEP, DONT_TOUCH, LOC = "SLICE_X22Y113", BEL = "B6LUT" *)                 
    LUT6_2 #(                                                                    
      .INIT(64'b0101010101010101010101010101010110101010101010101010101010101010)                                                                                                                                  
    ) CLBLL_R_X15Y113_SLICE_X22Y113_BLUT (                                       
      .I0(CLBLL_R_X15Y113_SLICE_X22Y113_B1),                                     
      .I1(CLBLL_R_X15Y113_SLICE_X22Y113_B2),                                     
      .I2(CLBLL_R_X15Y113_SLICE_X22Y113_B3),                                     
      .I3(CLBLL_R_X15Y113_SLICE_X22Y113_B4),                                     
      .I4(CLBLL_R_X15Y113_SLICE_X22Y113_B5),                                     
      .I5(CLBLL_R_X15Y113_SLICE_X22Y113_B6),                                     
      .O5(CLBLL_R_X15Y113_SLICE_X22Y113_BO5),                                    
      .O6(CLBLL_R_X15Y113_SLICE_X22Y113_BO6)                                     
    );               

becomes:

     // CLBLL_R_X15Y113_SLICE_X22Y113_BO5 = CLBLL_R_X15Y113_SLICE_X22Y113_B1;
     // CLBLL_R_X15Y113_SLICE_X22Y113_BO6 = CLBLL_R_X15Y113_SLICE_X22Y113_B1 ^ CLBLL_R_X15Y113_SLICE_X22Y113_B6;
    (* KEEP, DONT_TOUCH, LOC = "SLICE_X22Y113", BEL = "B6LUT" *)                 
    LUT6_2 #(                                                                    
      .INIT(64'b0101010101010101010101010101010110101010101010101010101010101010)                                                                                                                                  
     ) CLBLL_R_X15Y113_SLICE_X22Y113_BLUT (                                       
      .I0(CLBLL_R_X15Y113_SLICE_X22Y113_B1),                                     
      .I1(CLBLL_R_X15Y113_SLICE_X22Y113_B2),                                     
      .I2(CLBLL_R_X15Y113_SLICE_X22Y113_B3),                                     
      .I3(CLBLL_R_X15Y113_SLICE_X22Y113_B4),                                     
      .I4(CLBLL_R_X15Y113_SLICE_X22Y113_B5),                                     
      .I5(CLBLL_R_X15Y113_SLICE_X22Y113_B6),                                     
      .O5(CLBLL_R_X15Y113_SLICE_X22Y113_BO5),                                    
      .O6(CLBLL_R_X15Y113_SLICE_X22Y113_BO6)                                     
    );                
assign CLBLM_R_X11Y110_SLICE_X14Y110_CMUX = CLBLM_R_X11Y110_SLICE_X14Y110_C5Q;
assign CLBLL_R_X17Y120_SLICE_X26Y120_SR = CLBLM_R_X11Y110_SLICE_X14Y110_CMUX;

Becomes:

assign CLBLM_R_X11Y110_SLICE_X14Y110_CMUX = CLBLM_R_X11Y110_SLICE_X14Y110_C5Q;
assign CLBLL_R_X17Y120_SLICE_X26Y120_SR = CLBLM_R_X11Y110_SLICE_X14Y110_C5Q;

This change leads into the next change:

Net 1794 ($abc$11234$n2115)                                                    

Node:  73553   SOURCE (30,22)  Class: 81  Switch: 0                           
Node:  73648     OPIN (30,22)  Pin: 81   BLK_TI-CLBLM_L.CLBLM_M_AMUX[0] Switch: 2

And then matching the VPR rr graph source node OPIN back into a 7-series source, and then renaming the net. In the example above, the AMUX OPIN at grid location (30,22) should be named $abc$11234$n2115. AMUX OPIN at grid location (30, 20) corresponds the SLICEM located at tile CLBLM_L_X10Y128. So the fasm2verilog name in that case is CLBLM_L_X10Y128_SLICE_X12Y128_AMUX.

Getting back to VPR net names probably is a multi-step process:

  wire CLBLM_L_X10Y128_SLICE_X12Y128_AMUX;
  assign CLBLM_L_X10Y128_SLICE_X12Y128_AMUX = CLBLM_L_X10Y128_SLICE_X12Y128_AO5;
  assign CLBLL_L_X12Y128_SLICE_X16Y128_C6 = CLBLM_L_X10Y128_SLICE_X12Y128_AMUX;
  assign CLBLL_L_X12Y128_SLICE_X16Y128_D6 = CLBLM_L_X10Y128_SLICE_X12Y128_AMUX;

  (* KEEP, DONT_TOUCH, LOC = "SLICE_X16Y128", BEL = "D6LUT" *)
  LUT6_2 #(
    .INIT(64'b0111010100110000111101111111001100010000111111110101000111111111)
  ) CLBLL_L_X12Y128_SLICE_X16Y128_DLUT (
    .I0(CLBLL_L_X12Y128_SLICE_X16Y128_D1),
    .I1(CLBLL_L_X12Y128_SLICE_X16Y128_D2),
    .I2(CLBLL_L_X12Y128_SLICE_X16Y128_D3),
    .I3(CLBLL_L_X12Y128_SLICE_X16Y128_D4),
    .I4(CLBLL_L_X12Y128_SLICE_X16Y128_D5),
    .I5(CLBLL_L_X12Y128_SLICE_X16Y128_D6),
    .O5(CLBLL_L_X12Y128_SLICE_X16Y128_DO5),
    .O6(CLBLL_L_X12Y128_SLICE_X16Y128_DO6)
  );

becomes

  wire CLBLM_L_X10Y128_SLICE_X12Y128_AMUX;
  wire $abc$11234$n2115;
  assign $abc$11234$n2115 = CLBLM_L_X10Y128_SLICE_X12Y128_AO5;
  assign CLBLM_L_X10Y128_SLICE_X12Y128_AMUX = $abc$11234$n2115;
  assign CLBLL_L_X12Y128_SLICE_X16Y128_C6 = $abc$11234$n2115;
  assign CLBLL_L_X12Y128_SLICE_X16Y128_D6 = $abc$11234$n2115;

  (* KEEP, DONT_TOUCH, LOC = "SLICE_X16Y128", BEL = "D6LUT" *)
  LUT6_2 #(
    .INIT(64'b0111010100110000111101111111001100010000111111110101000111111111)
  ) CLBLL_L_X12Y128_SLICE_X16Y128_DLUT (
    .I0(CLBLL_L_X12Y128_SLICE_X16Y128_D1),
    .I1(CLBLL_L_X12Y128_SLICE_X16Y128_D2),
    .I2(CLBLL_L_X12Y128_SLICE_X16Y128_D3),
    .I3(CLBLL_L_X12Y128_SLICE_X16Y128_D4),
    .I4(CLBLL_L_X12Y128_SLICE_X16Y128_D5),
    .I5($abc$11234$n2115),
    .O5(CLBLL_L_X12Y128_SLICE_X16Y128_DO5),
    .O6(CLBLL_L_X12Y128_SLICE_X16Y128_DO6)
  );
mkurc-ant commented 5 years ago

@litghost Regarding "Add a comment above LUT6_2's with the equation form of the LUT from input to output. Ideally they will be reduced equations.":

You mean to write own implementation of an algorithm which converts a truth table with up to 6 inputs to a boolean equation ? Or use an already available solver for this task ? Pure python or an external library/tool ?

I would gladly go with this task, but the only solution I see right now is to use Karnaugh tables related stuff. Or do you have something else on your mind ?

litghost commented 5 years ago

@litghost Regarding "Add a comment above LUT6_2's with the equation form of the LUT from input to output. Ideally they will be reduced equations.":

You mean to write own implementation of an algorithm which converts a truth table with up to 6 inputs to a boolean equation ? Or use an already available solver for this task ? Pure python or an external library/tool ?

Yosys can perform this composition, and I believe there are python libraries too.

I would gladly go with this task, but the only solution I see right now is to use Karnaugh tables related stuff. Or do you have something else on your mind ?

That is what I was thinking, about it doesn't have to be all implemented by us.

mkurc-ant commented 5 years ago

@litghost

Yosys can perform this composition

I believe that Yosys can only do the opposite: convert boolean equation to a truth table. Not vice-versa.

I found some implementation of Karnaugh map solvers in python but they are limited to 4 inputs only. We are aiming at LUTs with up to 6 inputs (as they are in 7-series) right? A Karnaugh solver is very straightforward so I can write own implementation for 6-input LUTs.

There is also the Quine-McCluskey algorithm which will always find the optimal solution but it is too much time consuming. AFAIK works for any number of inputs. Implementations are available.

I found also a heuristic algorithm called "Espresso". There is source code available: https://ptolemy.berkeley.edu/projects/embedded/pubs/downloads/espresso/index.htm

I'll keep looking for other solutions. I guess I'll go with this task.

Espresso
litghost commented 5 years ago

In order to better compare timing information output from VPR and Vivado, it is good to annotate the BEL names with the VPR names. I'll look into improving that next. This should enable side-by-side timing path comparisons.

mithro commented 5 years ago

@mkurc-ant BTW I found the following - Generate logic expressions for all 64K boolean functions of four variables A, B, C, D

# Generate logic expressions for all 64K boolean functions of four variables A, B, C, D
# The idea is to start with simple expressions, then take all combinations of these to
# generate new expressions, then all combinations of the new expressions, etc. until
# everything has been generated.
#
# Formulas are built up as trees, e.g. ['+', 'A', 'B'] is "A or B"