Improve xc7 fasm2verilog output

litghost commented 5 years ago

With https://github.com/SymbiFlow/symbiflow-arch-defs/pull/577, fasm2v is good enough to be useful, however it could be more useful when debugging correctness issues with some more features.

Ideas to improve output, along with difficulty:

[ ] (easy) Combine carry chain inputs and outputs into wire bus for easier viewing in during simulation. This is a pretty straight forward verilog parsing task, basically join the various CARRY4's that are connected via the chain and emit 4 wire buses, S, DI, CO and O for the entire chain length.
[ ] (easy to medium) Add a comment above LUT6_2's with the equation form of the LUT from input to output. Ideally they will be reduced equations.

So for example:

    (* KEEP, DONT_TOUCH, LOC = "SLICE_X22Y113", BEL = "B6LUT" *)                 
    LUT6_2 #(                                                                    
      .INIT(64'b0101010101010101010101010101010110101010101010101010101010101010)                                                                                                                                  
    ) CLBLL_R_X15Y113_SLICE_X22Y113_BLUT (                                       
      .I0(CLBLL_R_X15Y113_SLICE_X22Y113_B1),                                     
      .I1(CLBLL_R_X15Y113_SLICE_X22Y113_B2),                                     
      .I2(CLBLL_R_X15Y113_SLICE_X22Y113_B3),                                     
      .I3(CLBLL_R_X15Y113_SLICE_X22Y113_B4),                                     
      .I4(CLBLL_R_X15Y113_SLICE_X22Y113_B5),                                     
      .I5(CLBLL_R_X15Y113_SLICE_X22Y113_B6),                                     
      .O5(CLBLL_R_X15Y113_SLICE_X22Y113_BO5),                                    
      .O6(CLBLL_R_X15Y113_SLICE_X22Y113_BO6)                                     
    );

becomes:

     // CLBLL_R_X15Y113_SLICE_X22Y113_BO5 = CLBLL_R_X15Y113_SLICE_X22Y113_B1;
     // CLBLL_R_X15Y113_SLICE_X22Y113_BO6 = CLBLL_R_X15Y113_SLICE_X22Y113_B1 ^ CLBLL_R_X15Y113_SLICE_X22Y113_B6;
    (* KEEP, DONT_TOUCH, LOC = "SLICE_X22Y113", BEL = "B6LUT" *)                 
    LUT6_2 #(                                                                    
      .INIT(64'b0101010101010101010101010101010110101010101010101010101010101010)                                                                                                                                  
     ) CLBLL_R_X15Y113_SLICE_X22Y113_BLUT (                                       
      .I0(CLBLL_R_X15Y113_SLICE_X22Y113_B1),                                     
      .I1(CLBLL_R_X15Y113_SLICE_X22Y113_B2),                                     
      .I2(CLBLL_R_X15Y113_SLICE_X22Y113_B3),                                     
      .I3(CLBLL_R_X15Y113_SLICE_X22Y113_B4),                                     
      .I4(CLBLL_R_X15Y113_SLICE_X22Y113_B5),                                     
      .I5(CLBLL_R_X15Y113_SLICE_X22Y113_B6),                                     
      .O5(CLBLL_R_X15Y113_SLICE_X22Y113_BO5),                                    
      .O6(CLBLL_R_X15Y113_SLICE_X22Y113_BO6)                                     
    );

[ ] (easy) Merge simple nets, leaving the intermediate wires, but avoiding the chain. So

assign CLBLM_R_X11Y110_SLICE_X14Y110_CMUX = CLBLM_R_X11Y110_SLICE_X14Y110_C5Q;
assign CLBLL_R_X17Y120_SLICE_X26Y120_SR = CLBLM_R_X11Y110_SLICE_X14Y110_CMUX;

Becomes:

assign CLBLM_R_X11Y110_SLICE_X14Y110_CMUX = CLBLM_R_X11Y110_SLICE_X14Y110_C5Q;
assign CLBLL_R_X17Y120_SLICE_X26Y120_SR = CLBLM_R_X11Y110_SLICE_X14Y110_C5Q;

This change leads into the next change:

[x] (more difficult) Name nets with same name that VPR used during routing. These names can be found in the route output from VPR. This involves consuming the VPR route output which looks like:

Net 1794 ($abc$11234$n2115)                                                    

Node:  73553   SOURCE (30,22)  Class: 81  Switch: 0                           
Node:  73648     OPIN (30,22)  Pin: 81   BLK_TI-CLBLM_L.CLBLM_M_AMUX[0] Switch: 2

And then matching the VPR rr graph source node OPIN back into a 7-series source, and then renaming the net. In the example above, the AMUX OPIN at grid location (30,22) should be named $abc$11234$n2115. AMUX OPIN at grid location (30, 20) corresponds the SLICEM located at tile CLBLM_L_X10Y128. So the fasm2verilog name in that case is CLBLM_L_X10Y128_SLICE_X12Y128_AMUX.

Getting back to VPR net names probably is a multi-step process:

[x] Create a python library for reading in the routing output, and finding the source OPIN information
[x] Create a mapping from source OPIN to fasm2verilog net names. The net format is of the form {tile}_{site}_{site pin}. The required map is fasm2verilog net name -> VPR net name.
[x] Bind the map into fasm2verilog by adding an argument that consumes the route file output. It would be good to keep both net names. So this:

  wire CLBLM_L_X10Y128_SLICE_X12Y128_AMUX;
  assign CLBLM_L_X10Y128_SLICE_X12Y128_AMUX = CLBLM_L_X10Y128_SLICE_X12Y128_AO5;
  assign CLBLL_L_X12Y128_SLICE_X16Y128_C6 = CLBLM_L_X10Y128_SLICE_X12Y128_AMUX;
  assign CLBLL_L_X12Y128_SLICE_X16Y128_D6 = CLBLM_L_X10Y128_SLICE_X12Y128_AMUX;

  (* KEEP, DONT_TOUCH, LOC = "SLICE_X16Y128", BEL = "D6LUT" *)
  LUT6_2 #(
    .INIT(64'b0111010100110000111101111111001100010000111111110101000111111111)
  ) CLBLL_L_X12Y128_SLICE_X16Y128_DLUT (
    .I0(CLBLL_L_X12Y128_SLICE_X16Y128_D1),
    .I1(CLBLL_L_X12Y128_SLICE_X16Y128_D2),
    .I2(CLBLL_L_X12Y128_SLICE_X16Y128_D3),
    .I3(CLBLL_L_X12Y128_SLICE_X16Y128_D4),
    .I4(CLBLL_L_X12Y128_SLICE_X16Y128_D5),
    .I5(CLBLL_L_X12Y128_SLICE_X16Y128_D6),
    .O5(CLBLL_L_X12Y128_SLICE_X16Y128_DO5),
    .O6(CLBLL_L_X12Y128_SLICE_X16Y128_DO6)
  );

becomes

  wire CLBLM_L_X10Y128_SLICE_X12Y128_AMUX;
  wire $abc$11234$n2115;
  assign $abc$11234$n2115 = CLBLM_L_X10Y128_SLICE_X12Y128_AO5;
  assign CLBLM_L_X10Y128_SLICE_X12Y128_AMUX = $abc$11234$n2115;
  assign CLBLL_L_X12Y128_SLICE_X16Y128_C6 = $abc$11234$n2115;
  assign CLBLL_L_X12Y128_SLICE_X16Y128_D6 = $abc$11234$n2115;

  (* KEEP, DONT_TOUCH, LOC = "SLICE_X16Y128", BEL = "D6LUT" *)
  LUT6_2 #(
    .INIT(64'b0111010100110000111101111111001100010000111111110101000111111111)
  ) CLBLL_L_X12Y128_SLICE_X16Y128_DLUT (
    .I0(CLBLL_L_X12Y128_SLICE_X16Y128_D1),
    .I1(CLBLL_L_X12Y128_SLICE_X16Y128_D2),
    .I2(CLBLL_L_X12Y128_SLICE_X16Y128_D3),
    .I3(CLBLL_L_X12Y128_SLICE_X16Y128_D4),
    .I4(CLBLL_L_X12Y128_SLICE_X16Y128_D5),
    .I5($abc$11234$n2115),
    .O5(CLBLL_L_X12Y128_SLICE_X16Y128_DO5),
    .O6(CLBLL_L_X12Y128_SLICE_X16Y128_DO6)
  );

mkurc-ant commented 5 years ago

@litghost Regarding "Add a comment above LUT6_2's with the equation form of the LUT from input to output. Ideally they will be reduced equations.":

You mean to write own implementation of an algorithm which converts a truth table with up to 6 inputs to a boolean equation ? Or use an already available solver for this task ? Pure python or an external library/tool ?

I would gladly go with this task, but the only solution I see right now is to use Karnaugh tables related stuff. Or do you have something else on your mind ?

litghost commented 5 years ago

@litghost Regarding "Add a comment above LUT6_2's with the equation form of the LUT from input to output. Ideally they will be reduced equations.":

You mean to write own implementation of an algorithm which converts a truth table with up to 6 inputs to a boolean equation ? Or use an already available solver for this task ? Pure python or an external library/tool ?

Yosys can perform this composition, and I believe there are python libraries too.

I would gladly go with this task, but the only solution I see right now is to use Karnaugh tables related stuff. Or do you have something else on your mind ?

That is what I was thinking, about it doesn't have to be all implemented by us.

mkurc-ant commented 5 years ago

@litghost

Yosys can perform this composition

I believe that Yosys can only do the opposite: convert boolean equation to a truth table. Not vice-versa.

I found some implementation of Karnaugh map solvers in python but they are limited to 4 inputs only. We are aiming at LUTs with up to 6 inputs (as they are in 7-series) right? A Karnaugh solver is very straightforward so I can write own implementation for 6-input LUTs.

There is also the Quine-McCluskey algorithm which will always find the optimal solution but it is too much time consuming. AFAIK works for any number of inputs. Implementations are available.

I found also a heuristic algorithm called "Espresso". There is source code available: https://ptolemy.berkeley.edu/projects/embedded/pubs/downloads/espresso/index.htm

I'll keep looking for other solutions. I guess I'll go with this task.

Espresso

litghost commented 5 years ago

In order to better compare timing information output from VPR and Vivado, it is good to annotate the BEL names with the VPR names. I'll look into improving that next. This should enable side-by-side timing path comparisons.

mithro commented 5 years ago

@mkurc-ant BTW I found the following - Generate logic expressions for all 64K boolean functions of four variables A, B, C, D

# Generate logic expressions for all 64K boolean functions of four variables A, B, C, D
# The idea is to start with simple expressions, then take all combinations of these to
# generate new expressions, then all combinations of the new expressions, etc. until
# everything has been generated.
#
# Formulas are built up as trees, e.g. ['+', 'A', 'B'] is "A or B"

f4pga / f4pga-arch-defs

Improve xc7 fasm2verilog output #578