DeepWok / mase

Machine-Learning Accelerator System Exploration Tools

Convolution TB is not currently working and the RTL has compilation errors. #66

Open bakhtiarZ opened 3 months ago

bakhtiarZ commented 3 months ago

Question: What is the necessary fix?

Commit hash: d49620d78de8e6b85016b880b144b6214146be4d [on the main branch]

Before running the command below, you must edit machop/mase_components/deps.py and replace line 63 with:

"conv/convolution": ["conv", "linear", "common", "fixed_arithmetic", "cast"],

This adds the cast directory to the include paths so that the fixed rounding module can be found.
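For context, deps.py maps each component to the RTL directories its build needs on the include path. A minimal sketch of the edited entry (the dictionary name and the other entries are illustrative assumptions; only the "conv/convolution" line comes from the edit above):

```python
# Hypothetical sketch of the dependency map in machop/mase_components/deps.py.
# Only the "conv/convolution" entry is taken from this issue; the dict name
# and any surrounding entries are assumptions for illustration.
MASE_HW_DEPS = {
    # ... other component entries ...
    # Edited line 63: "cast" is appended so Verilator can find the
    # fixed rounding module under mase_components/cast/rtl.
    "conv/convolution": ["conv", "linear", "common", "fixed_arithmetic", "cast"],
}

print("cast" in MASE_HW_DEPS["conv/convolution"])
```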

Command to reproduce:

cd
make sync
make shell
pip3 install mpmath==1.3.0
cd machop/mase_components/conv/
python3 test/convolution_tb.py

Output:


/workspace/machop/mase_cocotb/runner.py:9: UserWarning: Python runners and associated APIs are an experimental feature and subject to change.
  from cocotb.runner import get_runner, get_results
%Warning-SELRANGE: /workspace/machop/mase_components/linear/rtl/fixed_linear.sv:188:37: Selection index out of range: 2 outside 1:0
                                                                                      : ... note: In instance 'convolution.fl_instance'
  188 |           .data_out      (data_out_0[i])
      |                                     ^
                   /workspace/machop/mase_components/conv/rtl/convolution.sv:178:1: ... note: In file included from 'convolution.sv'
                   ... For warning description see https://verilator.org/warn/SELRANGE?v=5.020
                   ... Use "/* verilator lint_off SELRANGE */" and lint_on around source to disable this message.
%Warning-SELRANGE: /workspace/machop/mase_components/linear/rtl/fixed_linear.sv:188:37: Selection index out of range: 3 outside 1:0
                                                                                      : ... note: In instance 'convolution.fl_instance'
  188 |           .data_out      (data_out_0[i])
      |                                     ^
                   /workspace/machop/mase_components/conv/rtl/convolution.sv:178:1: ... note: In file included from 'convolution.sv'
%Error: /workspace/machop/mase_components/conv/rtl/convolution.sv:180:15: Slices of arrays in assignments have different unpacked dimensions, 16 versus 8
                                                                        : ... note: In instance 'convolution'
  180 |       .weight(ib_weight),
      |               ^~~~~~~~~
%Error: /workspace/machop/mase_components/linear/rtl/fixed_linear.sv:167:19: Slices of arrays in assignments have different unpacked dimensions, 4 versus 2
                                                                           : ... note: In instance 'convolution.fl_instance'
  167 |         .data_in (bias),
      |                   ^~~~
        /workspace/machop/mase_components/conv/rtl/convolution.sv:178:1: ... note: In file included from 'convolution.sv'
-Info: Command Line disabled gate optimization with -fno-gate.  This may cause ordering problems.
%Error: Exiting due to 2 error(s)
x =  tensor([[[[247, 240,  22],
          [240,   8, 238],
          [233, 234, 211],
          [243, 212, 253]],

         [[252,  27, 222],
          [  0, 226,   7],
          [ 25,  26, 246],
          [ 92, 181, 254]]]], dtype=torch.int32)
weight =  Parameter containing:
tensor([[[[-0.0022,  0.1549],
          [-0.2376, -0.2124],
          [-0.1112,  0.0774]],

         [[-0.0057,  0.2289],
          [-0.0256,  0.0764],
          [-0.0872, -0.0567]]],

        [[[-0.2758, -0.1912],
          [-0.1190,  0.0107],
          [ 0.1141,  0.1732]],

         [[-0.1957, -0.1257],
          [ 0.1049,  0.2397],
          [-0.0594,  0.2160]]],

        [[[-0.0465,  0.0305],
          [ 0.2614, -0.2678],
          [-0.1817, -0.0731]],

         [[-0.1125,  0.2494],
          [-0.1871, -0.1329],
          [-0.2017, -0.2704]]],

        [[[-0.1685,  0.2481],
          [ 0.1288,  0.1399],
          [ 0.0152, -0.1480]],

         [[ 0.0488, -0.2695],
          [-0.2086, -0.1488],
          [ 0.1821,  0.1693]]]], requires_grad=True)
bias =  Parameter containing:
tensor([-0.1280, -0.0104,  0.1846,  0.2870], requires_grad=True)
weight =  tensor([[[[  0,   1],
          [254, 254],
          [255,   1]],

         [[  0,   2],
          [  0,   1],
          [255,   0]]],

        [[[254, 254],
          [255,   0],
          [  1,   1]],

         [[254, 255],
          [  1,   2],
          [  0,   2]]],

        [[[  0,   0],
          [  2, 254],
          [255, 255]],

         [[255,   2],
          [255, 255],
          [254, 254]]],

        [[[255,   2],
          [  1,   1],
          [  0, 255]],

         [[  0, 254],
          [254, 255],
          [  1,   1]]]], dtype=torch.int32)
bias =  tensor([255,   0,   1,   2], dtype=torch.int32)
tensor([[[[63,  0],
          [63, 59],
          [ 9,  1]],

         [[63, 60],
          [ 3, 55],
          [13,  4]],

         [[ 2,  1],
          [ 0,  1],
          [ 0, 61]],

         [[ 1, 63],
          [ 2, 16],
          [52,  5]]]], dtype=torch.int32)
/workspace/machop/mase_components/conv/test/build/convolution
##########################################
#### TEST 0 : {'DATA_WIDTH': 8, 'DATA_FRAC_WIDTH': 3, 'W_WIDTH': 8, 'W_FRAC_WIDTH': 3, 'BIAS_WIDTH': 8, 'BIAS_FRAC_WIDTH': 3, 'OUT_WIDTH': 6, 'OUT_FRAC_WIDTH': 2, 'IN_X': 3, 'IN_Y': 4, 'IN_C': 2, 'KERNEL_X': 2, 'KERNEL_Y': 3, 'OUT_C': 4, 'UNROLL_OUT_C': 2, 'UNROLL_IN_C': 2, 'UNROLL_KERNEL_OUT': 4, 'SLIDING_NUM': 6, 'STRIDE': 2, 'PADDING_Y': 2, 'PADDING_X': 1}
##########################################
INFO: Running command perl /usr/local/bin/verilator -cc --exe -Mdir /workspace/machop/mase_components/conv/test/build/convolution/test_0 -DCOCOTB_SIM=1 --top-module convolution --vpi --public-flat-rw --prefix Vtop -o convolution -LDFLAGS '-Wl,-rpath,/usr/local/lib/python3.11/dist-packages/cocotb/libs -L/usr/local/lib/python3.11/dist-packages/cocotb/libs -lcocotbvpi_verilator' -Wno-GENUNNAMED -Wno-WIDTHEXPAND -Wno-WIDTHTRUNC -Wno-UNOPTFLAT -prof-c --stats --trace -O0 -build-jobs 8 -Wno-fatal -Wno-lint -Wno-style -I/workspace/machop/mase_components/conv/rtl -I/workspace/machop/mase_components/linear/rtl -I/workspace/machop/mase_components/common/rtl -I/workspace/machop/mase_components/fixed_arithmetic/rtl -I/workspace/machop/mase_components/cast/rtl -GDATA_WIDTH=8 -GDATA_FRAC_WIDTH=3 -GW_WIDTH=8 -GW_FRAC_WIDTH=3 -GBIAS_WIDTH=8 -GBIAS_FRAC_WIDTH=3 -GOUT_WIDTH=6 -GOUT_FRAC_WIDTH=2 -GIN_X=3 -GIN_Y=4 -GIN_C=2 -GKERNEL_X=2 -GKERNEL_Y=3 -GOUT_C=4 -GUNROLL_OUT_C=2 -GUNROLL_IN_C=2 -GUNROLL_KERNEL_OUT=4 -GSLIDING_NUM=6 -GSTRIDE=2 -GPADDING_Y=2 -GPADDING_X=1 /usr/local/lib/python3.11/dist-packages/cocotb/share/lib/verilator/verilator.cpp /workspace/machop/mase_components/conv/rtl/convolution.sv in directory /workspace/machop/mase_components/conv/test/build/convolution/test_0
Process 'perl' terminated with error 1
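As a sanity check on the TEST 0 geometry above, the standard convolution output-size formula reproduces SLIDING_NUM = 6 (a sketch, assuming the RTL uses the usual definition of stride and padding):

```python
# TEST 0 parameters taken from the log above.
IN_X, IN_Y = 3, 4
KERNEL_X, KERNEL_Y = 2, 3
STRIDE = 2
PADDING_X, PADDING_Y = 1, 2

# Standard conv output size: floor((in + 2*pad - kernel) / stride) + 1.
out_x = (IN_X + 2 * PADDING_X - KERNEL_X) // STRIDE + 1  # 2
out_y = (IN_Y + 2 * PADDING_Y - KERNEL_Y) // STRIDE + 1  # 3
sliding_num = out_x * out_y
print(sliding_num)  # 6, matching SLIDING_NUM in TEST 0
```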
jianyicheng commented 3 months ago

Hi. It looks like the tool is complaining about a dimension mismatch. It should be straightforward to fix?

I would suggest fixing this and using the latest testbench infrastructure.

bakhtiarZ commented 3 months ago

Hi,

I think the issue is due to fixed_linear RTL not supporting an output parallelism that is less than the input parallelism:

Given this test case:

{
    "DATA_IN_0_TENSOR_SIZE_DIM_0": 20,
    "DATA_IN_0_PARALLELISM_DIM_0": 20,
    "WEIGHT_TENSOR_SIZE_DIM_0": 20,
    "WEIGHT_TENSOR_SIZE_DIM_1": 20,
    "WEIGHT_PARALLELISM_DIM_0": 20,
    "DATA_OUT_0_TENSOR_SIZE_DIM_0": 20,
    "DATA_OUT_0_PARALLELISM_DIM_0": 10,
    "BIAS_TENSOR_SIZE_DIM_0": 20,
},

the fixed_linear testbench fails.

Also, when setting unroll_out_c = unroll_kernel_out in convolution_tb (these are the input and output parallelisms within the fixed_linear instance), the test passes. Shall I continue with this constraint? It will only affect performance.

jianyicheng commented 3 months ago

I think the issue is due to fixed_linear RTL not supporting an output parallelism that is less than the input parallelism:

Okay, I got this. fixed_linear is fine.

First, the output parallelism can be smaller than the input parallelism; there is no direct relation between these two parameters.

However, if you look into the source code or doc of fixed_linear, you might realize that DATA_OUT_0_PARALLELISM_DIM_0 must be the same as WEIGHT_PARALLELISM_DIM_0 to match the best throughput inside this component. This is why the tool is complaining.

Particularly for this case, you need to either set DATA_OUT_0_PARALLELISM_DIM_0 to 20 or set WEIGHT_PARALLELISM_DIM_0 to 10. Basically, if you want to slow down the output of this layer, you might also want to slow down its relevant input to balance the throughput and avoid resource waste.
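The rule above can be expressed as a small parameter check (a sketch; the function name is hypothetical and the rule is only what is stated in this comment):

```python
def fixed_linear_params_ok(p):
    """Check the throughput-matching rule stated above: the output
    parallelism must equal the weight parallelism."""
    return p["DATA_OUT_0_PARALLELISM_DIM_0"] == p["WEIGHT_PARALLELISM_DIM_0"]

# The failing configuration from the earlier comment, and the two fixes
# suggested above.
failing = {"WEIGHT_PARALLELISM_DIM_0": 20, "DATA_OUT_0_PARALLELISM_DIM_0": 10}
fix_a = {"WEIGHT_PARALLELISM_DIM_0": 20, "DATA_OUT_0_PARALLELISM_DIM_0": 20}
fix_b = {"WEIGHT_PARALLELISM_DIM_0": 10, "DATA_OUT_0_PARALLELISM_DIM_0": 10}

print(fixed_linear_params_ok(failing))  # False
print(fixed_linear_params_ok(fix_a), fixed_linear_params_ok(fix_b))  # True True
```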

bakhtiarZ commented 3 months ago

This constraint (DATA_OUT_0_PARALLELISM_DIM_0 must equal WEIGHT_PARALLELISM_DIM_0) causes issues inside convolution.sv. The weights go into an input buffer and are then output with parallelism UNROLL_KERNEL_OUT * UNROLL_OUT_C. This is essentially a flattened weight matrix 'block'.

This is what I previously meant by issues with '1D to 2D parameters'. Also, after more testing it looks like convolution_tb only passes in one specific case, so my earlier statement is not true: 'when setting unroll_out_c = unroll_kernel_out in the convolution_tb (these are the input and output parallelisms within the fixed linear instance) the test passes.'

I am not sure how to fix this. I think the weights may have to be unpacked before being passed to fixed_linear; what do you think? This change could also break other features.

bakhtiarZ commented 3 months ago

One solution I considered is to use the old fixed_linear file and continue my development, but then I ran into the 'node not under scope' errors.

jianyicheng commented 3 months ago

This constraint (DATA_OUT_0_PARALLELISM_DIM_0 must equal WEIGHT_PARALLELISM_DIM_0) causes issues inside convolution.sv. The weights go into an input buffer and are then output with parallelism UNROLL_KERNEL_OUT * UNROLL_OUT_C. This is essentially a flattened weight matrix 'block'.

A few questions before I understand your question:

  1. What is UNROLL_KERNEL_OUT and what is UNROLL_OUT_C?
  2. Are you talking about the parameters of fixed_linear or the parameters of convolution? They might have parameters with the same names that mean different things.
  3. If you are using fixed_linear as your component, building convolution on top of it, and facing this problem, it is possible that the design of convolution needs to be changed; please double-check the fixed_linear source to confirm you understand the design methodology.

This is what I previously meant by issues with '1D to 2D parameters'. Also, after more testing it looks like convolution_tb only passes in one specific case, so my earlier statement is not true: 'when setting unroll_out_c = unroll_kernel_out in the convolution_tb (these are the input and output parallelisms within the fixed linear instance) the test passes.'

Sorry... I don't follow...

I am not sure how to fix this, I think to fix it the weights may have to be unpacked to then be passed to the fixed linear, what do you think? This change could also break other features.

If you flatten the weight in the convolution block and stream it into fixed_linear, I think that is fine.
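A minimal sketch of what "flatten the weight and stream it into fixed_linear" could look like, modelled in software (the shapes, names, and values are assumptions for illustration; the real RTL streams fixed-point words, not Python lists):

```python
# One weight "block" per cycle: UNROLL_OUT_C output channels, each with
# UNROLL_KERNEL_OUT kernel values, flattened into a single 1D beat.
UNROLL_KERNEL_OUT = 4  # TEST 0 values from the log above
UNROLL_OUT_C = 2

# A hypothetical 2D block: rows = output channels, cols = kernel values.
block = [[oc * 10 + k for k in range(UNROLL_KERNEL_OUT)]
         for oc in range(UNROLL_OUT_C)]

# Flatten row-major so fixed_linear sees a single stream of
# UNROLL_KERNEL_OUT * UNROLL_OUT_C values per cycle.
flat = [v for row in block for v in row]
print(len(flat))  # 8 == UNROLL_KERNEL_OUT * UNROLL_OUT_C
```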

bakhtiarZ commented 3 months ago
  1. UNROLL_KERNEL_OUT is a parameter that defines a new parallelism level for the input data stream; it is the result of extracting an image kernel from the data stream. UNROLL_OUT_C is the parameter that defines the output parallelism in the channel dimension. In short, together they control how many output pixels are calculated per cycle.

  2. Both of these parameters refer to the convolution block.

  3. My main component is the convolution. I have now made a workaround by reinstating the old fixed_linear and fixing some bugs in the interfacing of the two modules. I think this is an acceptable solution, but what do you think? I am not fully aware of why fixed_linear was changed; did it have bugs? If so, reusing the old one will not be a suitable solution. For now, though, the convolution is passing the testbenches again.

jianyicheng commented 3 months ago

@pgimenes Any insights?