bakhtiarZ opened 3 months ago
Hi. It looks like the tool is complaining about a dimension mismatch. It should be straightforward to fix?
I would suggest fixing this and using the latest testbench code.
Hi,
I think the issue is that the `fixed_linear` RTL does not support an output parallelism that is less than the input parallelism.
Given this test case:
```
{
    "DATA_IN_0_TENSOR_SIZE_DIM_0": 20,
    "DATA_IN_0_PARALLELISM_DIM_0": 20,
    "WEIGHT_TENSOR_SIZE_DIM_0": 20,
    "WEIGHT_TENSOR_SIZE_DIM_1": 20,
    "WEIGHT_PARALLELISM_DIM_0": 20,
    "DATA_OUT_0_TENSOR_SIZE_DIM_0": 20,
    "DATA_OUT_0_PARALLELISM_DIM_0": 10,
    "BIAS_TENSOR_SIZE_DIM_0": 20,
},
```
With this configuration, the `fixed_linear` testbench fails.
Also, when setting `unroll_out_c = unroll_kernel_out` in the `convolution_tb` (these are the input and output parallelisms within the `fixed_linear` instance), the test passes. Shall I continue with this constraint? It will only affect performance.
> I think the issue is due to fixed_linear RTL not supporting an output parallelism that is less than the input parallelism
Okay, I got this. `fixed_linear` is fine.
First, the output parallelism can be smaller than the input parallelism; there is no direct relation between these two parameters.
However, if you look into the source code or documentation of `fixed_linear`, you might realize that `DATA_OUT_0_PARALLELISM_DIM_0` must be the same as `WEIGHT_PARALLELISM_DIM_0` to achieve the best throughput inside this component. This is why the tool is complaining.
For this case in particular, you need to either set `DATA_OUT_0_PARALLELISM_DIM_0` to 20 or set `WEIGHT_PARALLELISM_DIM_0` to 10. Basically, if you want to slow down the output of this layer, you might also want to slow down its relevant input to balance the throughput and avoid wasting resources.
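The constraint described above can be sketched as a small config check. This is a hypothetical helper for illustration only, not part of the MASE codebase; the parameter names come from the failing test case in this issue:

```python
# Hypothetical helper illustrating the fixed_linear constraint discussed above:
# DATA_OUT_0_PARALLELISM_DIM_0 must equal WEIGHT_PARALLELISM_DIM_0.

def check_fixed_linear_config(cfg):
    """Return the suggested fixes (as partial configs) if the constraint is violated."""
    out_par = cfg["DATA_OUT_0_PARALLELISM_DIM_0"]
    w_par = cfg["WEIGHT_PARALLELISM_DIM_0"]
    if out_par == w_par:
        return []  # constraint satisfied, nothing to change
    # Either raise the output parallelism to match the weights,
    # or lower the weight parallelism to match the output.
    return [
        {"DATA_OUT_0_PARALLELISM_DIM_0": w_par},
        {"WEIGHT_PARALLELISM_DIM_0": out_par},
    ]

# The failing case from this issue: output parallelism 10, weight parallelism 20.
failing = {"DATA_OUT_0_PARALLELISM_DIM_0": 10, "WEIGHT_PARALLELISM_DIM_0": 20}
print(check_fixed_linear_config(failing))
```

Applying either suggested fix (output parallelism 20, or weight parallelism 10) balances the throughput as described above.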
This constraint (`DATA_OUT_0_PARALLELISM_DIM_0` must be the same as `WEIGHT_PARALLELISM_DIM_0`) causes issues inside `convolution.sv`. The weights go into an input buffer and are then output with parallelism `UNROLL_KERNEL_OUT * UNROLL_OUT_C`. This is essentially a flattened weight matrix 'block'.
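A plain-Python sketch of that flattening, using the parameter names from this discussion (the row/column ordering here is an assumption for illustration, not taken from `convolution.sv`):

```python
# Sketch: the weight buffer in convolution.sv emits one flattened 2D weight
# block of UNROLL_KERNEL_OUT * UNROLL_OUT_C words per beat.
UNROLL_KERNEL_OUT = 4   # parallelism along the flattened kernel dimension
UNROLL_OUT_C = 2        # parallelism along the output-channel dimension

# A 2D weight block: one row per unrolled output channel,
# one column per unrolled kernel element (tagged (channel, kernel) for clarity).
block = [[(c, k) for k in range(UNROLL_KERNEL_OUT)] for c in range(UNROLL_OUT_C)]

# Flattened into a single stream beat, as presented to fixed_linear:
flat = [w for row in block for w in row]
print(len(flat))  # UNROLL_KERNEL_OUT * UNROLL_OUT_C = 8
```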
This is what I previously meant by issues with '1D to 2D parameters'. Also, after more testing, it looks like the `convolution_tb` only passes in one specific case, so this earlier statement is not true: 'when setting unroll_out_c = unroll_kernel_out in the convolution_tb (these are the input and output parallelisms within the fixed linear instance) the test passes.'
I am not sure how to fix this. I think the weights may have to be unpacked before being passed to `fixed_linear`; what do you think? This change could also break other features.
One solution I considered is to use the old `fixed_linear` file and continue my development, but then I ran into the 'node not under scope' errors.
> This constraint: `DATA_OUT_0_PARALLELISM_DIM_0` must be the same as `WEIGHT_PARALLELISM_DIM_0` causes issues inside the convolution.sv. The weights go into an input buffer and then are outputted with this parallelism: `UNROLL_KERNEL_OUT * UNROLL_OUT_C`. This is essentially a flattened weight matrix 'block'.
A few questions before I understand your question:

- What is `UNROLL_KERNEL_OUT` and what is `UNROLL_OUT_C`? Are they parameters of `fixed_linear` or parameters of `convolution`? They might have parameters with the same name that mean different things.
- If you use `fixed_linear` as your component, build `convolution` on top of it, and face this problem, it is possible that the design of `convolution` needs to be changed. Please double check the `fixed_linear` source to confirm that you understand the design methodology.

> This is previously what I meant by issues from '1D to 2D parameters'. Also after more testing it looks like the convolution_tb only passes in one specific case and this is not true: 'when setting unroll_out_c = unroll_kernel_out in the convolution_tb (these are the input and output parallelisms within the fixed linear instance) the test passes.'

Sorry... I don't follow...
> I am not sure how to fix this, I think to fix it the weights may have to be unpacked to then be passed to the fixed linear, what do you think? This change could also break other features.
If you flatten the weight in the `convolution` block and stream it into `fixed_linear`, I think that is fine.
`UNROLL_KERNEL_OUT` is a parameter that defines a new parallelism level for the input data stream; it is the result of extracting an image kernel from the data stream. `UNROLL_OUT_C` is the parameter that defines the output parallelism in the channel dimension. In short, it controls how many output pixels are calculated per cycle.
Both of these parameters refer to the convolution block.
My main component is the convolution. I have now made a workaround by reinstating the old `fixed_linear` and fixing some bugs in the interfacing of the two modules. I think this is an acceptable solution, but what do you think? I am not fully aware of why `fixed_linear` was changed; did it have bugs? If so, reusing the old one will not be a suitable solution. But for now, the convolution is passing the testbenches once again.
@pgimenes Any insights?
Question: What is the necessary fix?
Commit hash: d49620d78de8e6b85016b880b144b6214146be4d [on the main branch]
Before running the commands below, you must edit `machop/mase_components/deps.py` and replace line 63 with `"conv/convolution": ["conv", "linear", "common", "fixed_arithmetic", "cast"],`. The effect of this is to add the cast directory so the tool can find the fixed rounding module.
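For context, the edit above would sit in a dependency dict roughly like the following. Only the quoted `"conv/convolution"` entry comes from this issue; the dict name `MASE_HW_DEPS` and the surrounding shape are assumptions about `deps.py`:

```python
# Sketch of machop/mase_components/deps.py after the edit described above.
# The dict name and surrounding entries are illustrative assumptions; only the
# "conv/convolution" line (with "cast" added) is taken from this issue.
MASE_HW_DEPS = {
    # ... other component entries elided ...
    "conv/convolution": ["conv", "linear", "common", "fixed_arithmetic", "cast"],
}

# "cast" must now be listed, so the fixed rounding module can be found.
print("cast" in MASE_HW_DEPS["conv/convolution"])  # True
```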
Command to reproduce:
```shell
cd
make sync
make shell
pip3 install mpmath==1.3.0
cd machop/mase_components/conv/
python3 test/convolution_tb.py
```
Output: