Xilinx / finn

Dataflow compiler for QNN inference on FPGAs
https://xilinx.github.io/finn
BSD 3-Clause "New" or "Revised" License

Thresholds remain too large for their DataType after RoundAndClipThresholds #875

Closed: hleblevec closed this issue 1 year ago

hleblevec commented 1 year ago

Quick summary

RoundAndClipThresholds clips threshold values to one less than the minimum and one more than the maximum value defined by their DataType, which can lead to an error during HLS code generation.

Details

After switching to the dev branch (commit fd72d48410c040ae8f715919e65ec731580235ce), I encountered an error during "step_hls_codegen":

Traceback (most recent call last):
  File "/home/hlebleve/finn/src/finn/builder/build_dataflow.py", line 158, in build_dataflow_cfg
    model = transform_step(model, cfg)
  File "/home/hlebleve/finn/src/finn/builder/build_dataflow_steps.py", line 488, in step_hls_codegen
    model = model.transform(PrepareIP(cfg._resolve_fpga_part(), cfg._resolve_hls_clk_period()))
  File "/home/hlebleve/finn/deps/qonnx/src/qonnx/core/modelwrapper.py", line 140, in transform
    (transformed_model, model_was_changed) = transformation.apply(transformed_model)
  File "/home/hlebleve/finn/src/finn/transformation/fpgadataflow/prepare_ip.py", line 86, in apply
    _codegen_single_node(node, model, self.fpgapart, self.clk)
  File "/home/hlebleve/finn/src/finn/transformation/fpgadataflow/prepare_ip.py", line 53, in _codegen_single_node
    inst.code_generation_ipgen(model, fpgapart, clk)
  File "/home/hlebleve/finn/src/finn/custom_op/fpgadataflow/hlscustomop.py", line 288, in code_generation_ipgen
    self.generate_params(model, path)
  File "/home/hlebleve/finn/src/finn/custom_op/fpgadataflow/thresholding_batch.py", line 445, in generate_params
    self.make_weight_file(thresholds, "hls_header", weight_filename)
  File "/home/hlebleve/finn/src/finn/custom_op/fpgadataflow/thresholding_batch.py", line 350, in make_weight_file
    assert np.vectorize(tdt.allowed)(
AssertionError: Thresholds can't be expressed with type UINT4

So this happens because at this point in the flow I end up with a MultiThreshold node whose threshold values are set to UINT4 but range from 1 to 16, and 16 is impossible to represent in UINT4. I found that this situation is created during streamlining, by the RoundAndClipThresholds transformation. Since this node has a UINT4 input, the datatype of the threshold values is changed and they are rounded up to the next integer. In the code of the transformation, the values are then clipped to the data type's range plus one: https://github.com/Xilinx/finn/blob/fd72d48410c040ae8f715919e65ec731580235ce/src/finn/transformation/streamline/round_thresholds.py#L63

For UINT4, this means the values are clipped to the range -1 to 16, and both extremes are impossible to express in UINT4, since idtype.min() returns 0 and idtype.max() returns 15. I know this transformation is not new, so my guess is that something changed recently in the way data types work, unless I am missing something and there is actually a reason to clip values to their data type's range plus one?
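To make the arithmetic concrete, here is a minimal sketch of the behavior described above, assuming qonnx's DataType API and the round-up applied for unsigned inputs (the threshold values are illustrative):

    import numpy as np
    from qonnx.core.datatype import DataType

    # Illustrative thresholds; the rounding/clipping mirrors the behavior
    # described above, not the full RoundAndClipThresholds transformation.
    idtype = DataType["UINT4"]
    thresholds = np.array([0.3, 7.5, 15.2, 42.0])

    rounded = np.ceil(thresholds)  # round up for an unsigned input type
    # Clip to the data type's range plus one on each side:
    clipped = np.clip(rounded, idtype.min() - 1, idtype.max() + 1)

    print(clipped)             # [ 1.  8. 16. 16.]
    print(idtype.allowed(16))  # False: 16 is not representable in UINT4
    print(idtype.allowed(-1))  # False: neither is -1

This is exactly the condition that the np.vectorize(tdt.allowed) assertion in make_weight_file trips over.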

Possible fix

Clipping large thresholds to the input range instead of the input range plus one corrected the issue in my case.
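Continuing the sketch above, the change would amount to something like this (a sketch, not the exact patch):

    # Clip to the representable range itself, not the range plus one:
    fixed = np.clip(rounded, idtype.min(), idtype.max())
    print(fixed)  # [ 1.  8. 15. 15.] -- all valid UINT4 values
    assert np.vectorize(idtype.allowed)(fixed).all()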

I can provide ONNX models of different steps if you think it's relevant.

fpjentzsch commented 1 year ago

I also don't understand why thresholds are clipped to the input range +/- 1 here.

Maybe this was not noticed earlier because minimize_accumulator_width() was previously part of the convert_to_hls transformations, before it was moved to the new step_minimize_bit_width() builder step 6 months ago. This function clips the thresholds according to the minimized accumulator width for the MVAU, and in cases like this it increases the threshold datatype width beyond the input width for the thresholding_batch CustomOp.

auphelia commented 1 year ago

Hi @hleblevec , thanks for filing this issue; it is something we are currently looking into. Until further experiments are done, I would suggest changing the datatype assignment a few lines below (model.set_tensor_datatype(n.input[1], idtype)) to a bigger datatype. Like @fpjentzsch mentioned, minimize_accumulator_width got moved, so this might have led to the error you're observing. When the bug is addressed, I will update this issue. Are you executing step_minimize_bit_width in your builder flow?
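A sketch of that workaround, assuming qonnx's DataType.get_smallest_possible helper (the variable names follow the transformation's code and are hypothetical here):

    from qonnx.core.datatype import DataType

    # Instead of forcing the threshold tensor onto the input datatype:
    #     model.set_tensor_datatype(n.input[1], idtype)
    # assign a datatype that can also hold idtype.max() + 1:
    idtype = DataType["UINT4"]
    bigger_dt = DataType.get_smallest_possible(idtype.max() + 1)
    print(bigger_dt.name)  # e.g. "UINT5", which can represent 16
    # model.set_tensor_datatype(n.input[1], bigger_dt)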

hleblevec commented 1 year ago

Hi @auphelia and @fpjentzsch ,

Thanks for your answers. I was not using step_minimize_bit_width before. I included it in my build and I don't see the error anymore. I guess that solves the problem, but it may still be worth questioning whether this clipping is appropriate as implemented.
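For reference, a hypothetical builder configuration that includes the step; the paths, FPGA part, and exact step list are placeholders, and only the presence of "step_minimize_bit_width" ahead of "step_hls_codegen" matters:

    from finn.builder.build_dataflow import build_dataflow_cfg
    from finn.builder.build_dataflow_config import (
        DataflowBuildConfig,
        DataflowOutputType,
    )

    cfg = DataflowBuildConfig(
        output_dir="build_output",    # placeholder path
        synth_clk_period_ns=10.0,     # placeholder clock target
        fpga_part="xc7z020clg400-1",  # placeholder part
        generate_outputs=[DataflowOutputType.ESTIMATE_REPORTS],
        steps=[
            "step_qonnx_to_finn",
            "step_tidy_up",
            "step_streamline",
            "step_convert_to_hls",
            "step_create_dataflow_partition",
            "step_apply_folding_config",
            "step_minimize_bit_width",  # the step that was missing
            "step_generate_estimate_reports",
            "step_hls_codegen",
            # ... remaining steps as needed
        ],
    )
    # build_dataflow_cfg("model.onnx", cfg)  # hypothetical model path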