Xilinx / finn

Dataflow compiler for QNN inference on FPGAs
https://xilinx.github.io/finn
BSD 3-Clause "New" or "Revised" License

High mvau_wwidth_max value causes step_hw_ipgen to fail #1042

Open pbk20191 opened 2 months ago

pbk20191 commented 2 months ago

Prerequisites

dev branch: e188b4c50955105717b223862c4e26e4777852ea

Quick summary

A high mvau_wwidth_max value causes step_hw_ipgen to fail, so only very low values of mvau_wwidth_max are valid for this configuration option.

Details

I have a simple MNIST CNN model, and it is impossible to build it with a high-performance configuration. I confirmed via the estimation report that its resource requirements are satisfied.

Below is the detailed stack trace:

Building dataflow accelerator from brevitas_cnn2.onnx
Intermediate outputs will be generated in /tmp/finn_dev_pbk
Final outputs will be generated in output_final
Build log is at output_final/build_dataflow.log
Running step: custom_step_add_pre_proc [1/21]
Running step: custom_step_add_post_proc [2/21]
Running step: step_qonnx_to_finn [3/21]
Running step: step_tidy_up [4/21]
Running step: step_streamline [5/21]
Running step: step_convert_to_hw [6/21]
Running step: step_create_dataflow_partition [7/21]
Running step: step_specialize_layers [8/21]
Running step: step_target_fps_parallelization [9/21]
Running step: step_apply_folding_config [10/21]
Running step: step_minimize_bit_width [11/21]
Running step: step_generate_estimate_reports [12/21]
Running step: step_hw_codegen [13/21]
Running step: step_hw_ipgen [14/21]
Traceback (most recent call last):
  File "/home/pbk/git-projects/embedded-social-infra/finn/src/finn/builder/build_dataflow.py", line 158, in build_dataflow_cfg
    model = transform_step(model, cfg)
  File "/home/pbk/git-projects/embedded-social-infra/finn/src/finn/builder/build_dataflow_steps.py", line 573, in step_set_fifo_depths
    model = model.transform(
  File "/home/pbk/git-projects/embedded-social-infra/finn/deps/qonnx/src/qonnx/core/modelwrapper.py", line 140, in transform
    (transformed_model, model_was_changed) = transformation.apply(transformed_model)
  File "/home/pbk/git-projects/embedded-social-infra/finn/src/finn/transformation/fpgadataflow/set_fifo_depths.py", line 301, in apply
    model = model.transform(InsertFIFO(create_shallow_fifos=True))
  File "/home/pbk/git-projects/embedded-social-infra/finn/deps/qonnx/src/qonnx/core/modelwrapper.py", line 140, in transform
    (transformed_model, model_was_changed) = transformation.apply(transformed_model)
  File "/home/pbk/git-projects/embedded-social-infra/finn/src/finn/transformation/fpgadataflow/insert_fifo.py", line 115, in apply
    fld_shape = n0.get_folded_output_shape()
  File "/home/pbk/git-projects/embedded-social-infra/finn/src/finn/custom_op/fpgadataflow/streamingdatawidthconverter.py", line 124, in get_folded_output_shape
    dummy_t = dummy_t.reshape(new_shape)
ValueError: cannot reshape array of size 784 into shape (1,7,7,0,28)
Running step: step_set_fifo_depths [15/21]
> /home/pbk/git-projects/embedded-social-infra/finn/src/finn/custom_op/fpgadataflow/streamingdatawidthconverter.py(124)get_folded_output_shape()
    122         new_shape.append(int(ochannels // oelems))
    123         new_shape.append(oelems)
--> 124         dummy_t = dummy_t.reshape(new_shape)
    125 
    126         return dummy_t.shape
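For context, the zero dimension comes from the integer division a couple of lines above the failing reshape: when the number of output elements per cycle exceeds the number of output channels, `ochannels // oelems` truncates to 0 and the reshape must fail. A minimal sketch of the failure mode (the values 16 and 28 are assumptions, chosen only to be consistent with the error message, since 1 * 7 * 7 * 16 = 784):

```python
import numpy as np

# Hypothetical values consistent with the error message:
# a 1x7x7x16 tensor (784 elements) folded with 28 elements per cycle.
ochannels = 16
oelems = 28  # oelems > ochannels, so the folded dim truncates to 0

new_shape = [1, 7, 7, int(ochannels // oelems), oelems]
print(new_shape)  # [1, 7, 7, 0, 28]

dummy_t = np.zeros((1, 7, 7, ochannels))
try:
    dummy_t.reshape(new_shape)
except ValueError as e:
    # cannot reshape array of size 784 into shape (1,7,7,0,28)
    print(e)
```

So the folding chosen by the automatic transformation implies more elements per cycle than the tensor has channels at that point in the graph.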

Steps to Reproduce

import os
import shutil

import torch
import brevitas.export as bo  # provides export_qonnx, used as bo.export_qonnx below

from qonnx.core.modelwrapper import ModelWrapper
from qonnx.core.datatype import DataType
from qonnx.transformation.merge_onnx_models import MergeONNXModels
from qonnx.transformation.insert_topk import InsertTopK

import finn.builder.build_dataflow as build
import finn.builder.build_dataflow_config as build_cfg
from finn.util.pytorch import ToTensor

def custom_step_add_pre_proc(model: ModelWrapper, cfg: build.DataflowBuildConfig):
    ishape = model.get_tensor_shape(model.graph.input[0].name)
    # preprocessing: torchvision's ToTensor divides uint8 inputs by 255
    preproc = ToTensor()
    bo.export_qonnx(preproc, torch.randn(ishape), "preproc.onnx", opset_version=12)
    preproc_model = ModelWrapper("preproc.onnx")
    # set input finn datatype to UINT8
    preproc_model.set_tensor_datatype(preproc_model.graph.input[0].name, DataType["UINT8"])
    # merge pre-processing onnx model with cnv model (passed as input argument)
    model = model.transform(MergeONNXModels(preproc_model))
    return model

def custom_step_add_post_proc(model: ModelWrapper, cfg: build.DataflowBuildConfig):
    model = model.transform(InsertTopK(k=1))
    return model

model_file = "brevitas_cnn2.onnx"

final_output_dir = "output_final"

# Delete previous run results if exist
if os.path.exists(final_output_dir):
    shutil.rmtree(final_output_dir)
    print("Previous run results deleted!")

cfg = build.DataflowBuildConfig(
    output_dir          = final_output_dir,
    mvau_wwidth_max     = 10000,
    target_fps          = 1000000,
    synth_clk_period_ns = 10.0,
    board               = "Pynq-Z1",
    shell_flow_type     = build_cfg.ShellFlowType.VIVADO_ZYNQ,
    steps               = [custom_step_add_pre_proc, custom_step_add_post_proc] + build_cfg.default_build_dataflow_steps,
    generate_outputs=[
        build_cfg.DataflowOutputType.BITFILE,
        build_cfg.DataflowOutputType.PYNQ_DRIVER,
        build_cfg.DataflowOutputType.DEPLOYMENT_PACKAGE,
    ]
)

Finally, run this inside the FINN Jupyter notebook:

%%time
build.build_dataflow_cfg(model_file, cfg)

Expected behavior

I confirmed that the resource requirements are satisfied, so the build should not fail at this step, or at least a more detailed error should be raised.

Actual behavior

StreamingDataWidthConverter's get_folded_output_shape fails with ValueError: cannot reshape array of size 784 into shape (1,7,7,0,28)

Possible fix

If I set mvau_wwidth_max to a fairly low value (such as 24), the build completes without error.
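My understanding (an assumption on my part, not verified against the FINN source) is that mvau_wwidth_max caps the MVAU weight stream width, roughly PE * SIMD * weight_bits, during automatic folding, so a low value like 24 simply forces a folding that happens to stay legal:

```python
def max_pe_simd_product(wwidth_max: int, weight_bits: int) -> int:
    """Rough upper bound on PE * SIMD implied by mvau_wwidth_max.

    Assumes the MVAU weight stream is PE * SIMD * weight_bits wide;
    this is a sketch of the constraint, not FINN's actual search code.
    """
    return wwidth_max // weight_bits

# With the workaround value of 24 and hypothetical 4-bit weights,
# PE * SIMD is capped at 6; with 10000 the folding search is
# essentially unconstrained, which is where the illegal shapes appear.
print(max_pe_simd_product(24, 4))     # 6
print(max_pe_simd_product(10000, 4))  # 2500
```

That would explain why the low value masks the bug rather than fixing it.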

Additional context

MNIST-trained ONNX model: brevitas_cnn2.zip

fpjentzsch commented 2 months ago

Hi, generally, errors like this can occur because the current automatic folding transformation is not perfect and might produce an illegal configuration where the SIMD and PE settings of adjacent layers do not satisfy all requirements (e.g. the PE of one layer must be datawidth-convertible to the SIMD of the next layer).

You could try a manual folding config, or dig deeper into how the shape (1,7,7,0,28) comes about. The 0 dimension in particular is very odd.
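For anyone else hitting this: a manual folding config is a JSON file mapping node names to folding attributes, passed via folding_config_file in DataflowBuildConfig. The node names and values below are hypothetical, taken from what a typical post-specialize_layers graph looks like; check your own graph (e.g. in Netron) for the actual node names:

```json
{
  "Defaults": {},
  "MVAU_hls_0": {
    "PE": 4,
    "SIMD": 8,
    "ram_style": "auto"
  },
  "MVAU_hls_1": {
    "PE": 2,
    "SIMD": 4
  }
}
```

Then set folding_config_file="folding_config.json" in the DataflowBuildConfig; as far as I know this takes precedence over the target_fps-driven parallelization, so the manual values win.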

Ba1tu3han commented 2 months ago

I have the same issue with the automatic (basic) folding. I use the notebook with a custom ONNX file of a CNV network generated by Brevitas. I am sure it does not exceed the Pynq-Z2 hardware resources.


I wrote this message just to inform you, and I'm following the issue.

As you recommended, I'll try it with a custom folding configuration.

Edit: I have solved the issue by using a custom folding configuration. You should check the HW resource usage in the estimation report and the folding constraints in the documentation before compiling the stitched IP. @pbk20191