Xilinx / finn

Dataflow compiler for QNN inference on FPGAs
https://xilinx.github.io/finn
BSD 3-Clause "New" or "Revised" License

Accuracy Discrepancies between the built Accelerator (68%) on ZCU102 and Brevitas Model (86%) #996

Open shakeelakram00 opened 7 months ago

shakeelakram00 commented 7 months ago

Setup:

- ZCU102: PYNQ Linux, based on Ubuntu 18.04 (GNU/Linux 4.19.0-xilinx-v2019.1)
- FINN: v0.9
- Xilinx tools: 2022.2
- Host OS: Ubuntu 22.04.1 LTS
- Docker container started with: `./run-docker.sh notebook`

FINN git log:

```
commit b3bdff118ae076cb776af6e51ddc28eeaa0d6390 (HEAD -> main, origin/main, origin/HEAD)
Merge: cdc5ec4b 9847528a
Author: auphelia <56755897+auphelia@users.noreply.github.com>
Date:   Mon Feb 13 11:55:42 2023 +0000

    Merge pull request #762 from Xilinx/fix/nb_tests

    Fix known issues for release

commit 9847528a8430fb6bf00826845de74fbe4a1a596d
Author: auphelia <jakobapk@web.de>
Date:   Mon Feb 13 11:52:15 2023 +0000

    [Notebooks/Tests] Fix typo in nb and fix build_dataflow test

commit cdc5ec4b0dde59d5d8de0a5359aae529816376af (tag: v0.9)
Merge: 41740ed1 17af0c35
Author: auphelia <56755897+auphelia@users.noreply.github.com>
Date:   Fri Feb 10 12:00:49 2023 +0000

    Merge pull request #760 from Xilinx/dev
```

Summary

I have been working with the cnv_end2end_example and successfully modified it to build the accelerator for a different dataset. The Brevitas model was trained on data of shape 1x1x14x14, dtype torch.float32, with values ranging between 0 and 1.

Following the cnv_end2end_example, the first layer of the exported model performs the quantization, and the ONNX conversion includes pre-processing (ToTensor(), i.e., division by 255 to normalize UINT8 inputs to float [0, 1]) and post-processing (TopK with k=1). After create_dataflow_partition, the ONNX model has all blocks converted into HLS layers except the initial Transpose.

Given that the first Transpose was not converted to an HLS layer and the accelerator expects data of shape 1x14x14x1 with dtype UINT8, I rescaled the original float32 dataset to np.uint8, i.e. (dataset * 255).astype(np.uint8), for inference on the ZCU102. Although the generated validation script reshapes the data to the expected shape, I tried feeding the data both with and without reshaping; the result was the same in both cases, i.e. 68%.
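For clarity, the input preparation described above looks roughly like this (a minimal sketch; `dataset` stands for my float32 array of shape (N, 1, 14, 14)):

```python
import numpy as np

# rescale [0, 1] floats to UINT8, as the accelerator input expects
ibuf = (dataset * 255).astype(np.uint8)
# NCHW -> NHWC, i.e. (N, 1, 14, 14) -> (N, 14, 14, 1), mirroring the
# Transpose node that stayed outside the accelerator
ibuf = ibuf.transpose(0, 2, 3, 1)
```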

During the build process I included the verification steps, and they all report success against the sample input and expected output; even the built accelerator produces the correct output for the sample input. But over the whole dataset the accuracy drops to 68% rather than 86%. For the whole dataset, the Brevitas model exported to ONNX still gives 86% accuracy after performing the initial tidy-up transformations below.

Initial tidy-up transformations:

```python
bo.export_finn_onnx(brevitas_model, (1, 1, 14, 14), "export.onnx")
model = ModelWrapper("export.onnx")
model = model.transform(InferShapes())
model = model.transform(FoldConstants())
...
output_dict = oxe.execute_onnx(model_t, input_dict)
```
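The ONNX-level accuracy number comes from running the tidied model over the test set, roughly like this (a sketch; "tidy.onnx" is a placeholder filename, `test_x`/`test_y` are assumed to hold the float32 test inputs and integer labels, and since the tidied model has no TopK node yet, argmax is taken over the logits):

```python
import numpy as np
import finn.core.onnx_exec as oxe
from qonnx.core.modelwrapper import ModelWrapper

# sketch: measure ONNX-level accuracy of the tidied model over the test set
model_t = ModelWrapper("tidy.onnx")  # placeholder filename
iname = model_t.graph.input[0].name
oname = model_t.graph.output[0].name
correct = 0
for x, y in zip(test_x, test_y):
    input_dict = {iname: x.reshape(1, 1, 14, 14).astype(np.float32)}
    output_dict = oxe.execute_onnx(model_t, input_dict)
    correct += int(np.argmax(output_dict[oname]) == y)
print("accuracy: %.2f%%" % (100.0 * correct / len(test_y)))
```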

Accelerator build steps after the Brevitas model is exported to ONNX:

```python
import brevitas.onnx as bo
bo.export_finn_onnx(model, (1, 1, 14, 14), "export.onnx")
```

```python
from finn.util.pytorch import ToTensor
from qonnx.transformation.merge_onnx_models import MergeONNXModels
from qonnx.core.modelwrapper import ModelWrapper
from qonnx.core.datatype import DataType
from qonnx.transformation.insert_topk import InsertTopK
import finn.builder.build_dataflow as build

def custom_step_add_post_proc(model: ModelWrapper, cfg: build.DataflowBuildConfig):
    model = model.transform(InsertTopK(k=1))
    return model
```

```python
def custom_step_add_pre_proc(model: ModelWrapper, cfg: build.DataflowBuildConfig):
    ishape = model.get_tensor_shape(model.graph.input[0].name)
    # preprocessing: torchvision's ToTensor divides uint8 inputs by 255
    preproc = ToTensor()
    bo.export_finn_onnx(preproc, ishape, "preproc.onnx", opset_version=11)
    preproc_model = ModelWrapper("preproc.onnx")
    # set input finn datatype to UINT8
    preproc_model.set_tensor_datatype(preproc_model.graph.input[0].name, DataType["UINT8"])
    # merge pre-processing onnx model with cnv model (passed as input argument)
    model = model.transform(MergeONNXModels(preproc_model))
    return model
```

```python
import finn.builder.build_dataflow as build
import finn.builder.build_dataflow_config as build_cfg
import os
import shutil

model_file = "export.onnx"
rtlsim_output_dir = "output"

# delete previous run results if they exist
if os.path.exists(rtlsim_output_dir):
    shutil.rmtree(rtlsim_output_dir)
    print("Previous run results deleted!")

cfg_stitched_ip = build.DataflowBuildConfig(
    output_dir          = rtlsim_output_dir,
    mvau_wwidth_max     = 160,
    synth_clk_period_ns = 20.0,
    target_fps          = 2000000,
    board               = "ZCU102",
    fpga_part           = "xczu9eg-ffvb1156-2-e",
    shell_flow_type     = build_cfg.ShellFlowType.VIVADO_ZYNQ,
    folding_two_pass_relaxation = True,
    folding_config_file = "auto_folding_config.json",
    steps=[
        custom_step_add_pre_proc,
        custom_step_add_post_proc,
        "step_qonnx_to_finn",
        "step_tidy_up",
        "step_streamline",
        "step_convert_to_hls",
        "step_create_dataflow_partition",
        "step_target_fps_parallelization",
        "step_apply_folding_config",
        "step_generate_estimate_reports",
        "step_hls_codegen",
        "step_hls_ipgen",
        "step_set_fifo_depths",
        "step_create_stitched_ip",
        "step_measure_rtlsim_performance",
        "step_out_of_context_synthesis",
        "step_synthesize_bitfile",
        "step_make_pynq_driver",
        "step_deployment_package",
    ],
    generate_outputs=[
        build_cfg.DataflowOutputType.ESTIMATE_REPORTS,
        build_cfg.DataflowOutputType.STITCHED_IP,
        build_cfg.DataflowOutputType.RTLSIM_PERFORMANCE,
        build_cfg.DataflowOutputType.OOC_SYNTH,
        build_cfg.DataflowOutputType.BITFILE,
        build_cfg.DataflowOutputType.PYNQ_DRIVER,
        build_cfg.DataflowOutputType.DEPLOYMENT_PACKAGE,
    ],
    verify_steps=[
        build_cfg.VerificationStepType.QONNX_TO_FINN_PYTHON,
        build_cfg.VerificationStepType.TIDY_UP_PYTHON,
        build_cfg.VerificationStepType.STREAMLINED_PYTHON,
        build_cfg.VerificationStepType.FOLDED_HLS_CPPSIM,
        build_cfg.VerificationStepType.STITCHED_IP_RTLSIM,
    ],
)

build.build_dataflow_cfg(model_file, cfg_stitched_ip)
```

Moreover, runtime_writeable_weights is enabled (set to 1) in the folding .json file for the MVAUs of the conv and linear layers, following the guidelines in the 4_advanced_builder_settings notebook and the cnv-w1a1_folding_config.
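The relevant entries in the folding .json look roughly like this (a sketch; the layer names and PE/SIMD values are illustrative placeholders, not my exact configuration):

```json
{
  "Defaults": {},
  "MatrixVectorActivation_0": {
    "PE": 16,
    "SIMD": 3,
    "runtime_writeable_weights": 1
  },
  "MatrixVectorActivation_1": {
    "PE": 32,
    "SIMD": 32,
    "runtime_writeable_weights": 1
  }
}
```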

I would appreciate any assistance in debugging this issue.

@fpjentzsch, you mentioned in your reply to #995 that reshaping alone might not be sufficient. Could you please provide further guidance, considering my specific setup, on how to achieve the desired accuracy on the accelerator?

Thank you in advance for your help.

shakeelakram00 commented 6 months ago

Hi there, I've been diligently verifying each stage of the FINN flow for the query above, and I've run into a perplexing issue that I could use some guidance on.

Initially, during ONNX execution, I achieved 86% accuracy after applying the tidy-up, pre-processing, and post-processing transformations. However, after applying the streamline transformations, accuracy dropped significantly to 68%. This drop persisted when deploying the model onto the FPGA.

To give you a clearer picture, here are the streamline transformations I've implemented:

```python
model = model.transform(MoveScalarLinearPastInvariants())
model = model.transform(Streamline())
model = model.transform(LowerConvsToMatMul())
model = model.transform(MakeMaxPoolNHWC())
model = model.transform(Streamline())
model = model.transform(absorb.AbsorbTransposeIntoMultiThreshold())
model = model.transform(ConvertBipolarMatMulToXnorPopcount())
model = model.transform(Streamline())
model = model.transform(absorb.AbsorbScalarMulAddIntoTopK())
model = model.transform(InferDataLayouts())
model = model.transform(RemoveUnusedTensors())
```
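To pinpoint where the outputs diverge, the pre- and post-streamline models can be compared on the same sample (a minimal sketch; the filenames and `test_x` are placeholders used for illustration):

```python
import numpy as np
import finn.core.onnx_exec as oxe
from qonnx.core.modelwrapper import ModelWrapper

# sketch: run one sample through the model before and after streamlining
model_a = ModelWrapper("tidy.onnx")         # placeholder filename
model_b = ModelWrapper("streamlined.onnx")  # placeholder filename
x = test_x[0].reshape(1, 1, 14, 14).astype(np.float32)
out_a = oxe.execute_onnx(model_a, {model_a.graph.input[0].name: x})
out_b = oxe.execute_onnx(model_b, {model_b.graph.input[0].name: x})
print(np.allclose(list(out_a.values())[0], list(out_b.values())[0]))
```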

I also tried finn.builder.build_dataflow; it showed the same issue, i.e., there is a drop in accuracy as soon as the streamline transformations are applied.

Only when I take the model = model.transform(LowerConvsToMatMul()) transformation out do I get the same 86% accuracy. I know that to convert the model to HLS-compatible nodes the convolutions have to be lowered to MatMuls, so this transformation is needed. The only other difference I see with and without the transformation is that the finn_datatypes of MultiThreshold_1 and MultiThreshold_2 are BINARY with LowerConvsToMatMul (giving 68% accuracy) and BIPOLAR without it (giving 86% accuracy), respectively.

I'm at a loss as to why this transformation causes such a significant accuracy drop. Is it due to the MultiThreshold finn_datatypes, or perhaps the 6x6 kernel size I am using in QuantConv2d? Any insights or suggestions you could offer would be greatly appreciated.
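For reference, this is how the MultiThreshold datatypes can be inspected (a small sketch using the qonnx ModelWrapper API; the filename is a placeholder):

```python
from qonnx.core.modelwrapper import ModelWrapper

# sketch: print the FINN datatype annotated on each MultiThreshold output
model = ModelWrapper("streamlined.onnx")  # placeholder filename
for node in model.graph.node:
    if node.op_type == "MultiThreshold":
        print(node.name, "->", model.get_tensor_datatype(node.output[0]))
```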

Thank you for your time and assistance.

auphelia commented 6 months ago

Hi @shakeelakram00, could you try the latest release with your flow? Note that you will need to adapt your flow to the new structure; this discussion post might be helpful: https://github.com/Xilinx/finn/discussions/1020. Great that you were able to narrow the problem down further. If the error persists, could you put together a minimal example to reproduce it?

shakeelakram00 commented 6 months ago

Hi @auphelia, I really appreciate your response, thanks a lot. I managed to sort out the error by changing the bitwidth of the QuantIdentity layers associated with the convolution layers in cnv_end2end_example to 2, while keeping the bitwidth of those associated with the linear layers at 1. After the transformations, the result now reproduces the same 86% accuracy. I suppose the error was due to the zero padding: when the transformations were applied, they changed the datatypes from BIPOLAR to BINARY, hence the accuracy drop.

But moving forward, when I apply the partitioning, conversion-to-HW-layers, and folding transformations below, I get the error AssertionError: MultiThreshold_3: Signed output requires actval less than 0, which I suppose comes from the MultiThreshold_3 generated for the QuantIdentity layer associated with the last convolution layer before the linear layers. So I tried to update that node's attributes by manually setting out_bias = -1.0 in the ONNX file generated after the streamline transformations. This got rid of the error but dropped the accuracy even further, down to 51%, which I suppose is due to the forced change of out_bias.
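(The same manual change can also be made through the qonnx custom-op wrapper instead of editing the protobuf by hand; a sketch, with a placeholder filename:)

```python
from qonnx.core.modelwrapper import ModelWrapper
from qonnx.custom_op.registry import getCustomOp

# sketch: force out_bias = -1.0 on MultiThreshold_3
model = ModelWrapper("streamlined.onnx")  # placeholder filename
node = model.get_node_from_name("MultiThreshold_3")
getCustomOp(node).set_nodeattr("out_bias", -1.0)
model.save("streamlined_outbias_fixed.onnx")
```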

So what do you suggest? If I have the same convolution layers with the same QuantIdentity layer associated with each of them, except that the second conv is followed by an additional MaxPool layer, why doesn't MultiThreshold_3 automatically take out_bias = -1.0 the way MultiThreshold_1 and MultiThreshold_2 do, when all three come from the same QuantIdentity definition?

```python
self.conv_features.append(QuantIdentity(
    act_quant=CommonActQuant,
    bit_width=8,
    min_val=-1.0,
    max_val=1.0 - 2.0 ** (-7),
    narrow_range=True,
    restrict_scaling_type=RestrictValueType.POWER_OF_TWO))
for out_ch, is_pool_enabled in CNV_OUT_CH_POOL:
    self.conv_features.append(QuantConv2d(
        kernel_size=KERNEL_SIZE,
        in_channels=in_ch,
        out_channels=out_ch,
        bias=True,
        padding=4,
        weight_quant=CommonWeightQuant,
        weight_bit_width=weight_bit_width))
    in_ch = out_ch
    self.conv_features.append(BatchNorm2d(in_ch, eps=1e-4))
    # this QuantIdentity produces MultiThreshold_1/2/3
    self.conv_features.append(QuantIdentity(act_quant=CommonActQuant, bit_width=2))
    if is_pool_enabled:
        self.conv_features.append(MaxPool2d(kernel_size=2))
```

Secondly, do I have to keep all the QuantIdentity layers at the same bitwidth except the first one (which is 8), or should I set the bitwidth to 1 for the QuantIdentity layer associated with convolution layer 3?

```python
import finn.transformation.fpgadataflow.convert_to_hw_layers as to_hw
from finn.transformation.fpgadataflow.create_dataflow_partition import (
    CreateDataflowPartition,
)
from finn.transformation.move_reshape import RemoveCNVtoFCFlatten
from finn.transformation.fpgadataflow.specialize_layers import SpecializeLayers
from qonnx.custom_op.registry import getCustomOp
from qonnx.transformation.infer_data_layouts import InferDataLayouts

model = ModelWrapper(build_dir + "/end2end_cnv_w1a1_streamlined.onnx")
model = model.transform(to_hw.InferBinaryMatrixVectorActivation())
model = model.transform(to_hw.InferQuantizedMatrixVectorActivation())
model = model.transform(to_hw.InferLabelSelectLayer())
model = model.transform(to_hw.InferThresholdingLayer())
model = model.transform(to_hw.InferConvInpGen())
model = model.transform(to_hw.InferStreamingMaxPool())
model = model.transform(RemoveCNVtoFCFlatten())
model = model.transform(absorb.AbsorbConsecutiveTransposes())
model = model.transform(InferDataLayouts())
parent_model = model.transform(CreateDataflowPartition())
parent_model.save(build_dir + "/end2end_cnv_w1a1_dataflow_parent.onnx")
sdp_node = parent_model.get_nodes_by_op_type("StreamingDataflowPartition")[0]
sdp_node = getCustomOp(sdp_node)
dataflow_model_filename = sdp_node.get_nodeattr("model")
dataflow_model = ModelWrapper(dataflow_model_filename)
dataflow_model.save(build_dir + "/end2end_cnv_w1a1_dataflow_model.onnx")
```