Xilinx / finn

Dataflow compiler for QNN inference on FPGAs
https://xilinx.github.io/finn
BSD 3-Clause "New" or "Revised" License

Preprocessing Quant: InferThresholdingLayer not inferring Thresholding #1174

Closed 0BAB1 closed 2 months ago

0BAB1 commented 2 months ago

Quick summary

[Netron screenshot: /tmp/finn_dev_rootmin/video_streamlined_merged_and_ready.onnx]

Running this code:

from qonnx.core.modelwrapper import ModelWrapper
from finn.util.visualization import showInNetron
import finn.transformation.fpgadataflow.convert_to_hw_layers as to_hw

# TO HW LAYERS

model = ModelWrapper("/tmp/finn_dev_rootmin/video_streamlined_merged_and_ready.onnx")
model = model.transform(to_hw.InferLabelSelectLayer())
model = model.transform(to_hw.InferChannelwiseLinearLayer())
model = model.transform(to_hw.InferQuantizedMatrixVectorActivation())
model = model.transform(to_hw.InferThresholdingLayer())
model.save("/tmp/finn_dev_rootmin/video_hw.onnx")
showInNetron("/tmp/finn_dev_rootmin/video_hw.onnx")

On this model, the transformation does not convert the MultiThreshold node into a Thresholding layer:

[Netron screenshot: /tmp/finn_dev_rootmin/video_hw.onnx]

Things I tried:

Running model = model.transform(to_hw.InferThresholdingLayer()) before all the other transformations: all MultiThreshold nodes in the model were indeed converted to HW layers, except the problematic first one...
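Concretely, that attempt looked roughly like this (same paths and transforms as above, just with InferThresholdingLayer moved to the front):

# same flow as above, but attempting thresholding inference first
model = ModelWrapper("/tmp/finn_dev_rootmin/video_streamlined_merged_and_ready.onnx")
model = model.transform(to_hw.InferThresholdingLayer())
model = model.transform(to_hw.InferLabelSelectLayer())
model = model.transform(to_hw.InferChannelwiseLinearLayer())
model = model.transform(to_hw.InferQuantizedMatrixVectorActivation())
model.save("/tmp/finn_dev_rootmin/video_hw.onnx")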

Here are the first MultiThreshold node's attributes:

[screenshot of the node attributes in Netron]

My model

As I'm still trying to figure things out and tinker around, here is the simple model I'm trying this on:

from torch.nn import Module
import torch.nn.functional as F

import brevitas.nn as qnn
from brevitas.quant import Int32Bias

class QuantWeightActBiasLeNet(Module):
    def __init__(self):
        super(QuantWeightActBiasLeNet, self).__init__()
        self.quant_inp = qnn.QuantIdentity(bit_width=4, return_quant_tensor=True)
        self.fc1   = qnn.QuantLinear(28*28, 128, bias=True, weight_bit_width=4, bias_quant=Int32Bias)
        self.relu1 = qnn.QuantReLU(bit_width=4, return_quant_tensor=True)
        self.fc2   = qnn.QuantLinear(128, 10, bias=True, weight_bit_width=4, bias_quant=Int32Bias)
        self.relu2 = qnn.QuantReLU(bit_width=4)

    def forward(self, x):
        out = self.quant_inp(x)
        out = self.relu1(self.fc1(out))
        out = self.relu2(self.fc2(out))
        return out

brevitas_model = QuantWeightActBiasLeNet()
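For completeness, the export to QONNX looks roughly like this (a sketch; the output path is just an example):

from brevitas.export import export_qonnx
from qonnx.util.cleanup import cleanup as qonnx_cleanup
import torch

# export the Brevitas model to QONNX and clean it up before handing it to FINN
export_qonnx(brevitas_model, torch.randn(1, 28 * 28), "/tmp/finn_dev_rootmin/model_qonnx.onnx")
qonnx_cleanup("/tmp/finn_dev_rootmin/model_qonnx.onnx", out_file="/tmp/finn_dev_rootmin/model_qonnx.onnx")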

It is fully quantized; my other, "not fully quantized" (weights only) models went through the flow better. Maybe the QuantIdentity quantizer is the problem? I don't know.

0BAB1 commented 2 months ago

By removing the QuantIdentity layer, things turned out fine.
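Roughly, the model now looks like this (just a sketch of the change, and the class name is illustrative; note I also dropped Int32Bias on fc1 here, since that quantizer derives its scale from the input quantizer that is now gone):

class QuantWeightActLeNetNoInputQuant(Module):
    def __init__(self):
        super().__init__()
        # no QuantIdentity input quantizer anymore
        # fc1 bias left unquantized: Int32Bias needs an input quant scale
        self.fc1   = qnn.QuantLinear(28*28, 128, bias=True, weight_bit_width=4)
        self.relu1 = qnn.QuantReLU(bit_width=4, return_quant_tensor=True)
        self.fc2   = qnn.QuantLinear(128, 10, bias=True, weight_bit_width=4, bias_quant=Int32Bias)
        self.relu2 = qnn.QuantReLU(bit_width=4)

    def forward(self, x):
        out = self.relu1(self.fc1(x))
        out = self.relu2(self.fc2(out))
        return out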

Are we supposed to do this input quantization via manual pre-processing?

Please let me know if so (or if I misunderstood something, which is very likely).

Best regards.

auphelia commented 2 months ago

Hi @0BAB1 ,

From an initial look, your observation is right. I assume that you are trying to do the input quantization with a MultiThreshold layer, so the input to that layer is floating-point. We don't have support for this scenario in FINN yet and assume an integer input to the accelerator, which is most likely why you were not able to convert that layer. This is a scenario we're actively working on supporting, but for now the pre-processing would need to be done on the host. If you are working with image data, you might be able to do something similar to what we do for the image classification networks. You can check out the advanced builder settings tutorial for details (in the custom step section).
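For example (just a rough illustration, not a FINN API), for image-style data already scaled to [0, 1] the host-side step could be as simple as:

import numpy as np

def quantize_input_uint8(x_float):
    # x_float is assumed to be in [0, 1]; scale to [0, 255], round and cast,
    # so the accelerator receives the integer (UINT8) input it expects
    return np.clip(np.round(x_float * 255.0), 0, 255).astype(np.uint8)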

0BAB1 commented 2 months ago

Hello @auphelia, thanks for the response!

I see...

So does the model only expect to run on integers? Meaning we have to fully quantize models in Brevitas (as opposed to only quantizing the weights/inputs)?

0BAB1 commented 2 months ago

Hello @auphelia,

After reading the FINN and FINN-R papers, I noticed this paragraph:

[screenshot of a paragraph from the paper]

So if I use FINN in a case where I trained my model on FP32 data, will it automatically apply transformations so I can feed the same data but quantized as INT8? Or should I train my model on INT8 from the start in Brevitas?

I am confused about these mixed data type usages and struggle to find resources on the topic (also, how is the quant-dequant handled for partially quantized models? What data type flows through the model between layers: FP or INT?).

I am trying to deepen my understanding of this because I am implementing on an unsupported Zynq board in Vivado/Vitis, meaning I have to feed the correct data types during both training AND inference for this to work as expected.

Best regards

0BAB1 commented 2 months ago

Update: after inference, the output does not match the labels at all. I will try some things out.

0BAB1 commented 2 months ago

Update 2:

Added preprocessing baked into the model like so:

from qonnx.core.modelwrapper import ModelWrapper
from finn.util.pytorch import ToTensor
from finn.util.visualization import showInNetron
from qonnx.transformation.merge_onnx_models import MergeONNXModels
from qonnx.core.datatype import DataType
from brevitas.export import export_qonnx
from qonnx.util.cleanup import cleanup as qonnx_cleanup
import torch
from finn.transformation.qonnx.convert_qonnx_to_finn import ConvertQONNXtoFINN

# start from the tidied model (no pre-processing baked in yet)
model = ModelWrapper("/tmp/finn_dev_rootmin/tidy.onnx")

global_inp_name = model.graph.input[0].name
ishape = model.get_tensor_shape(global_inp_name)
# preprocessing: torchvision's ToTensor divides uint8 inputs by 255
totensor_pyt = ToTensor()
export_qonnx(totensor_pyt, torch.randn(ishape), "/tmp/finn_dev_rootmin/preproc.onnx")
qonnx_cleanup("/tmp/finn_dev_rootmin/preproc.onnx", out_file="/tmp/finn_dev_rootmin/preproc.onnx")
pre_model = ModelWrapper("/tmp/finn_dev_rootmin/preproc.onnx")
pre_model = pre_model.transform(ConvertQONNXtoFINN())

# join preprocessing and core model
model = model.transform(MergeONNXModels(pre_model))
# add input quantization annotation: UINT8 for all BNN-PYNQ models
global_inp_name = model.graph.input[0].name
model.set_tensor_datatype(global_inp_name, DataType["UINT8"])

model.save("/tmp/finn_dev_rootmin/full_preproc.onnx")
showInNetron("/tmp/finn_dev_rootmin/full_preproc.onnx")

Inspired by: https://github.com/Xilinx/finn/blob/main/notebooks/end2end_example/bnn-pynq/tfc_end2end_example.ipynb (the "adding preprocessing" section).

I will keep you updated on how it works. I'm doing this because my C programs can only send UINT8 values, as this is all the stitched IP accepts.
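A quick sanity check on the merged model (a small sketch using the same helpers as above):

# check that the merged model now advertises a UINT8 input of the expected shape
model = ModelWrapper("/tmp/finn_dev_rootmin/full_preproc.onnx")
inp_name = model.graph.input[0].name
print(model.get_tensor_datatype(inp_name))  # expect UINT8
print(model.get_tensor_shape(inp_name))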

0BAB1 commented 2 months ago

After running the inference on the Zynq PL, it turns out the output is pretty much random... I am now pretty much out of ideas on where this whole thing goes wrong.

Here is my Vivado block design, if relevant:

[screenshot of the Vivado block design]

It turns out it runs really well: after inspection with the ILA, everything goes as expected. It's just that the outputs are really not the ones expected (~10% accuracy, meaning it's just lucky guesses, haha).

Anyway, I'm looking forward to reading your insights on this situation. Best regards and have a good rest of your day.

0BAB1 commented 2 months ago

Hello again,

After looking through examples, I used Python verification to assess my model's behavior and accuracy at each step of the hardware layer conversion process. This allowed me to make corrections to the model, and the inference finally works.
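The core of it is just executing each intermediate ONNX model in Python and checking the outputs, roughly like this (a sketch; the file path and the dummy input are placeholders):

import numpy as np
import finn.core.onnx_exec as oxe
from qonnx.core.modelwrapper import ModelWrapper

# execute an intermediate model (e.g. the streamlined one) in Python on a test input
model = ModelWrapper("/tmp/finn_dev_rootmin/video_streamlined_merged_and_ready.onnx")
inp_name = model.graph.input[0].name
ishape = model.get_tensor_shape(inp_name)
x = np.random.randint(0, 256, size=ishape).astype(np.float32)  # dummy UINT8-valued input
out_dict = oxe.execute_onnx(model, {inp_name: x})
print(out_dict[model.graph.output[0].name])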

Here is the notebook that really helped me: https://github.com/Xilinx/finn/blob/main/notebooks/end2end_example/bnn-pynq/tfc_end2end_verification.ipynb

Even though verification might seem like the "not fun" part, overlooking it cost me hours! Don't skip it like I did!

auphelia commented 2 months ago

Hi @0BAB1, really happy to hear that! You might also want to have a look at this: we have the verification integrated into the builder abstraction as well. This notebook shows it in one of the sections: https://github.com/Xilinx/finn/blob/dev/notebooks/advanced/4_advanced_builder_settings.ipynb
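Roughly, the relevant part of the build config looks like this (the values below are just placeholders; please refer to the notebook for a complete, up-to-date example):

from finn.builder.build_dataflow_config import (
    DataflowBuildConfig,
    DataflowOutputType,
    VerificationStepType,
)

# verification is driven by a reference input/output pair stored as .npy files
cfg = DataflowBuildConfig(
    output_dir="build_output",
    synth_clk_period_ns=10.0,
    fpga_part="xc7z020clg400-1",
    generate_outputs=[DataflowOutputType.STITCHED_IP],
    verify_steps=[
        VerificationStepType.QONNX_TO_FINN_PYTHON,
        VerificationStepType.STREAMLINED_PYTHON,
        VerificationStepType.FOLDED_HLS_CPPSIM,
        VerificationStepType.STITCHED_IP_RTLSIM,
    ],
    verify_input_npy="input.npy",
    verify_expected_output_npy="expected_output.npy",
)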