Xilinx / Vitis-AI

Vitis AI is Xilinx’s development stack for AI inference on Xilinx hardware platforms, including both edge devices and Alveo cards.
https://www.xilinx.com/ai
Apache License 2.0

Yolov5 Quantization Aware Training Error in Vitis AI #1349

Open IkrameBeggar opened 11 months ago

IkrameBeggar commented 11 months ago

I am trying to quantize a custom YOLOv5 model using quantization-aware training (QAT). I applied all the model adjustments described in the Vitis AI user guide v3.0 and wrote the QAT code based on the ResNet example provided in the Vitis AI examples: https://github.com/Xilinx/Vitis-AI/blob/v3.0/src/vai_quantizer/vai_q_pytorch/example/resnet18_qat.py

However, I am getting an error from QatProcessor in the pytorch_nndct library. The error is shown in the traceback and screenshot below:

Traceback (most recent call last):
  File "yolov5_QAT.py", line 484, in <module>
    main()
  File "yolov5_QAT.py", line 450, in main
    qat_processor = QatProcessor(model, inputs)
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/quantization/quant_aware_training.py", line 190, in __init__
    if self._graph.node(node).has_bound_params():
AttributeError: 'NoneType' object has no attribute 'has_bound_params'

Screenshot from 2023-10-10 13-16-19

adamDEBBA commented 11 months ago

The error you're encountering, 'NoneType' object has no attribute 'has_bound_params', typically indicates that the node object in your quantization-aware training (QAT) code is None, and you're trying to access the has_bound_params() method on it. This error often occurs when there's an issue with your model or the way it's being processed during quantization-aware training.
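To make the failure mode concrete, here is a minimal sketch that reproduces the same message outside Vitis AI. The `Node` class, the `nodes` dictionary and `has_params` are hypothetical stand-ins, not pytorch_nndct code; the only assumption carried over from the traceback is that a lookup by name can return None.

```python
class Node:
    """Hypothetical stand-in for a graph node (not the pytorch_nndct class)."""

    def has_bound_params(self):
        return True


# Hypothetical registry of known nodes, keyed by name.
nodes = {"conv1": Node()}


def has_params(name):
    # dict.get returns None for an unknown name, so the method call
    # below fails exactly like the line in the traceback.
    return nodes.get(name).has_bound_params()


print(has_params("conv1"))

try:
    has_params("missing_layer")
except AttributeError as exc:
    print(exc)
```

The second call fails because the lookup, not the method, is the problem: the name being asked for has no matching node.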

To debug and resolve this issue, you can try the following steps:

Check Model Compatibility: Ensure that your custom YOLOv5 model is compatible with the QAT process. Not all models can be quantized without modifications. Check if there are any layers or operations in your custom model that are not supported by the quantization framework you are using.

Input Data: Verify that the inputs you're passing to the QatProcessor function are correctly formatted and are compatible with your model's input requirements.

Layer Names: Make sure the layer names or node names in your custom model match what the QAT code expects. The error might occur if a required layer is missing or named differently.

Check Dependencies: Ensure that you have installed the required dependencies for your QAT code and that they are compatible with your PyTorch version. Also, verify that you are using a compatible version of Vitis AI and the pytorch_nndct library.

Model Debugging: You can try to print the model's architecture and investigate if there are any irregularities or issues with layer names, shapes, or attributes. This can help you pinpoint where the NoneType error is originating.

Update Library: It's possible that there might be a bug in the version of pytorch_nndct you are using. Check for updates or try using a different version of the library to see if the issue is resolved.

Consult Documentation: Review the documentation for the specific version of Vitis AI you are using. There may be updates, changes, or specific requirements for the QAT process that are not covered in the example you linked.

Debugging Tools: You can use debugging tools such as print statements and Python's debugger (pdb) to trace the flow of your code and identify where the None value is coming from.

Community/Support: If you are still unable to resolve the issue, consider reaching out to the Vitis AI community or support channels. They may be able to provide specific guidance or solutions based on the version you are using and any known issues.
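For the dependency check in particular, a small standard-library sketch can list what is installed in the active environment. The package names in `report` are assumptions (in particular, that the quantizer is installed under the distribution name `pytorch_nndct`); adjust them to match your environment. The fallback branch is there because the traceback shows Python 3.7, where `importlib.metadata` is not available.

```python
try:
    from importlib import metadata  # available from Python 3.8
except ImportError:
    metadata = None  # Python 3.7: fall back to pkg_resources below


def installed_version(package):
    """Return the installed version of a distribution, or None if absent."""
    if metadata is None:
        import pkg_resources

        try:
            return pkg_resources.get_distribution(package).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(packages=("torch", "torchvision", "pytorch_nndct")):
    """Map each package name (assumed names) to its version or None."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, version in report().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside the Vitis AI conda environment and including the output in a bug report makes version mismatches easier to spot.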

cchalou98 commented 11 months ago

The error you're encountering comes from QatProcessor calling has_bound_params() on a NoneType object. Judging by the traceback, this indicates that self._graph.node(node) returned None, i.e. the graph lookup found no node for that name; if _graph itself were None, the error would mention the attribute 'node' instead. Without seeing your specific code, it's a bit challenging to pinpoint the exact issue, but I can provide some general guidance on how to troubleshoot and resolve this type of error. Here are some steps you can take:

Check Model and Inputs: Ensure that your model and input tensors are properly defined and constructed. The model should be a valid PyTorch model, and the input tensors should have the correct shape and data type.

Check PyTorch Version: Make sure that you are using a compatible version of PyTorch with the version of the pytorch_nndct library. Incompatibility between versions could lead to issues.

Check QatProcessor Initialization: Verify that the QatProcessor is being initialized correctly. Ensure that you are passing the correct arguments to the constructor, including a valid PyTorch model and input tensors.

Review Custom Model Adjustments: Double-check the adjustments you made to your custom YOLOv5 model. Ensure that any modifications you made for quantization are valid and do not introduce errors.

Look for Example Code: Review the example code you referred to in the Vitis-AI user guide and the ResNet example from the Vitis-AI examples repository. Ensure that you are following the structure and procedures outlined in these examples.

Check for Updates: Check for any updates to the Vitis-AI library or the pytorch_nndct library. It's possible that there have been updates or bug fixes that address the issue you're facing.

Debugging: Insert print statements or use a debugger to inspect the state of variables and objects at different points in your code. This can help you identify where the NoneType error is originating.

Consult Documentation and Forums: Check the official documentation for Vitis-AI and the pytorch_nndct library. Additionally, consider looking for forums or community discussions related to Vitis-AI where others may have encountered similar issues.

If the issue persists and you are unable to identify the root cause, consider reaching out to the support channels provided by Xilinx or the community forums for assistance. They may be able to provide more specific guidance based on the details of your implementation.
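As one way to follow the debugging suggestion without editing the installed library, the sketch below captures the local variables of the frame that raised. `innermost_locals` and `failing_lookup` are hypothetical helpers written for illustration, and the node name is made up; only the standard traceback attributes are used.

```python
def innermost_locals(func, *args, **kwargs):
    """Call func; on failure return (exception, locals of the raising frame)."""
    try:
        func(*args, **kwargs)
    except Exception as exc:
        tb = exc.__traceback__
        # Walk to the last traceback entry, i.e. the frame where it was raised.
        while tb.tb_next is not None:
            tb = tb.tb_next
        return exc, dict(tb.tb_frame.f_locals)
    return None, {}


def failing_lookup():
    # Hypothetical stand-in for the failing library code.
    node = "example/node_name"  # made-up name for illustration
    resolved = None             # what a failed lookup would return
    return resolved.has_bound_params()


exc, local_vars = innermost_locals(failing_lookup)
print(type(exc).__name__, exc)
print("node =", local_vars.get("node"))
```

Applied to the real case, `innermost_locals(QatProcessor, model, inputs)` should return the locals of the `__init__` frame from the traceback, where `node` holds the name that failed to resolve. Interactively, `python -m pdb yolov5_QAT.py` gives the same information through the debugger's post-mortem mode.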

zijian98 commented 10 months ago

Hi @IkrameBeggar, I am encountering the same issue as you. By any chance, did you manage to solve this problem?

Thanks!

IkrameBeggar commented 10 months ago

Hello, I didn't manage to solve it. However, I figured out that the problem is with the model architecture. I added a print statement to check the value of "node" in the quant_aware_training.py code and compared it with the value for the resnet18 model provided as an example in the Vitis AI GitHub repository.

Screenshot from 2023-10-18 09-59-36
Screenshot from 2023-10-18 09-49-47
Screenshot from 2023-10-18 10-03-43

The node value for yolov5 contains a single layer of the model. For resnet18, however, the node value is updated to the next layer on each iteration of the for loop. I hope this helps you solve the error. If you do, please let me know how you managed to solve it.
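The print-statement comparison described above can be turned into a small scan that lists every name whose lookup fails, rather than stopping at the first one. This is only a sketch under assumptions: `FakeGraph` and the node names are hypothetical, and the single assumption about the real graph object is that it exposes `node(name)` and returns None for a name it does not know, as the traceback suggests.

```python
class FakeGraph:
    """Hypothetical stand-in for the quantizer's graph object."""

    def __init__(self, known_names):
        self._known = {name: object() for name in known_names}

    def node(self, name):
        # Assumed behaviour: None when no node matches the name.
        return self._known.get(name)


def find_unresolved(graph, names):
    """Return the names for which graph.node(name) is None, in order."""
    unresolved = []
    for name in names:
        if graph.node(name) is None:
            unresolved.append(name)
    return unresolved


# Made-up names: two resolve, one does not.
graph = FakeGraph(["backbone/conv1", "backbone/conv2"])
requested = ["backbone/conv1", "head/detect", "backbone/conv2"]
print(find_unresolved(graph, requested))
```

Running the same kind of scan for both resnet18 and the custom yolov5 model would show whether one specific layer or the whole set of names fails to resolve.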