Xilinx / Vitis-AI

Vitis AI is Xilinx’s development stack for AI inference on Xilinx hardware platforms, including both edge devices and Alveo cards.
https://www.xilinx.com/ai
Apache License 2.0

Vitis AI 2.5 compile error when using tensorflow2 yolov4-tiny model #963

Closed: vaan2010 closed this issue 1 year ago

vaan2010 commented 2 years ago

Hi, I want to quantize and compile a tensorflow2 yolov4-tiny model through Vitis AI 2.5, but I encountered some problems.

The structure of the yolov4-tiny model is based on CSPdarknet53, and the model was trained with the following GitHub project:

[GitHub - bubbliiiing/yolov4-tiny-tf2](https://github.com/bubbliiiing/yolov4-tiny-tf2)

When I started to quantize the model, I found that tf.split is not supported by Vitis AI 2.5, so I replaced it with a 1x1 Conv. This got me through the quantization step successfully.
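Here is a minimal sketch of that replacement idea, assuming NHWC tensors and a helper name made up for illustration (split_as_conv1x1): a frozen 1x1 convolution whose kernel is an identity matrix over one channel half behaves like one branch of tf.split.

import numpy as np
import tensorflow as tf

def split_as_conv1x1(x, half):
    # Select one channel half of x (NHWC) through a frozen 1x1 conv
    # whose kernel is an identity over the chosen half (half = 0 or 1).
    c = int(x.shape[-1])
    kernel = np.zeros((1, 1, c, c // 2), dtype=np.float32)
    for i in range(c // 2):
        kernel[0, 0, half * (c // 2) + i, i] = 1.0
    conv = tf.keras.layers.Conv2D(c // 2, 1, use_bias=False, trainable=False)
    y = conv(x)  # build the layer so its weights exist
    conv.set_weights([kernel])
    return y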

However, in the next step, when I compiled the quantized model, it showed the following error:

The error message: [UNILOG][FATAL][XIR_MULTI_DEFINED_OP][Multiple definition of OP!] quant_max_pooling2d_1

which means a duplicate layer is defined, but I checked the whole model and could not find any two layers with the same name:

The image above shows all the max-pooling layers I have.
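For a programmatic check, here is a small sketch that scans an xmodel for duplicate op names using the Python xir bindings (the filename is a placeholder):

import xir
from collections import Counter

# Count every op name in the graph and report any that appear more than once.
g = xir.Graph.deserialize("quantized_model.xmodel")
names = [op.get_name() for op in g.toposort()]
dupes = [name for name, count in Counter(names).items() if count > 1]
print("duplicate op names:", dupes or "none")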

Does anyone know how to solve this problem, or could anyone give me some suggestions on how to handle it?

Thanks,

Norris Lin

HuiJu1218 commented 2 years ago

Hi! I have the same issue. My environment is Vitis-AI 1.4 with tensorflow2, and I also use a yolov4-tiny model generated from Keras. Does anyone know how to solve this problem? Thanks, Ru

vaan2010 commented 2 years ago

Hi Ru, I tried to quantize and compile through the Vitis AI 1.4.1.978 docker image with Vitis AI GitHub v1.4.1, and it generated the xmodel successfully! But when I deployed my xmodel to the KV260, it still couldn't run through kv260-smartcam. I swapped in another model that already runs on smartcam successfully, and there was no segmentation fault, so I think this is a problem with the xmodel conversion.

So even if you quantize and compile your tensorflow2 model successfully, the model doesn't always work properly.

I'm still confused about this.

HuiJu1218 commented 2 years ago

Hi Norris, I tried Keras 2.3, combined the model (model.json + weights.h5) into a single h5, and used the Vitis-AI container (tensorflow2) to convert the model. This flow works for me, but when I deploy on the KV260, it always shows "segmentation fault". When I trace the yolov4-tiny output, two outputs were expected but only one was found. I have the same thoughts as you about the error message.

Ru

vaan2010 commented 2 years ago

Hi Ru, I checked the outputs of the converted xmodel. You can see there are two outputs in this structure: quant_conv2d_20fix and quant_conv2d_23fix.
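In case it helps reproduce this check, here is a small sketch that lists a graph's output ops with the Python xir bindings (the filename is a placeholder):

import xir

g = xir.Graph.deserialize("yolov4-tiny.xmodel")
# Ops that no other op consumes are the graph outputs.
outputs = [op for op in g.toposort() if not op.get_fanout_ops()]
for op in outputs:
    print(op.get_name(), op.get_type())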

But I still haven't found any solution for the segmentation fault on the KV260.

Norris

sh39o commented 2 years ago

Hi @vaan2010, could you please provide tf2_yolov4-tniy_my_mode_org.xmodel?

vaan2010 commented 2 years ago

Hi sh39o, the attachment is my converted xmodel: yolov4-tiny.xmodel

Norris

huisunCompiler commented 2 years ago

Hi @vaan2010, sorry, could you please provide the xmodel from before compiling? It seems the given xmodel has already been compiled, and the leaky relu-fix is deployed on the CPU.

In the xmodel you provided, the weights data is missing. The runtime can't acquire the correct weights data, which causes the segmentation fault.

vaan2010 commented 2 years ago

Hi, the attachment is my model before compiling: quantized_model.h

And the following file contains the original model weights and structure from before quantizing: original_yolov4_tiny.h5

Norris

vaan2010 commented 2 years ago

Hi, I have tried several ways to deploy Yolov4-tiny on the KV260 these days by following other people's GitHub tutorials. A common approach, when you have trained a Yolov4-tiny model with tensorflow2 from this GitHub project: https://github.com/bubbliiiing/yolov4-tiny-tf2

is to convert your model to a .pb file, i.e., a tensorflow1 model type. This is because you can't convert the Yolov4-tiny tensorflow2 model to an xmodel through Vitis-AI; it always shows the following error: [UNILOG][FATAL][XIR_MULTI_DEFINED_OP][Multiple definition of OP!] quant_max_pooling2d_1
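A minimal sketch of that conversion, assuming TF2's convert_variables_to_constants_v2 utility and placeholder filenames:

import tensorflow as tf
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

# Load the trained Keras model and wrap it in a concrete function.
model = tf.keras.models.load_model("original_yolov4_tiny.h5", compile=False)
concrete = tf.function(lambda x: model(x)).get_concrete_function(
    tf.TensorSpec(model.inputs[0].shape, model.inputs[0].dtype))

# Freeze the variables into constants and write a TF1-style frozen graph.
frozen = convert_variables_to_constants_v2(concrete)
tf.io.write_graph(frozen.graph, ".", "yolov4_tiny_frozen.pb", as_text=False)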

After you get the .pb file successfully, you will hit another problem when you follow this GitHub project at the quantization step, using the Vitis-AI tensorflow1 conda environment: you will get another error message there.

This means the converted tensorflow1 model is mismatched with tensorflow2, even though you use vitis-ai-tensorflow to quantize your tensorflow1 Yolov4-tiny model.

I tried the float model from this GitHub project and got the same error as above.

In conclusion, I am stuck here and have no idea how to deal with this.

I wish someone could give me suggestions or ideas to understand and solve the whole problem.

Thanks, Norris

huisunCompiler commented 2 years ago

OK, I have reproduced the problem "multiple definition of op: quant_max_pooling2d_1". It seems there is a bug in graph optimization, and I am trying to fix it. I'll let you know when I have progress.

huisunCompiler commented 2 years ago

Hi @vaan2010, this is an op naming problem. There are three max-pooling ops: quant_max_pooling2d, quant_max_pooling2d_1, and quant_max_pooling2d_2.

The first pooling layer, quant_max_pooling2d, is rewritten during the graph optimization procedure, and the optimization adds a postfix "_1", which collides with the quant_max_pooling2d_1 already defined in the original graph. Thank you for your bug report and immediate response; we will fix this problem in the next version.

To work around it for now, you can rename the quant_max_pooling2d layer to another name, e.g., quant_max_pooling2d_0, either in the h5 file or directly in the xmodel file. Here is a Python script for renaming quant_max_pooling2d in quantized_model.xmodel:

import xir

g = xir.Graph.deserialize("quantize_model.xmodel")
ops = g.toposort()
for op in ops:
    if op.get_name() == "quant_max_pooling2d":
        # Recreate the op under a new name, keeping its type, attributes, and inputs.
        replace_pool = g.create_op("quant_max_pooling2d_0", op.get_type(), op.get_attrs(), op.get_input_ops())
        # Rewire every consumer to read from the renamed op, then drop the old one.
        [succ.replace_input_ops(op, replace_pool) for succ in op.get_fanout_ops()]
        g.remove_op(op)
g.serialize("renamed_quantize_model.xmodel")
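As a quick sanity check (same filenames as above), you can confirm the old name is gone after running the script:

import xir

g = xir.Graph.deserialize("renamed_quantize_model.xmodel")
assert all(op.get_name() != "quant_max_pooling2d" for op in g.toposort())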

The xmodels after renaming and compiling are available here: https://pan.baidu.com/s/1AAmD1Ifev5e6l8dvApqCvQ?pwd=revu

vaan2010 commented 2 years ago

Hi @huisunCompiler, I will try the workaround you provided immediately and check whether it works on the KV260. If the deployment is successful, I will share the steps in this issue for anyone else who runs into it.

Thank you again for the solution. Norris

vaan2010 commented 2 years ago

Hi everyone, I deployed my Yolov4-tiny model on the KV260 successfully! Well, it looks like it works now, but I haven't tested the model on the KV260 in detail yet. You can reference the following flow if you run into the problem described in this issue.

My environment:

- Tensorflow-gpu 2.6
- Vitis AI 1.4.1
- KV260 PetaLinux 2021.1

Step 1. Train your own Yolov4-tiny model through this GitHub project.

Step 2. Take your trained model (.h5 file) and quantize it with Vitis AI 1.4.1 and tensorflow2. You can use the example shell scripts that come with the Vitis AI Library to quantize.

Step 3. Use the example in the Vitis AI Library to compile. In this step you will hit the problem mentioned above. Don't worry: check the location of the dumped model and copy it to your folder.

Step 4. We will use this dumped model to compile again, but first we need to change the layer name with the following code.

import xir
g = xir.Graph.deserialize("quantize_model.xmodel")
ops = g.toposort()
for op in ops:
    if op.get_name() == "quant_max_pooling2d":
        replace_pool = g.create_op("quant_max_pooling2d_0", op.get_type(), op.get_attrs(), op.get_input_ops())
        [succ.replace_input_ops(op, replace_pool) for succ in op.get_fanout_ops()]
        g.remove_op(op)
g.serialize("renamed_quantize_model.xmodel")

Step 5. Change your Vitis AI conda environment to pytorch, and use vai_c_xir to compile your newly modified xmodel. Note: you must check that your Vitis AI version matches the meta-vitis-ai layer on the KV260, or you will get the following error on the KV260. Generally, PetaLinux 2021.1 ships meta-vitis-ai 1.4.

terminate called after throwing an instance of 'std::bad_any_cast'
 what(): bad any_cast
Aborted
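For reference, a sketch of a typical vai_c_xir invocation for step 5 is below; the arch.json path is an assumption based on the usual Vitis AI docker layout, so adjust it and the output/net names to your setup.

vai_c_xir -x renamed_quantize_model.xmodel -a /opt/vitis_ai/compiler/arch/DPUCZDX8G/KV260/arch.json -o ./compiled -n yolov4_tiny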

Step 6. Copy your xmodel to the KV260, and prepare files like preprocess.json, drawresult.json, label.json, and so on.

Step 7. Load the kv260-smartcam app and run the command:

sudo smartcam --mipi -W 1920 -H 1080 --target dp -a <your task name>

That's all! Now you can run your custom model and task on kv260!

If this content has any mistakes or problems, please tell me and give me suggestions; I will be very grateful! Thanks again to everyone who helped me solve this!

Norris

vaan2010 commented 2 years ago

Hi @huisunCompiler, well, I have encountered another problem. I quantized and compiled the same yolov4-tiny model again, and I found that I no longer need to change the layer name. Does that mean the error [UNILOG][FATAL][XIR_MULTI_DEFINED_OP][Multiple definition of OP!] quant_max_pooling2d_1 has already been fixed by Xilinx? But when I deployed my yolov4-tiny model on the KV260 after finishing quantization and compilation, the error message "Segmentation fault" always shows up. That means I can now quantize and compile the yolov4-tiny model successfully without renaming the maxpool layer, but I can't deploy it on the KV260!

The following is my workflow.

Environment:

- Vitis AI 1.4
- Vitis AI Docker 1.4.916
- PetaLinux 2021.1

1. Prepare the original float model.

2. Quantize the model with the following code:

    from dataset import input_fn, NUM_IMAGES
    from dataset import get_images_infor_from_file, ImagenetSequence
    from nets.yolo import yolo_body
    from utils.utils import get_classes
    from tensorflow_model_optimization.quantization.keras import vitis_quantize

    input_shape = [416, 416]
    anchors_mask = [[3, 4, 5], [1, 2, 3]]
    phi = 0
    classes_path = 'model_data/voc_classes.txt'
    weight_decay = 0
    model_path = './test.h5'
    TF2_NETWORK_PATH = '../../../'

    img_paths, labels = get_images_infor_from_file(TF2_NETWORK_PATH+'images/', TF2_NETWORK_PATH+'val.txt', 1)
    imagenet_seq = ImagenetSequence(img_paths[0:1000], labels[0:1000], 50)

    class_names, num_classes = get_classes(classes_path)
    model_body = yolo_body((input_shape[0], input_shape[1], 3), anchors_mask, num_classes, phi=phi, weight_decay=weight_decay)
    model_body.load_weights(model_path)

    model = vitis_quantize.VitisQuantizer(model_body).quantize_model(calib_dataset=imagenet_seq)

    # save the quantized model
    model.save('./quantized.h5')

   This is the result after quantizing: [quantized model](https://anstekadmin-my.sharepoint.com/:u:/g/personal/norris_lin_anstek_com_tw/EQEIUmEp6aVEvih6nfS1YNwB4UyRPRFscz2SGjO17AxQvw?e=G8Nuoz)

3. Compile the model: [compiled model](https://anstekadmin-my.sharepoint.com/:u:/g/personal/norris_lin_anstek_com_tw/EdWqmLK8goxLhbuN_nSYvBkBggtL2YxmNbNXSRSDcQCECw?e=aH8UID)

4. Deploy; the following error occurs:

![image](https://user-images.githubusercontent.com/38204276/187333673-5b5241a7-4c73-4ae5-8430-0dd32586bf17.png)
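For reference, the compile in step 3 can be done with vai_c_tensorflow2; a sketch of a typical invocation is below (the arch.json path is an assumption based on the usual Vitis AI docker layout):

vai_c_tensorflow2 -m ./quantized.h5 -a /opt/vitis_ai/compiler/arch/DPUCZDX8G/KV260/arch.json -o ./compiled -n yolov4_tiny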

Looking forward to your reply and suggestions.
Thanks!

Best Regards,
Norris

HuiJu1218 commented 2 years ago

Hi @vaan2010, first, thanks a lot for sharing the Vitis-AI 2.5 model conversion SOP. It is really helpful. I had used Vitis-AI 1.4 to convert the model before I used Vitis-AI 2.5 and had the same issue as you. So we updated to 2022.1 and Vitis-AI 2.5 to run inference.

BR, Ru

vaan2010 commented 2 years ago

Hi @HuiJu1218, you're welcome! But now I want to use the smartcam app on the KV260 to run my yolov4-tiny model, and I found a difference between the previously compiled model and the model I converted now.

[screenshot: side-by-side comparison of the previously compiled xmodel (left) and the newly compiled xmodel (right)]

The left is the previously compiled model and the right is the model I compiled now. You can see the difference: the pooling layer sits in a different location. The left one can be deployed on the KV260 successfully, but the right one should be the correct one, because it has the same architecture as the original float yolov4-tiny model, yet it can't be loaded on the KV260.

So the problem is: when I convert my yolov4-tiny model today, it compiles to the right one, but previously it compiled to the left one. Why does the same model have two different forms after compiling, and why can one be loaded successfully but not the other?

Could the segmentation fault be caused by missing weights data? But the whole quantization and compilation flow showed no error messages today.

I wish you or someone else could give me suggestions or a solution to handle this segmentation fault problem on the KV260.

Best Regards, Norris

vaan2010 commented 2 years ago

Hi everyone, I found some problems related to the question above, so I opened a separate issue. You can check the new issue and the problems I found there.

BR, Norris

qianglin-xlnx commented 1 year ago

refer to https://github.com/Xilinx/Vitis-AI/issues/997

dungng27 commented 1 year ago

Hi @vaan2010, I want to ask: how do you check the meta-vitis-ai layer version on the KV260?

jimmy-adams commented 1 year ago

> Hi @vaan2010, first, thanks a lot for sharing the Vitis-AI 2.5 model conversion SOP. It is really helpful. I had used Vitis-AI 1.4 to convert the model before I used Vitis-AI 2.5 and had the same issue as you. So we updated to 2022.1 and Vitis-AI 2.5 to run inference.
>
> BR, Ru

Hi @HuiJu1218, can I ask about your later status when compiling and deploying the tiny YOLO on the KV260?