Xilinx / Vitis-AI

Vitis AI is Xilinx’s development stack for AI inference on Xilinx hardware platforms, including both edge devices and Alveo cards.
https://www.xilinx.com/ai
Apache License 2.0

Check failed: bo_ != XRT_NULL_BO allocation failure: xrt_.handle 0xaaaaceaedc80 xrt_.device_id 0 size 5914624 xrt_.flags 0x1000000 #1164

Closed ranirachel2021 closed 1 year ago

ranirachel2021 commented 1 year ago

Hi

I am facing a similar issue to the one mentioned in https://github.com/Xilinx/Vitis-AI/issues/552#issuecomment-1048409522 while executing a custom-trained yolov3 model using the Vitis-AI-Library API on the ZCU102 target.

root@xilinx-zcu102-2020_2:~/Vitis-AI/demo/Vitis-AI-Library/samples/yolov3# ./test_video_yolov3 yolov3zcu 0 -t 6

(test_video_yolov3:1310): CRITICAL : 13:24:22.707: gst_v4l2_object_destroy: assertion 'v4l2object != NULL' failed

(test_video_yolov3:1310): CRITICAL : 13:24:23.281: gst_v4l2_object_destroy: assertion 'v4l2object != NULL' failed
WARNING: Logging before InitGoogleLogging() is written to STDERR
F1213 13:24:27.990424  1310 buffer_object_xrt_imp.cpp:74] Check failed: bo_ != XRT_NULL_BO allocation failure: xrt_.handle 0xaaaaceaedc80 xrt_.device_id 0 size 5914624 xrt_.flags 0x1000000
*** Check failure stack trace: ***
Aborted

Kindly check this.

I am sharing my compiled model for your reference. https://drive.google.com/file/d/1jldqsMwTKN-lIUFH5fb4iFKYDygwHAWH/view?usp=share_link

qianglin-xlnx commented 1 year ago

Hi @ranirachel2021 The error shows that the CMA memory for the DPU is not enough. How much CMA memory have you set in your system? Use the following command to check: cat /proc/meminfo

To solve your issue, you can set a larger CMA size in your system, such as 1 GB or 1.5 GB, or you can lower the thread count to reduce the memory requirement.
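For reference, a minimal sketch of how CMA can be checked and enlarged on a PetaLinux-based target; the exact way bootargs are set depends on how your boot image is built, and 1536M is only an example value:

```sh
# On the running target: check how much CMA is reserved and how much is free
grep -i cma /proc/meminfo
#   CmaTotal: ... kB    total reserved CMA
#   CmaFree:  ... kB    what remains for new DPU buffers

# CMA size is normally raised through the kernel boot arguments, e.g. by
# adding "cma=1536M" to bootargs in the PetaLinux / device-tree configuration
# and rebuilding the image, or by editing bootargs in the boot environment.
```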

ranirachel2021 commented 1 year ago

Hi @qianglin-xlnx ,

As per your advice, I have increased the CMA size to 1.5 GB, but we are still facing the same issue: the model execution gets stuck without showing any error.

Please find below a screenshot showing the increased CMA size: [screenshot attached]

Regards, rachel

qianglin-xlnx commented 1 year ago

@ranirachel2021 Can you try the following command and share the log? ./test_video_yolov3 yolov3zcu 0

ranirachel2021 commented 1 year ago

Hi,

When we run the above command, the target and the console window get stuck without showing any messages. Kindly go through the boot log:

boot_log.txt

Regards,

rachel

ranirachel2021 commented 1 year ago

[screenshot attached]

Regards, Rachel

qianglin-xlnx commented 1 year ago

@ranirachel2021 It seems the program is running. A video program will display the video, so you need to connect the zcu102 board to a display via DP.

qianglin-xlnx commented 1 year ago

Also, ensure the USB camera is properly connected.
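For reference, a quick sketch of how to confirm the camera is visible on the target (v4l2-ctl is only present if it was included in your image):

```sh
# The "0" argument to test_video_yolov3 refers to the first V4L2 device,
# so at least one /dev/video* node should exist
ls /dev/video*

# If v4l2-ctl is available in the image, it lists the attached capture devices
v4l2-ctl --list-devices
```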

ranirachel2021 commented 1 year ago

Hi @qianglin-xlnx ,

As per your instruction, we have connected the target zcu102 to a display via DP. Please find the error message below:

root@xilinx-zcu102-2020_2:~#
[  190.990960] [drm] bitstream 21f1600f-d270-47ff-a932-8b6469ce62ef locked, ref=7
[  191.576209] Insufficient stack space to handle exception!
[  191.578854] Insufficient stack space to handle exception!
[  191.589938] Insufficient stack space to handle exception!
[  191.595007] Insufficient stack space to handle exception!

Regards, Rachel

qianglin-xlnx commented 1 year ago

Is there any output on the display? Why not try a basic example that runs the DPU with a video file: https://github.com/Xilinx/Vitis-AI/tree/1.3.1/demo/VART/adas_detection This is just to make sure the DPU and display work normally. Then run your model with a JPEG file as input, and finally run your model with video input (you can try a video file as input before the live camera).
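For reference, a rough sketch of that progression; the file names below (adas.webm, sample.jpg, video_input.webm) are placeholders, and the exact adas_detection arguments should be taken from that sample's README:

```sh
# 1. Sanity-check the DPU and display with the stock VART example and a video file
cd ~/Vitis-AI/demo/VART/adas_detection
./adas_detection video/adas.webm <path_to_adas_model>.xmodel   # see the sample README for exact args

# 2. Run the custom model on a single image with the Vitis-AI-Library sample
cd ~/Vitis-AI/demo/Vitis-AI-Library/samples/yolov3
./test_jpeg_yolov3 yolov3zcu sample.jpg        # sample.jpg is a placeholder image

# 3. Only then move to video input: a video file first, the live camera last
./test_video_yolov3 yolov3zcu video_input.webm
./test_video_yolov3 yolov3zcu 0                # 0 = first V4L2 camera, as used above
```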

ranirachel2021 commented 1 year ago

Hi,

We are able to run built-in applications like pose estimation, face detection, etc. on the target successfully. Please find the screenshot for your reference. We are currently facing the issue only when trying to run a custom-trained yolov3 model using the vitis-ai-library API. [screenshot attached]

Regards, Rachel

qianglin-xlnx commented 1 year ago

The model's input is 608x608, which is different from yolov3_voc_tf (416x416), so we need the model's detailed information. Could you share the yolov3zcu.prototxt you used?

ranirachel2021 commented 1 year ago

Hi @qianglin-xlnx ,

Please find below the yolov3zcu.prototxt file.

model {
  name: "yolov3zcu"
  kernel {
    name: "yolov3zcu"
    mean: 0
    mean: 0
    mean: 0
    scale: 0.00390625
    scale: 0.00390625
    scale: 0.00390625
  }
  model_type : YOLOv3
  yolo_v3_param {
    num_classes: 1
    anchorCnt: 3
    conf_threshold: 0.001
    nms_threshold: 0.45
    layername:"yolov3/convolutional59/BiasAdd/aquant"
    layername:"yolov3/convolutional67/BiasAdd/aquant"
    layername:"yolov3/convolutional75/BiasAdd/aquant"
    biases: 10
    biases: 13
    biases: 16
    biases: 30
    biases: 33
    biases: 23
    biases: 30
    biases: 61
    biases: 62
    biases: 45
    biases: 59
    biases: 119
    biases: 116
    biases: 90
    biases: 156
    biases: 198
    biases: 373
    biases: 326
    test_mAP: false
  }
  is_tf : true
}

Regards, Rachel

lishixlnx commented 1 year ago

Can you please change the lines below in the prototxt file and try again?

layer_name:"yolov3/convolutional75/BiasAdd/aquant"
layer_name:"yolov3/convolutional59/BiasAdd/aquant"
layer_name:"yolov3/convolutional67/BiasAdd/aquant"

ranirachel2021 commented 1 year ago

Hi @lishixlnx ,

I have updated the prototxt file, but the target and console still get stuck without showing any output or error message.

I have also tried with a 416 x 416 input, as mentioned in the comment quoted below, and the output remains the same.

> The model's input is 608x608, which is different from yolov3_voc_tf (416x416), so we need the model's detailed information. Could you share the yolov3zcu.prototxt you used?

Thanks & regards, Rachel

lishixlnx commented 1 year ago

Please run a basic test for this model:

test_jpeg_yolov3

to check if it runs well.
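As a side note, a minimal sketch of the layout the library samples typically expect when a model is passed by name (yolov3zcu here); the path is the usual default and may differ if an explicit model path is passed instead:

```sh
# When the sample is invoked as "./test_jpeg_yolov3 yolov3zcu <image>",
# the library usually resolves the model name against this default location
ls /usr/share/vitis_ai_library/models/yolov3zcu/
# expected contents:
#   yolov3zcu.xmodel     (the compiled model - the smaller file)
#   yolov3zcu.prototxt   (the runtime config shared earlier in this thread)
```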

ranirachel2021 commented 1 year ago

Hi @lishixlnx ,

Unfortunately, the same issue occurs with test_jpeg_yolov3 as well. The target and the console remain stuck after the command is executed, without any error message. The built-in models of the Vitis-AI-Library API work fine and produce output. Please find the screenshot below for your understanding; I have executed both the built-in yolov3_voc (which produces the required output) and y3zcu416 (the custom model).

[screenshot attached]

Regards, Rachel

lishixlnx commented 1 year ago

Please test with your original model from the attachment.

ranirachel2021 commented 1 year ago

Hi @lishixlnx ,

We are getting the same output as above for the attached custom model as well. Please see below:

[screenshot attached]

regards, Rachel

lishixlnx commented 1 year ago

I mean, test the attached custom model with the simple test_jpeg_yolov3. That can help locate whether the error is related to the model or to the video/gst module.

ranirachel2021 commented 1 year ago

Hi @lishixlnx ,

Kindly understand that we are able to produce the required output with both the test_jpeg_yolov3 and test_video_yolov3 commands while using the built-in yolov3 models such as yolov3_voc, yolov3_voc_tf, etc.

We are encountering the issue (target and console remain stuck) while trying to execute our customized yolov3 model, which was custom trained, quantized, and compiled based on the Vitis-AI tutorials, with both the test_jpeg_yolov3 and test_video_yolov3 commands.

Tutorial followed:

https://github.com/Xilinx/Vitis-Tutorials/tree/master/Machine_Learning/Design_Tutorials/07-yolov4-tutorial (tensorflow version with VAI 1.3)

Thanks and regards, Rachel

lishixlnx commented 1 year ago

Please continue testing your attached model with test_jpeg_yolov3. This time, please add the env parameters below and check the result: env DEEPHI_PROFILING=1 DEBUG_YOLO=1 DEBUG_YOLO_LOAD=1 test_jpeg_yolov3 .... [other parameters]...
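For illustration only, a sketch of what the full invocation could look like; sample.jpg is a placeholder image name:

```sh
# Single-image test with the debug/profiling environment variables enabled
env DEEPHI_PROFILING=1 DEBUG_YOLO=1 DEBUG_YOLO_LOAD=1 \
    ./test_jpeg_yolov3 yolov3zcu sample.jpg
```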

ranirachel2021 commented 1 year ago

Hi @lishixlnx .

I have tested my custom model with the above command, but as in previous cases the result is the same: both the target and the console remain unresponsive.

I have also tested the command with the built-in model yolov3_voc and got a log with the required output.

Please find the below link for my customized model: https://drive.google.com/drive/folders/1_gXB1Adpcl8Pf90en9De5W63wK5Nu8-Z?usp=share_link

lishixlnx commented 1 year ago

What is the output for your custom model with the above command? Is there no output at all?

ranirachel2021 commented 1 year ago

There is no output for the above command either.

lishixlnx commented 1 year ago

I noticed 2 xmodel files at https://drive.google.com/drive/folders/1_gXB1Adpcl8Pf90en9De5W63wK5Nu8-Z?usp=share_link. Which one do you prefer: the one with "_org" in the name, or the smaller one?

ranirachel2021 commented 1 year ago

Hi @lishixlnx ,

Both files are generated after compilation. I usually try the smaller one, yolov3zcu.xmodel, as mentioned in the tutorials. I would be grateful if you could tell me what the difference between the two xmodel files is, and whether both files can be used for execution.

lishixlnx commented 1 year ago
  1. I checked the model files. The smaller one is the compiled model and the bigger one is the quantized model, so we should use the smaller one.
  2. I checked the code; the "no output" behavior is strange. Can you test with the latest version?

ranirachel2021 commented 1 year ago

Hi @lishixlnx ,

Kindly mention which latest version I should use for testing; sorry, I am unable to understand your comment.

> 2. I checked the code; the "no output" behavior is strange. Can you test with the latest version?

lishixlnx commented 1 year ago

v3.0 is available; you can try it.

ranirachel2021 commented 1 year ago

Hi @lishixlnx ,

Our target image is built with Vitis-AI v1.3, and the Vitis tool version used is 2020.2. Is it possible to test the model with the v3.0 Vitis-AI-Library API?

Regards, Rachel

lishixlnx commented 1 year ago

Please build everything with the v3.0 tools.

ranirachel2021 commented 1 year ago

Hi @lishixlnx ,

I am able to execute my custom yolov3 model for images successfully with Vitis-AI v3.0. Thank you for your support.

I tried to run the model with video input, but I am encountering the error shown below: [screenshot attached] Kindly guide me.

lishixlnx commented 1 year ago
  1. Check if the video file exists.
  2. Since you can run the model with images correctly, the model itself is correct, so the issue should not be related to the model. Please check the video files, monitor, etc.

ranirachel2021 commented 1 year ago

Hi @lishixlnx ,

  1. Is the above issue (running the model with video input) related to the issue below? https://support.xilinx.com/s/question/0D52E00006hpNO0SAM/zcu102-petalinux-could-not-initialize-supporting-library-omxh264decomxh264dec0?language=en_US

Kindly confirm.

  2. Even though I'm able to execute my model as mentioned in the comment quoted below, the result is not accurate (the detection is not accurate). I tested the trained weights in Google Colab before converting to the target model and got accurate results.

> I am able to execute my custom yolov3 model for images successfully with Vitis-AI v3.0. Thank you for your support.

Kindly guide me on whether this happens because of the quantization process, or whether we also need to use the VAI optimizer.

Please find the output in both cases for your understanding:

ZCU102_LP_output_variation.odt

Regards, Rachel

ranirachel2021 commented 1 year ago

Hi, kindly guide me on why we are not able to deploy the custom model with Vitis-AI v1.3.

We have a license only for the Vitis unified tools up to 2021.1, so we are unable to build DPU-integrated target images for our custom boards to use Vitis-AI v3.0.

Kindly guide me on how we can migrate to higher versions, or whether there is any patch to fix the v1.3 error.

Regards,

Rachel

lishixlnx commented 1 year ago

Since you have said you can run the built-in model correctly with the test_jpeg* and test_video* programs, I don't think that's the same issue as https://support.xilinx.com/s/question/0D52E00006hpNO0SAM/zcu102-petalinux-could-not-initialize-supporting-library-omxh264decomxh264dec0?language=en_US.

Please do the test below:

  1. Test the built-in model with the test_video* program.
  2. Replace the model with your own and test again; please be sure to use the same video file.

Even if your model's accuracy is low, it does not prevent the video from running. If you can't run it, please go through the steps above to find out what is different (a rough sketch of this comparison follows below).
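A minimal sketch of that A/B comparison; the video file name is a placeholder, and only the model name changes between the two runs:

```sh
# Step 1: built-in model with a known-good video file
./test_video_yolov3 yolov3_voc video_input.webm

# Step 2: identical command and video file, only the model name swapped
./test_video_yolov3 yolov3zcu video_input.webm
```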

qianglin-xlnx commented 1 year ago

Closing since there has been no activity for more than 2 months. Please reopen if you still have questions, thanks.