Xilinx / Vitis-AI

Vitis AI is Xilinx’s development stack for AI inference on Xilinx hardware platforms, including both edge devices and Alveo cards.
https://www.xilinx.com/ai
Apache License 2.0

[Urgent]: Getting Segmentation fault (core dumped) on Alveo U200 #282

Open PavanSai-39 opened 3 years ago

PavanSai-39 commented 3 years ago

Hello Team,

I am currently getting a segmentation fault while running the multi-threaded Python code for the pre-trained resnet50 model taken from the Vitis-AI Model Zoo.

My current environment:
OS: Ubuntu 18.04.3
Kernel: 4.15.0
XRT version: 2.8.743
Vitis-AI Docker: 1.3.411

1) When I ran resnet50 (.so file) by following the instructions in the Running Examples section of the Vitis-AI documentation, everything went well, and I got the following results:

(vitis-ai-caffe) Vitis-AI /workspace/demo/VART/resnet50 > ./resnet50 ../../../caffe-resnet50-pretrained-model/resnet50/resnet50.xmodel
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0129 08:22:01.717298 84 main.cc:285] create running for subgraph: subgraph_conv1
loading xclbin: /opt/xilinx/overlaybins/dpdpuv3_wrapper.hw.xilinx_u200_xdma_201830_2.xclbin
Device acquired (New CU) errCode: errCode: 0 errCode String: SUCCESS myHandle: 2 valid: 1

done loading xclbin

WARNING: Running in non-performance mode.

Image : cars_72.jpg
top[0] prob = 0.731058 name = jeep, landrover
top[1] prob = 0.268941 name = pickup, pickup truck
top[2] prob = 0.000001 name = car wheel
top[3] prob = 0.000000 name = tow truck, tow car, wrecker
top[4] prob = 0.000000 name = snowplow, snowplough

2) But when I ran the resnet50_mt_py (multi-threaded) script, I got a segmentation fault, as follows:

(vitis-ai-tensorflow) Vitis-AI /workspace/demo/VART/resnet50_mt_py > python3 resnet50_jan29_1.py 1 ../../../tenorflow-resnet50-mt/resnet_v1_50_tf/resnet_v1_50_tf.xmodel
loading xclbin: /opt/xilinx/overlaybins/dpdpuv3_wrapper.hw.xilinx_u200_xdma_201830_2.xclbin
Device acquired (New CU) errCode:
Fatal Python error: Segmentation fault

Thread 0x00007f0907fff700 (most recent call first):
  File "resnet50_jan29_1.py", line 140 in runResnet50
  File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/threading.py", line 864 in run
  File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/threading.py", line 916 in _bootstrap_inner
  File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/threading.py", line 884 in _bootstrap

Thread 0x00007f098aab2740 (most recent call first):
  File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/threading.py", line 1072 in _wait_for_tstate_lock
  File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/threading.py", line 1056 in join
  File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/threading.py", line 1294 in _shutdown
Segmentation fault (core dumped)
(vitis-ai-tensorflow) Vitis-AI /workspace/demo/VART/resnet50_mt_py >
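
The per-thread dump above ("Fatal Python error: Segmentation fault" followed by "Thread 0x... (most recent call first)") has the format written by Python's standard faulthandler module. A minimal sketch of enabling it so that a crash inside a worker thread reports the Python line each thread was executing. The worker function and the log file name are hypothetical stand-ins and are not taken from the resnet50_mt_py script:

```python
import faulthandler
import os
import tempfile
import threading

# Hypothetical log location; faulthandler needs a real file descriptor
# and the file must stay open for as long as the handler is enabled.
crash_log_path = os.path.join(tempfile.gettempdir(), "resnet50_mt_crash.log")
crash_log = open(crash_log_path, "w")

# On SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL, write the Python stack
# of every thread to crash_log before the process dies.
faulthandler.enable(file=crash_log, all_threads=True)

results = []


def worker(index):
    # Hypothetical stand-in for runResnet50: the real function would
    # call into the native runner at this point.
    results.append(index * 2)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The same handler can be enabled without changing the script by running it with `python3 -X faulthandler`, in which case the dump goes to stderr.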

When I searched online, it seemed that this might be a deadlock problem.

Could you please help me solve this issue?

VishalX commented 3 years ago

@PavanSai-39

Can you please provide the steps to reproduce the issue?

waldenou commented 3 years ago

Hi @VishalX,

I have encountered the same issue when following this example. It seems that the segmentation fault occurs when runner.wait(job_id) is called.
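
One way to confirm that a crash comes from a single native call such as runner.wait(job_id) is to run the suspect step in a child process and inspect its exit status, so the parent interpreter survives. A minimal sketch, assuming a POSIX system. The child below deliberately sends itself SIGSEGV as a stand-in for the crashing call and does not use the VART API; `run_isolated` and `CHILD_CODE` are hypothetical names:

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for the crashing step: the child process
# terminates itself with SIGSEGV instead of calling a native runner.
CHILD_CODE = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"


def run_isolated(code):
    """Run Python source in a child interpreter and return its exit status."""
    completed = subprocess.run([sys.executable, "-c", code])
    return completed.returncode


status = run_isolated(CHILD_CODE)

# On POSIX, a return code of -N means the child was killed by signal N.
crashed_with_segv = status == -signal.SIGSEGV
print("child exit status:", status, "segfault:", crashed_with_segv)
```

Moving one call at a time into the child narrows down which step triggers the signal.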

Also, when running the XIR demo here, the message 'WARNING: Running in non-performance mode.' is printed. I wonder what this means.

Thanks.

VishalX commented 3 years ago

@waldenou

What is your target DPU? Is it DPUCADF8H for Alveo U200/U250? If so, you can try AKS with it. See: https://github.com/Xilinx/Vitis-AI/tree/master/tools/AKS#run-examples-on-alveo-u200alveo-u250-with-new-batch-dpu

VishalX commented 3 years ago

@PavanSai-39 Is this a duplicate of #271?

If so, please check the answer at https://github.com/Xilinx/Vitis-AI/issues/271#issuecomment-771477174. You can also try AKS (see my previous comment) to run the multi-threaded example.