ARM-software / armnn

Arm NN ML Software. The code here is a read-only mirror of https://review.mlplatform.org/admin/repos/ml/armnn
https://developer.arm.com/products/processors/machine-learning/arm-nn
MIT License

ArmNN delegate causes segmentation fault when trying to run Yolov5s #700

Closed vn218 closed 1 year ago

vn218 commented 1 year ago

I am trying to run Ultralytics' YOLOv5s model on my ODROID-XU4 board. I exported the model to .tflite using the export.py script provided in the repo, then modified models/common.py so that detect.py uses the ArmNN delegate. However, for every combination of backends, it segfaults immediately after the "Created TfLite ArmNN delegate" message (there is no problem when the delegate is only loaded but not used). I am using tflite-runtime 2.5 and Python 3.7.14 on Ubuntu MATE 22.04. I built ArmNN by following the Docker build tool guide. Also, trying to use the ArmNN delegate seems to break tflite-runtime: afterwards it always segfaults, even when I am not using the delegate, and it only works properly again after I reinstall it.

keidav01 commented 1 year ago

Hi @vn218,

Unfortunately, ArmNN does not currently support YoloV5 and we do not have a version of it within our Model Zoo to test.

I wanted to test the external model that you have linked in order to reproduce your issue. But, as I understand it, licensing prevents us from downloading this model due to their use of "GNU GENERAL PUBLIC LICENSE". I have contacted our Third Party IP team to understand this for certain. I will let you know what the outcome is.

Your versions all look to be correct and compatible with ArmNN.

For now, in order to aid us with your issue, can you please give some clear and concise details about the commands you are using and please attach the output as it will be easier to visualize.

Also, please ensure all unit tests are passing for ArmNN and the Delegate. If any are failing, please attach the output.

Thank you, Keith

vn218 commented 1 year ago

@keidav01 Steps .........

On Colab : Clone yolov5

!git clone https://github.com/ultralytics/yolov5 
%cd yolov5

Export as .tflite

!python export.py --weights <model_path> --include tflite

Then I downloaded the model on the board.

Steps on the board : Clone yolov5

git clone https://github.com/ultralytics/yolov5 
cd yolov5

Edit line 461 in models/common.py to make use of armnn delegate

Run Inference

python3 detect.py --weights <model path> --source <.mp4 path>

I immediately get a segmentation fault.
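
The edit to models/common.py is not shown in the thread. As a rough illustration only, attaching the ArmNN delegate through tflite-runtime usually looks something like the sketch below. The helper names `build_delegate_options` and `make_interpreter` are hypothetical and the paths are placeholders; the option keys `backends` and `logging-severity` are the ones that appear in the log output later in this thread.

```python
def build_delegate_options(backends, logging_severity="info"):
    """Build the options dict handed to the external delegate.

    `backends` is an ordered list of ArmNN backend names,
    e.g. ["GpuAcc", "CpuAcc", "CpuRef"] (preferred backend first).
    """
    if not backends:
        raise ValueError("at least one backend is required")
    return {
        "backends": ",".join(backends),
        "logging-severity": logging_severity,
    }


def make_interpreter(model_path, delegate_path, options):
    """Create a TFLite interpreter that runs through the ArmNN delegate.

    Assumes tflite-runtime is installed. It is imported lazily so that
    build_delegate_options() can be used without it.
    """
    import tflite_runtime.interpreter as tflite

    # Loading the delegate library; the thread reports this step succeeds.
    delegate = tflite.load_delegate(library=delegate_path, options=options)

    # Constructing the interpreter with the delegate attached; this is the
    # step the thread reports as crashing for fp16 models.
    interpreter = tflite.Interpreter(
        model_path=model_path,
        experimental_delegates=[delegate],
    )
    interpreter.allocate_tensors()
    return interpreter
```

A call might look like `make_interpreter("<model path>", "<path to libarmnnDelegate.so>", build_delegate_options(["GpuAcc", "CpuAcc", "CpuRef"]))`, with both paths filled in for the local setup.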

All the Delegate Unit Tests are passing, however a few ArmNN Unit Tests are failing. I have raised a separate issue for those failures.

Update : I tried using the tflite gpu delegate and got the same result, so I guess it's not armnn specific. I also tried a few models from the tflite examples and it failed for them as well, so it's also not yolov5s specific. I tested those models by running label_image.py using tflite-runtime. It works fine without any delegate. However, when I use a delegate, tflite_runtime.interpreter.Interpreter() causes a segfault (tflite_runtime.interpreter.load_delegate() doesn't cause any problem). The output:

(env) odroid@odroid:~/Desktop/project/tflite_sample$ python3 label_image.py   --model_file lite-model_movenet_multipose_lightning_tflite_float16_1.tflite   --label_file labels.txt   --image grace_hopper.bmp -e ../aarch32_build/delegate/libarmnnDelegate.so -o "backends:GpuAcc,CpuAcc,CpuRef;logging-severity:info" 
Loading external delegate from ../aarch32_build/delegate/libarmnnDelegate.so with args: {'backends': 'GpuAcc,CpuAcc,CpuRef', 'logging-severity': 'info'}
Info: ArmNN v30.0.0
Info: Initialization time: 11.71 ms.
INFO: TfLiteArmnnDelegate: Created TfLite ArmNN delegate.
Segmentation fault

vn218 commented 1 year ago

@keidav01 The issue only occurs for fp16 models. The segfault occurs while the model gets dequantized to fp32.
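
One way to check in advance whether a given .tflite file falls into this fp16 case is to list its tensors without any delegate (which the thread reports as working) and look for float16 entries. The sketch below is an assumption about how one might do that, not something taken from the thread. Both function names are hypothetical, `model_has_fp16` assumes tflite-runtime is installed, and the dtype check compares by type name so the pure-Python part does not depend on NumPy.

```python
def has_fp16_tensors(tensor_details):
    """Return True if any entry in `tensor_details` has a float16 dtype.

    `tensor_details` is a list of dicts with a "dtype" key, in the shape
    returned by Interpreter.get_tensor_details().
    """
    for detail in tensor_details:
        dtype = detail.get("dtype")
        # NumPy scalar types expose __name__ ("float16"); fall back to str()
        # for dtype instances or plain strings.
        name = getattr(dtype, "__name__", str(dtype))
        if name == "float16":
            return True
    return False


def model_has_fp16(model_path):
    """Inspect a .tflite file for float16 tensors, without any delegate."""
    import tflite_runtime.interpreter as tflite  # lazy import, see lead-in

    interpreter = tflite.Interpreter(model_path=model_path)
    return has_fp16_tensors(interpreter.get_tensor_details())
```

If `model_has_fp16("<model path>")` returns True, the model is a candidate for the fp16 crash described here.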

FrancisMurtagh-arm commented 1 year ago

Hi @vn218,

Are you still having this issue?

I'll try to reproduce the issue now as the model was approved with regard to 3rd party IP.

Thanks, Francis.

vn218 commented 1 year ago

@FrancisMurtagh-arm Using an fp16 tflite model was causing the problem. It ran fine when I used an fp32 tflite model with the delegate's reduce-to-fp16 option.
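
A rough sketch of that workaround: keep the model in fp32 and ask the delegate to reduce to fp16 when it is loaded. The option key `reduce-fp32-to-fp16` and its string value are assumptions based on the ArmNN external delegate options and should be checked against the ArmNN version in use. The helper name `with_fp16_reduction` is hypothetical.

```python
def with_fp16_reduction(options):
    """Return a copy of the delegate options with fp32-to-fp16 reduction on.

    The key name below is an assumption; verify it against the delegate
    documentation for the installed ArmNN release.
    """
    updated = dict(options)  # copy so the caller's dict is left untouched
    updated["reduce-fp32-to-fp16"] = "true"
    return updated
```

The resulting dict would be passed as the `options` argument of `tflite_runtime.interpreter.load_delegate()` together with an fp32 model.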

FrancisMurtagh-arm commented 1 year ago

Hi @vn218,

There was indeed a bug in the fp16 handling. I've pushed a fix, if you could try it out:

https://review.mlplatform.org/c/ml/armnn/+/8960

Thanks, Francis.

FrancisMurtagh-arm commented 1 year ago

Closing for now, but please reopen if the patch doesn't fix your issue.

Thanks for reporting, Francis.