ARM-software / armnn

Arm NN ML Software. The code here is a read-only mirror of https://review.mlplatform.org/admin/repos/ml/armnn
https://developer.arm.com/products/processors/machine-learning/arm-nn
MIT License

ArmNN v21.05 segfaults on firefly and rpi4 #566

Closed psyhtest closed 2 years ago

psyhtest commented 3 years ago
$ ck benchmark program:image-classification-armnn-tflite --env.USE_NEON \
--speed --repetitions=1 --skip_print_timers --skip_stat_analysis \
--dep_add_tags.images=preprocessed,using-opencv,side.224 \
--dep_add_tags.library=tflite,neon \
--dep_add_tags.weights=resnet50 \
--env.CK_BATCH_SIZE=1 --env.CK_BATCH_COUNT=10
...
Graph file: /home/maria/CK-TOOLS/model-tflite-mlperf-resnet-no-argmax-downloaded/resnet50_v1.no-argmax.tflite
Data layout: NHWC
Image dir: /datasets/dataset-imagenet-preprocessed-using-opencv-crop.875-full-inter.linear-side.224
Image list: image_list.txt
Image size: 224
Image channels: 3
Prediction classes: 1000
Result dir: predictions
Batch count: 10
Batch size: 1
Normalize: 0
Subtract mean: 1
Per-channel means to subtract: 123.68, 116.78, 103.94
Image count in file: 10

Loading graph...
./tmp-y1z2j8rw.sh: line 31: 11771 Segmentation fault      (core dumped) ./classification

(The same with --env.USE_OPENCL.)

I've traced this as far as armnnTfLiteParser::TfLiteParserImpl::ParseAdd:

Loading graph...
86              measure_setup([&]{
(gdb) 

Thread 1 "classification" received signal SIGSEGV, Segmentation fault.
0x0000007fb6a8c368 in armnnTfLiteParser::TfLiteParserImpl::ParseAdd(unsigned long, unsigned long) ()
   from /home/maria/CK-TOOLS/lib-armnn-gcc-7.5.0-neon-opencl-rel.21.05-tflite-linux-64/install/lib/libarmnnTfLiteParser.so.24
(gdb) 

To reproduce, follow this Jupyter notebook to Quick Test. For debugging, you may wish to use two steps:

$ ck compile program:image-classification-armnn-tflite --env.USE_NEON --speed
$ ck run program:image-classification-armnn-tflite --env.USE_NEON \
--repetitions=1 --skip_print_timers --skip_stat_analysis \
--dep_add_tags.images=preprocessed,using-opencv,side.224 \
--dep_add_tags.library=tflite,neon \
--dep_add_tags.weights=resnet50 \
--env.CK_BATCH_SIZE=1 --env.CK_BATCH_COUNT=10

You may also find it useful to descend into the program's tmp directory:

$ cd $(ck find program:image-classification-armnn-tflite)/tmp

and entering a virtual env with all env vars defined:

$ ck virtual env --tag_groups='armnn,tflite,neon,rel.21.05 preprocessed,side.224 weights,tflite,resnet xopenme'
*** Warning: you are in a new shell with a pre-set CK environment. Enter "exit" to return to the original one!

psyhtest commented 3 years ago

By the way, I have not debugged this on rpi4, but the symptom is the same. The same code worked fine with v21.02 on firefly, rpi4 and xavier for MLPerf Inference v1.0. Now, with v21.05, it works only on xavier.

Colm-in-Arm commented 3 years ago

Hi @psyhtest,

Lots to digest here.....

The fact this is occurring on multiple platforms and you've narrowed it down to model load suggests this should be relatively easy to recreate.

I first tried it on x86 directly using ExecuteNetwork and it loads and executes resnet50_v1.no-argmax.tflite without error.

I'm going to concentrate on the Pi 4 as it's easy for me to access. I don't have Ubuntu 20.04 but I'll try Ubuntu 20.10 with ArmNN 21.05. I used pyarmnn to directly load the tflite file and again it loaded without error.

Python 3.8.10 (default, Jun 2 2021, 10:49:15)
[GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarmnn as ann
Your ArmNN library instance does not support Onnx models parser functionality.
Skipped IOnnxParser import.
>>> parser = ann.ITfLiteParser()
>>> network = parser.CreateNetworkFromBinaryFile('/home/pi/CK-TOOLS/model-tflite-mlperf-resnet-no-argmax-downloaded/resnet50_v1.no-argmax.tflite')
>>> input_binding_info = parser.GetNetworkInputBindingInfo(0, 'input_tensor')
>>> options = ann.CreationOptions()
>>> runtime = ann.IRuntime(options)
>>> preferredBackends = [ann.BackendId('CpuAcc'), ann.BackendId('CpuRef')]
>>> opt_network, messages = ann.Optimize(network, preferredBackends, runtime.GetDeviceSpec(), ann.OptimizerOptions())
>>> net_id, _ = runtime.LoadNetwork(opt_network)

Does the CK tool modify the tflite model in any way before passing it to Arm NN?

Colm.
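One way to answer the question above about whether the model file is modified would be to compare checksums of the file that CK passes to Arm NN and a freshly downloaded copy. The following is only a minimal sketch: the helper names `sha256_of_file` and `same_contents` are hypothetical and not part of CK or Arm NN.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        # Read fixed-size chunks so large model files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a, path_b):
    """Return True if both files have identical contents (by SHA-256)."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)
```

If `same_contents(ck_copy, reference_copy)` returns False for two copies of resnet50_v1.no-argmax.tflite, the file was changed somewhere along the way; if it returns True, the model itself can be ruled out.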

MikeJKelly commented 3 years ago

Hi @psyhtest

what version of TFLite are you using?

Best regards, Mike

psyhtest commented 3 years ago

I'll comment in more detail later. For now, just to say that v21.05 doesn't segfault under a different user account. I first thought it was due to a protobuf version mismatch, but then read somewhere here that protobuf is actually not used for the tflite parser?

Colm-in-Arm commented 3 years ago

No, protobuf is not used for the tflite parser. It's used for the Onnx parser and some tests.
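For context, .tflite files are FlatBuffers binaries rather than protobuf messages. A quick sanity check that a file is at least a TFLite FlatBuffer is to look for the "TFL3" file identifier, which FlatBuffers stores in bytes 4 to 7, right after the 4-byte root offset. This is a rough sketch with hypothetical helper names; it does not validate the rest of the model.

```python
TFLITE_FILE_IDENTIFIER = b"TFL3"


def has_tflite_identifier(data: bytes) -> bool:
    """Check the first bytes of a buffer for the TFLite file identifier.

    A FlatBuffers file starts with a 4-byte root table offset, followed by
    an optional 4-byte file identifier ("TFL3" for TFLite models).
    """
    return len(data) >= 8 and data[4:8] == TFLITE_FILE_IDENTIFIER


def file_has_tflite_identifier(path) -> bool:
    """Read only the first 8 bytes of a file and check the identifier."""
    with open(path, "rb") as f:
        return has_tflite_identifier(f.read(8))
```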

james-conroy-arm commented 2 years ago

I think this is no longer an issue @psyhtest ? Please let us know if you require any more help.

Cheers, James

psyhtest commented 2 years ago

Hi @james-conroy-arm, unfortunately, it was still an issue the last time I checked. Let me try with v21.11 on a fresh rpi4 system in a couple of weeks.

psyhtest commented 2 years ago

It appears that the problem was with the FlatBuffers library. In our CK-ArmNN / ArmNN-MLPerf workflows, we used to build it from its master branch.

That was fine until Google introduced some incompatible changes in v2 in May 2021, which coincided with us noticing issues with ArmNN v21.05. On the systems where we observed no issues, FlatBuffers v1 had already been built.
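To tell which FlatBuffers version a given installation provides, one option is to read the version macros from its headers. The sketch below assumes the installed `flatbuffers/base.h` defines `FLATBUFFERS_VERSION_MAJOR`, `FLATBUFFERS_VERSION_MINOR` and `FLATBUFFERS_VERSION_REVISION`, as recent releases do; the function names are hypothetical, and the header location depends on where the library was installed.

```python
import re

# Matches lines such as: #define FLATBUFFERS_VERSION_MAJOR 1
_VERSION_MACRO = re.compile(
    r"^\s*#\s*define\s+FLATBUFFERS_VERSION_(MAJOR|MINOR|REVISION)\s+(\d+)",
    re.MULTILINE,
)


def parse_flatbuffers_version(header_text):
    """Return (major, minor, revision) parsed from header text, or None."""
    found = {name: int(value) for name, value in _VERSION_MACRO.findall(header_text)}
    try:
        return (found["MAJOR"], found["MINOR"], found["REVISION"])
    except KeyError:
        # At least one macro is missing, e.g. a very old or unrelated header.
        return None


def is_v1(version):
    """Return True if the parsed version tuple belongs to the v1 series."""
    return version is not None and version[0] == 1
```

Passing the contents of the header to `parse_flatbuffers_version` and then checking `is_v1` would flag an installation that picked up v2 from master.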

james-conroy-arm commented 2 years ago

Ok, thanks for letting us know.

Cheers, James

psyhtest commented 2 years ago

Given that these workflows were created for the Arm MLG and are used by Linaro, it would be good to have them tested as part of your regressions. For example, to build the latest code:

$ ck install package --tags=armnn,tflite,neon,dev

Many thanks.