ARM-software / ComputeLibrary

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
MIT License

Unable to execute MobilenetV2 on Mali-G710 GPU. #1108

Closed somasundaram1702 closed 1 week ago

somasundaram1702 commented 1 month ago

arm_compute_version=v0.0-unreleased
Build options: {'toolchain_prefix': '', 'compiler_prefix': '', 'build': 'native', 'arch': 'arm64-v8a', 'neon': '1', 'opencl': '1', 'experimental_dynamic_fusion': '0', 'Werror': '0', 'embed_kernels': '1', 'examples': '0', 'validation_tests': '0', 'benchmark_tests': '0', 'benchmark_examples': '0', 'compiler_cache': 'ccache', 'build_dir': 'aarch64', 'extra_cxx_flags': '-fPIC '}
Git hash=b'8baca90f290b6b9f621749aa5778f40014cd309b'

Platform: Custom-designed SoC with an integrated Mali-G710 GPU.
Operating System: Linux

Commands executed: ./ExecuteNetwork -c GpuAcc -f armnn-binary -m /mnt/dropbox/MobileNetV2/MobileNet.armnn

I took the pre-built binaries for aarch64 from this link: https://github.com/ARM-software/armnn/tree/v23.02?tab=readme-ov-file
I have also attached the MobileNet model used in my run: MobileNet.zip

Problem description: Hi, I am trying to execute the MobileNet model in my hybrid environment, which has an A53 CPU integrated with a G710 GPU. I get the error below when using the ExecuteNetwork binary.

0:02:00: Warning: DEPRECATED: The program option 'model-format' is deprecated and will be removed soon. The model-format is now automatically set.
0:02:00: Warning: No input files provided, input tensors will be filled with 0s.
0:02:00: Info: ArmNN v33.0.0
0:02:01: [ 89.391250] mali 70000000.gpu: Loading Mali firmware 0x1010000
0:02:01: [ 89.412008] mali 70000000.gpu: Mali firmware git_sha: ba6471e0f3fa3a974709abd2628da574543b3c1d
0:02:09: Info: Initialization time: 495.02 ms.
0:02:12: Info: Optimization time: 580.56 ms
0:02:12:
0:02:13: [ 91.178812] random: crng init done
0:05:41: Warning: The input data was generated, note that the output will not be useful
0:05:41: ===== Network Info =====
0:05:41: Inputs in order:
0:05:41: InputLayer, [1,3,224,224], Float32
0:05:41: Outputs in order:
0:05:41: OutPutLayer, [1,1000], Float32
0:05:41:
0:06:58: [ 230.783654] mali 70000000.gpu: AS_ACTIVE bit stuck for as 1. Might be caused by unstable GPU clk/pwr or faulty system
0:06:58: [ 230.783768] mali 70000000.gpu: Preparing to soft-reset GPU
0:06:58: [ 230.783874] mali 70000000.gpu: Wait for AS_ACTIVE bit failed for as 1, before sending MMU command 4
0:06:58: [ 230.783980] mali 70000000.gpu: Flush for GPU page table update did not complete
0:06:58: [ 230.785483] mali 70000000.gpu: Unhandled Page fault in AS1 at VA 0x00007FDFFA80A980
0:06:58: [ 230.785483] Reason: Memory is not growable
0:06:58: [ 230.785483] raw fault status: 0x230002C3
0:06:58: [ 230.785483] exception type 0xC3: TRANSLATION_FAULT at level 3
0:06:58: [ 230.785483] access type 0x2: READ
0:06:58: [ 230.785483] source id 0x2300
0:06:58: [ 230.785483] pid: 1291
0:06:58: [ 230.785704] mali 70000000.gpu: Failed to lock AS 1 for ctx 1291_0
0:08:15: [ 231.310622] mali 70000000.gpu: Stuck waiting on CLEAN_CACHES_COMPLETED bit, might be due to unstable GPU clk/pwr or possible faulty FPGA connector
0:08:15: [ 231.310759] mali 70000000.gpu: Failed to flush GPU cache when disabling AS 1 for ctx 1291_0
0:08:15: [ 231.317143] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.320093] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.324444] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.328293] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.331804] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.335644] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.339814] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.343293] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.347092] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.351120] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.354593] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.358192] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.361243] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.362342] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.366294] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.369592] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.373812] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.377661] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.380943] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.384593] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.388692] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.391753] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.395848] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.400042] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.402593] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.406943] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.410293] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.413870] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.417643] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.421817] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.425258] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.429093] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.433093] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.435662] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.437093] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.440658] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.444593] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.448443] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.451804] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.455856] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.459693] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.463104] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.466393] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.470641] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.473693] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.477768] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.481947] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.484896] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.489093] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:15: [ 231.493170] mali 70000000.gpu: Flush for GPU page table update did not complete
0:08:51: [ 263.158760] mali 70000000.gpu: [74390095] Suspend request sent on CSG slots 0x1 timed out for slots 0x1
0:08:51: [ 263.158874] mali 70000000.gpu: Timeout waiting for CSG slots to suspend before reset, slot_mask: 0x01
0:08:51: [ 263.702757] mali 70000000.gpu: Cache clean timed out. Might be caused by unstable GPU clk/pwr or faulty system
0:08:51: [ 263.702868] mali 70000000.gpu: [74479056] Timeout waiting for CACHE_CLN_INV_L2_LSC
0:08:51: [ 263.703011] mali 70000000.gpu: Quit idle for failing to prevent gpu reset.
0:10:07: [ 264.227966] mali 70000000.gpu: AS_ACTIVE bit stuck for as 0. Might be caused by unstable GPU clk/pwr or faulty system
0:10:07: [ 264.228081] mali 70000000.gpu: Flush for GPU page table update did not complete
0:10:07: [ 264.229444] mali 70000000.gpu: Flush for GPU page table update did not complete
0:10:07: [ 264.229554] mali 70000000.gpu: Evicting context 1291_0 slots: 0x01
0:10:07: [ 264.254933] mali 70000000.gpu: Resetting GPU (allowing up to 500 ms)
0:10:07: [ 264.255022] mali 70000000.gpu: Register state:
0:10:07: [ 264.255119] mali 70000000.gpu: GPU_IRQ_RAWSTAT=0x00040200 GPU_STATUS=0x00000001 MCU_STATUS=0x00000001
0:10:07: [ 264.255238] mali 70000000.gpu: JOB_IRQ_RAWSTAT=0x00000000 MMU_IRQ_RAWSTAT=0x00000000 GPU_FAULTSTATUS=0x00000000
0:10:07: [ 264.255365] mali 70000000.gpu: GPU_IRQ_MASK=0x00000000 JOB_IRQ_MASK=0x00000000 MMU_IRQ_MASK=0x00000000
0:10:07: [ 264.255476] mali 70000000.gpu: PWR_OVERRIDE0=0x00000000 PWR_OVERRIDE1=0x00000000
0:10:07: [ 264.255583] mali 70000000.gpu: SHADER_CONFIG=0x00000000 L2_MMU_CONFIG=0x00000000 TILER_CONFIG=0x00000000
0:10:07: Error: An error occurred attempting to execute a workload: CL error: clFlush. Error code: -36 at function Execute [/devenv/armnn/src/backends/cl/workloads/ClFullyConnectedWorkload.cpp:110]
0:10:07: Info: Execution time: 34094.88 ms.
0:10:07: terminate called after throwing an instance of 'armnn::Exception'
0:10:07: what(): IRuntime::EnqueueWorkload failed
0:10:08: [ 264.755764] mali 70000000.gpu: Failed to soft-reset GPU (timed out after 500 ms), now attempting a hard reset
0:10:08: [ 264.756624] mali 70000000.gpu: reloading firmware
0:10:08: [ 264.835837] mali 70000000.gpu: Reset complete
0:10:10: Aborted

Kindly suggest ways to resolve this issue. Any pointers towards finding the root cause would be highly appreciated.
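
For anyone reading the log above, the two numeric codes in it can be unpacked by hand. The sketch below splits the raw fault status `0x230002C3` into the three fields the kernel prints next to it, and names the OpenCL error `-36`. The bit layout is an assumption inferred from the decoded values in this log (it reproduces them exactly), not something taken from the Mali driver source. The name for `-36` comes from the standard OpenCL `cl.h` header. An invalid command queue would be consistent with the queue becoming unusable once the GPU faulted and was reset, though the log alone does not prove that ordering.

```python
# Assumed field layout, inferred from the values printed in the kernel log:
#   bits  7:0   exception type
#   bits  9:8   access type
#   bits 31:16  source id
RAW_FAULT_STATUS = 0x230002C3


def decode_fault_status(raw: int) -> dict:
    """Split a raw fault status word into the fields shown in the log."""
    return {
        "exception_type": raw & 0xFF,
        "access_type": (raw >> 8) & 0x3,
        "source_id": (raw >> 16) & 0xFFFF,
    }


# Error name as defined in the standard OpenCL header (cl.h).
CL_ERROR_NAMES = {-36: "CL_INVALID_COMMAND_QUEUE"}

fields = decode_fault_status(RAW_FAULT_STATUS)
for name, value in fields.items():
    print(f"{name}: 0x{value:X}")
print(f"CL error -36: {CL_ERROR_NAMES[-36]}")
```

The decoded fields match the `exception type 0xC3`, `access type 0x2` and `source id 0x2300` lines in the log, so the ArmNN-level error appears to be a downstream symptom of the GPU page fault rather than the first failure.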

somasundaram1702 commented 3 weeks ago

Hello Team, I have been waiting for a response for a long time. Kindly check and expedite the process.

morgolock commented 3 weeks ago

Hi @somasundaram1702

Apologies for the late reply.

Please try with the latest release of ArmNN, 24.05:

LD_LIBRARY_PATH=.:./delegate/:$LD_LIBRARY_PATH ./ExecuteNetwork -m ../MobileNet.armnn -c GpuAcc -N
WARNING: linker: Warning: unable to normalize "./delegate/" (ignoring)
Warning: No input files provided, input tensors will be filled with 0s.
Info: ArmNN v33.1.0
Info: Initialization time: 36.24 ms.
Info: Optimization time: 7.91 ms

Warning: The input data was generated, note that the output will not be useful
Info: Printing outputs to console is disabled.
===== Network Info =====
Inputs in order:
InputLayer, [1,3,224,224], Float32
Outputs in order:
OutPutLayer, [1,1000], Float32

Info: Inferences began at: (9045813997505767 ns) Sun Dec 31 17:43:56 2023

Info: Execution time: 90.88 ms.
Info: Inference time: 90.98 ms

Info: Inferences ended at: (9045814088730621 ns) Sun Dec 31 17:43:56 2023

Info: Shutdown time: 47.98 ms.
shiba:/data/local/tmp/user/armnn_24_05 $ 

As you can see above, the model runs okay. Hope this helps.
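
To put the two runs side by side, the timing lines that ExecuteNetwork prints can be pulled out with a small script. This is only a sketch: the regular expression is written against the `Info: ... time: ... ms` lines visible in this thread, and the helper name `timings` is made up for illustration. The two sample strings are copied from the logs above.

```python
import re

# Matches lines such as "Info: Execution time: 90.88 ms."
TIMING_RE = re.compile(r"Info: (\w+) time: ([0-9.]+) ms")


def timings(log: str) -> dict:
    """Return {phase name: milliseconds} for every timing line in a log."""
    return {name: float(value) for name, value in TIMING_RE.findall(log)}


# Timing lines from the failing run (prebuilt binaries, custom SoC).
failing_run = """
0:02:09: Info: Initialization time: 495.02 ms.
0:02:12: Info: Optimization time: 580.56 ms
0:10:07: Info: Execution time: 34094.88 ms.
"""

# Timing lines from the ArmNN 24.05 run posted in the reply.
working_run = """
Info: Initialization time: 36.24 ms.
Info: Optimization time: 7.91 ms
Info: Execution time: 90.88 ms.
Info: Inference time: 90.98 ms
Info: Shutdown time: 47.98 ms.
"""

for label, log in (("failing", failing_run), ("working", working_run)):
    print(label, timings(log))
```

The 34-second execution time in the failing run includes the GPU timeouts and reset attempts, and the two runs were on different devices, so the numbers indicate where the time went rather than a like-for-like performance comparison.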

somasundaram1702 commented 3 weeks ago

@morgolock: The latest ArmNN release, 24.05, does not provide prebuilt binaries for Linux. How can I try it?

morgolock commented 3 weeks ago

Hi @somasundaram1702

Please try with https://github.com/ARM-software/armnn/releases/download/v24.05/MULTI_ISA-GCC11-ArmNN+ACL-linux-armv8a.tar.gz

Hope this helps
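
A minimal sketch for fetching and unpacking that archive on the target board, assuming Python 3.9 or later is available there. The URL is the one given above. The helper names and the `armnn_24_05` output directory are placeholders, and the layout inside the archive is not assumed, so the script just returns whatever member names it finds.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL taken from the reply above.
RELEASE_URL = (
    "https://github.com/ARM-software/armnn/releases/download/v24.05/"
    "MULTI_ISA-GCC11-ArmNN+ACL-linux-armv8a.tar.gz"
)


def download(url: str, dest: Path) -> Path:
    """Download url to dest, creating parent directories as needed."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, dest)
    return dest


def extract(archive: Path, out_dir: Path) -> list[str]:
    """Unpack a .tar.gz archive into out_dir and return its member names."""
    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(out_dir)
    return names
```

Calling `extract(download(RELEASE_URL, Path("armnn_24_05/armnn.tar.gz")), Path("armnn_24_05"))` should leave the unpacked files under `armnn_24_05`. From there, the `ExecuteNetwork` command shown earlier in the thread can be rerun with `LD_LIBRARY_PATH` pointing at wherever the shared libraries ended up.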