ARM-software / armnn

Arm NN ML Software. The code here is a read-only mirror of https://review.mlplatform.org/admin/repos/ml/armnn
https://developer.arm.com/products/processors/machine-learning/arm-nn
MIT License

IDeviceSpec { supportedBackends: [CpuAcc, CpuRef]} Gpu Not recognised #704

Closed StuartIanNaylor closed 1 year ago

StuartIanNaylor commented 2 years ago

Hi, I thought I would have a go at https://developer.arm.com/documentation/102603/2108/Device-specific-installation/Install-on-Odroid-N2-Plus on an RK3588 Rock 5B.

If I run clinfo, I get:

clinfo
Number of platforms                               1
  Platform Name                                   ARM Platform
  Platform Vendor                                 ARM
  Platform Version                                OpenCL 2.1 v1.g6p0-01eac0.efb75e2978d783a80fe78be1bfb0efc1
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_icd cl_khr_egl_image cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_subgroups cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_il_program cl_khr_priority_hints cl_khr_create_command_queue cl_khr_spirv_no_integer_wrap_decoration cl_khr_extended_versioning cl_khr_device_uuid cl_arm_core_id cl_arm_printf cl_arm_non_uniform_work_group_size cl_arm_import_memory cl_arm_import_memory_dma_buf cl_arm_import_memory_host cl_arm_integer_dot_product_int8 cl_arm_integer_dot_product_accumulate_int8 cl_arm_integer_dot_product_accumulate_saturate_int8 cl_arm_scheduling_controls cl_arm_controlled_kernel_termination cl_ext_cxx_for_opencl
  Platform Host timer resolution                  1ns
  Platform Extensions function suffix             ARM

  Platform Name                                   ARM Platform
Number of devices                                 1
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '5'.
  Device Name                                     Mali-LODX r0p0
  Device Vendor                                   ARM
  Device Vendor ID                                0xa8670000
  Device Version                                  OpenCL 2.1 v1.g6p0-01eac0.efb75e2978d783a80fe78be1bfb0efc1
  Driver Version                                  2.1
  Device OpenCL C Version                         OpenCL C 2.0 v1.g6p0-01eac0.efb75e2978d783a80fe78be1bfb0efc1
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               4
  Max clock frequency                             1000MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x1024
  Max work group size                             1024
  Preferred work group size multiple              16
  Max sub-groups per work group                   64
  Preferred / native vector sizes
    char                                                16 / 4
    short                                                8 / 2
    int                                                  4 / 1
    long                                                 2 / 1
    half                                                 8 / 2        (cl_khr_fp16)
    float                                                4 / 1
    double                                               0 / 0        (n/a)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (n/a)
  Address bits                                    64, Little-Endian
  Global memory size                              3914665984 (3.646GiB)
  Error Correction support                        No
  Max memory allocation                           3914665984 (3.646GiB)
  Unified memory for Host and Device              Yes
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Preferred alignment for atomics
    SVM                                           0 bytes
    Global                                        0 bytes
    Local                                         0 bytes
  Max size for global variable                    65536 (64KiB)
  Preferred total size of global vars             0
  Global Memory cache type                        Read/Write
  Global Memory cache size                        1048576 (1024KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            65536 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   32 bytes
    Pitch alignment for 2D image buffers          64 pixels
    Max 2D image size                             65536x65536 pixels
    Max 3D image size                             65536x65536x65536 pixels
    Max number of read image args                 128
    Max number of write image args                64
    Max number of read/write image args           64
  Max number of pipe args                         16
  Max active pipe reservations                    1
  Max pipe packet size                            1024
  Local memory type                               Global
  Local memory size                               32768 (32KiB)
  Max number of constant args                     128
  Max constant buffer size                        3914665984 (3.646GiB)
  Max size of kernel argument                     1024
  Queue properties (on host)
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Queue properties (on device)
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Preferred size                                2097152 (2MiB)
    Max size                                      16777216 (16MiB)
  Max queues on device                            1
  Max events on device                            1024
  Prefer user sync for interop                    No
  Profiling timer resolution                      1000ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Sub-group independent forward progress        Yes
    IL version                                    SPIR-V_1.0
    SPIR versions                                 <printDeviceInfo:161: get CL_DEVICE_SPIR_VERSIONS size : error -30>
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                (n/a)
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_icd cl_khr_egl_image cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_subgroups cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_il_program cl_khr_priority_hints cl_khr_create_command_queue cl_khr_spirv_no_integer_wrap_decoration cl_khr_extended_versioning cl_khr_device_uuid cl_arm_core_id cl_arm_printf cl_arm_non_uniform_work_group_size cl_arm_import_memory cl_arm_import_memory_dma_buf cl_arm_import_memory_host cl_arm_integer_dot_product_int8 cl_arm_integer_dot_product_accumulate_int8 cl_arm_integer_dot_product_accumulate_saturate_int8 cl_arm_scheduling_controls cl_arm_controlled_kernel_termination cl_ext_cxx_for_opencl

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  ARM Platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [ARM]
  clCreateContext(NULL, ...) [default]            Success [ARM]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 ARM Platform
    Device Name                                   Mali-LODX r0p0
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 ARM Platform
    Device Name                                   Mali-LODX r0p0
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 ARM Platform
    Device Name                                   Mali-LODX r0p0

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.11
  ICD loader Profile                              OpenCL 2.1

Does anyone have any idea why it's not finding the GPU? Everything seems to be correct as far as OpenCL is concerned.

time python3 run_audio_file.py --audio_file_path tests/testdata/quick_brown_fox_16000khz.wav --model_file_path tflite_int8/wav2letter_int8.tflite  --preferred_backends GpuAcc CpuAcc CpuRef
Your ArmNN library instance does not support Onnx models parser functionality.  Skipped IOnnxParser import.
Preferred backends: ['GpuAcc', 'CpuAcc', 'CpuRef']
IDeviceSpec { supportedBackends: [CpuAcc, CpuRef]}
Optimization warnings: ()
Processing Audio Frames...
the quick brown fox juhmpe over the llazy dag

real    0m2.574s
user    0m7.449s
sys     0m0.310s

P.S. The tutorial could do with updating, as the labels are in the code and not part of the command line.

FrancisMurtagh-arm commented 2 years ago

Hi @StuartIanNaylor,

The GpuAcc backend uses dlopen() to find one of the shared libraries below:

static const std::vector<std::string> libraries{ "libOpenCL.so", "libGLES_mali.so", "libmali.so" };

Can you check whether your device has any of them in its system libraries, such as in /usr/lib/aarch64-linux-gnu/, or in LD_LIBRARY_PATH?

This is usually the reason why the GPU isn't recognized, although I would have expected an error like:

Can't load libOpenCL.so: libOpenCL.so: cannot open shared object file: No such file or directory
Can't load libGLES_mali.so: libGLES_mali.so: cannot open shared object file: No such file or directory
Can't load libmali.so: libmali.so: cannot open shared object file: No such file or directory
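
The search described above can be approximated from Python to see which of the three names the dynamic loader resolves on a given board. This is only a sketch: it uses `ctypes.CDLL` (which goes through dlopen() on Linux) rather than Arm NN's own loader, and `find_loadable` is a hypothetical helper name, not part of Arm NN.

```python
import ctypes

# Library names as quoted in this thread; the loop around them is illustrative.
CANDIDATES = ["libOpenCL.so", "libGLES_mali.so", "libmali.so"]


def find_loadable(names, loader=ctypes.CDLL):
    """Try each name in order; return (loaded_names, {name: error_text})."""
    loaded, errors = [], {}
    for name in names:
        try:
            loader(name)
            loaded.append(name)
        except OSError as exc:
            # The loader could not resolve this name on the library search path.
            errors[name] = str(exc)
    return loaded, errors


if __name__ == "__main__":
    loaded, errors = find_loadable(CANDIDATES)
    for name, message in errors.items():
        print(f"Can't load {name}: {message}")
    print("Loadable:", loaded if loaded else "none")
```

If at least one name ends up in the loadable list, the missing-library explanation probably does not apply and the backend package itself is the next thing to check.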

Could you also check that the following shared libraries were installed by the Debian package?

/usr/lib/aarch64-linux-gnu/libarm_compute.so 
/usr/lib/aarch64-linux-gnu/armnn*/Arm_GpuAcc_backend.so

Note: the wildcard * is to account for the version number of ArmNN.
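
A quick way to run that check is to glob for both paths. A sketch, assuming the two paths quoted above are the right ones for your install; `missing_patterns` is a made-up helper name.

```python
import glob

# Paths as quoted in this thread; the * covers the Arm NN version number.
PATTERNS = [
    "/usr/lib/aarch64-linux-gnu/libarm_compute.so",
    "/usr/lib/aarch64-linux-gnu/armnn*/Arm_GpuAcc_backend.so",
]


def missing_patterns(patterns, lister=glob.glob):
    """Return the patterns that match no file on disk."""
    return [pattern for pattern in patterns if not lister(pattern)]


if __name__ == "__main__":
    missing = missing_patterns(PATTERNS)
    for pattern in missing:
        print("missing:", pattern)
    if not missing:
        print("all expected libraries found")
```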

Regards, Francis.

StuartIanNaylor commented 2 years ago

Hi Francis, the symlink is libarm_compute.so.28 -> libarm_compute.so.28.0.0, not libarm_compute.so. /usr/lib/aarch64-linux-gnu/armnn30/Arm_CpuAcc_backend.so exists, but there is no Arm_GpuAcc_backend.so.

Also, libarmnn-latest-all no longer exists, even though I installed it just earlier.

It now seems to be:

armnn-latest-cpu - Arm NN is an inference engine for CPUs, GPUs and NPUs
armnn-latest-cpu-gpu - Arm NN is an inference engine for CPUs, GPUs and NPUs
armnn-latest-cpu-gpu-ref - Arm NN is an inference engine for CPUs, GPUs and NPUs
armnn-latest-gpu - Arm NN is an inference engine for CPUs, GPUs and NPUs

Doh, ignore me, it's armnn-latest-all.

I just did a reinstall and Arm_GpuAcc_backend.so is there now.

 time python3 run_audio_file.py --audio_file_path tests/testdata/quick_brown_fox_16000khz.wav --model_file_path tflite_int8/wav2letter_int8.tflite  --preferred_backends GpuAcc CpuAcc CpuRef
Your ArmNN library instance does not support Onnx models parser functionality.  Skipped IOnnxParser import.
Can't load libOpenCL.so: libOpenCL.so: cannot open shared object file: No such file or directory
Can't load libGLES_mali.so: libGLES_mali.so: cannot open shared object file: No such file or directory
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '5'.
Preferred backends: ['GpuAcc', 'CpuAcc', 'CpuRef']
IDeviceSpec { supportedBackends: [CpuAcc, CpuRef, GpuAcc]}
Optimization warnings: ()
Processing Audio Frames...
the quick brown fox juhmpe over the llazy dag

real    0m8.132s
user    0m12.390s
sys     0m0.417s

Many thanks, Francis, that was super quick.

P.S. I still get these two messages:

Can't load libOpenCL.so: libOpenCL.so: cannot open shared object file: No such file or directory
Can't load libGLES_mali.so: libGLES_mali.so: cannot open shared object file: No such file or directory

Is that correct?

P.S. On the CPU it's:

the quick brown fox juhmpe over the llazy dag

real    0m2.998s
user    0m8.936s
sys     0m0.345s

Is that what you would expect with a G610 MP4?

FrancisMurtagh-arm commented 2 years ago

Hi @StuartIanNaylor,

Yes sorry, armnn-latest-all is a virtual package name that's meant to be a bit easier on the eye than having lib... + ABI version number.

Can't load libOpenCL.so: libOpenCL.so: cannot open shared object file: No such file or directory
Can't load libGLES_mali.so: libGLES_mali.so: cannot open shared object file: No such file or directory

Seeing this is expected as it searches the list I mentioned above but only requires at least one of them. Different boards might have any combination of the three.

Thanks for trying it out. Francis.

StuartIanNaylor commented 2 years ago

Yeah, I just did 'sudo ln /usr/lib/aarch64-linux-gnu/libOpenCL.so.1.0.0 /usr/lib/aarch64-linux-gnu/libOpenCL.so', as the existing link was libOpenCL.so.1.

time python3 run_audio_file.py --audio_file_path tests/testdata/quick_brown_fox_16000khz.wav --model_file_path tflite_int8/wav2letter_int8.tflite  --preferred_backends GpuAcc CpuAcc
Your ArmNN library instance does not support Onnx models parser functionality.  Skipped IOnnxParser import.
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '5'.
Preferred backends: ['GpuAcc', 'CpuAcc']
IDeviceSpec { supportedBackends: [CpuAcc, CpuRef, GpuAcc]}
Optimization warnings: ()
Processing Audio Frames...
the quick brown fox juhmpe over the llazy dag

real    0m8.323s
user    0m13.841s
sys     0m0.373s

So they were just nags anyway.
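
For reference, the linking step can be reproduced in Python. This sketch works in a scratch directory rather than /usr/lib, and it creates a symbolic link with `os.symlink`, whereas `ln` without `-s` as quoted above creates a hard link; either way the loader gets an unversioned `libOpenCL.so` name. `link_unversioned` is a hypothetical helper.

```python
import os
import tempfile


def link_unversioned(directory, versioned="libOpenCL.so.1.0.0", plain="libOpenCL.so"):
    """Create directory/plain as a relative symlink to directory/versioned."""
    link = os.path.join(directory, plain)
    if not os.path.lexists(link):
        # Relative target, so the link stays valid if the directory is moved.
        os.symlink(versioned, link)
    return link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        # Empty stand-in for the real library file.
        open(os.path.join(scratch, "libOpenCL.so.1.0.0"), "wb").close()
        link = link_unversioned(scratch)
        print(link, "->", os.readlink(link))
```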

Is that what you would expect with just a G610 MP4? I've been really impressed with the quad-core A76 + quad-core A55, but I wasn't expecting the G610 to be 2.7x slower. It seems to stall for some time as it initialises, and I wish I had a bigger model to test. I will find a longer wav sample and give it a go.
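
The factor of 2.7 appears to come from the `real` times posted earlier in the thread (8.132 s with GpuAcc first, 2.998 s on the CPU). A small sketch of the arithmetic; `parse_real` is a hypothetical helper for the `0m8.132s` format that `time` prints.

```python
import re


def parse_real(text):
    """Convert a shell `time` value such as '2m58.349s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


gpu = parse_real("0m8.132s")  # GpuAcc run, short clip
cpu = parse_real("0m2.998s")  # CPU run, short clip
print(f"GPU/CPU wall-clock ratio: {gpu / cpu:.2f}")
```

This prints a ratio of 2.71. Wall-clock time includes network loading and initialisation, so it says little about steady-state inference speed on its own.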

FrancisMurtagh-arm commented 2 years ago

Hi @StuartIanNaylor,

I wouldn't know the expected inference time for that example but I can try to find out.

Is this your board? https://wiki.radxa.com/Rock5/hardware/5b

Quad-core ARM Cortex-A76 MPCore processor and quad-core ARM Cortex-A55 MPCore processor + Mali-G610 MP4 3D GPU

Can you give a link to the wav2letter model you used, just to make sure I use the same one?

Thanks, Francis.

StuartIanNaylor commented 2 years ago

Yeah, with a bigger wav the G610 starts to catch up, but there is a big initialisation delay.

Yep, it's https://wiki.radxa.com/Rock5/hardware/5b. I was really interested to see whether I could partition a model and use the CPU and GPU in tandem, but things are not looking so good.

time python3 run_audio_file.py --audio_file_path gb0.wav --model_file_path tflite_int8/wav2letter_int8.tflite  --preferred_backends CpuAcc CpuAcc
Your ArmNN library instance does not support Onnx models parser functionality.  Skipped IOnnxParser import.
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '5'.
Preferred backends: ['CpuAcc', 'CpuAcc']
IDeviceSpec { supportedBackends: [CpuAcc, CpuRef, GpuAcc]}
Optimization warnings: ()
Processing Audio Frames...
 morning this teuesidy is a election day after much ovspiritid banevigors cam paigning the time has come from americas to make importaant d ecisions ta matter nation's future and courage al americans to go tothe pols and vote lesct oin ses in brings oapvt e  s pbird of competitionn between oer bpolitical parties  in that competitionn is in a ssential ppart of a healthy doemarcracy but as the campanes come to  a clothes were publicaanis demogrants ind inddependence ccan find common ground on leaset one  point our system of reppresented democracy is one of  americaus greates strength he nied staye  was fantid on the belief that all men re greet y equal every election daay meionss of americans o ae rraces religions anmbagarouom step ind efmoting boose through outd the nation whether the a richrrd poor olver young each oof them as an eqqual shar and choosing the path an our country will take  n every baweaed they cast  is reminder that are founding principlles for alive and weal boading his wene a great brilges of ammericans citizenshipp and ris  always required braave defenders as you head of the polse next week  remember the sacriffices that had been made by generatiions of americans in uniform tto preserve eor way off life from buacker hgald o bagdad the men and women of americann on forces had been devoted guardians of hoar democracy all of au sow them in ther familes a sspecial donae ratituude on aelection dvay merica shuld alson remember the importning  example that ar eletion said throughout the woorld  youngermocracyes from georgianiu grained of nanistdane in a rack and looked the unianed states or proof this sholve ggovernment can enddoure and nationss an still af under tyyornyn oppression con fiund hope in his viration in oaur commitment de liberty for more than two centuries a americans hof demmonstrated the ability of free people to chhoese their own leaders or nationn hais flourihed becauuse of ith commentdement to tresting the wisdom off our citizenry ind 
thes yeaars election  we will see thi traditiion continued and we will bee reminded once againn that we are blessed o live in a free nationn guided by the will of the people thank you frliishning

real    0m42.802s
user    2m58.349s
sys     0m0.540s
(venv) rock@rock-5b:~/workspace/armnn/python/pyarmnn/examples/speech_recognition$ time python3 run_audio_file.py --audio_file_path gb0.wav --model_file_path tflite_int8/wav2letter_int8.tflite  --preferred_backends GpuAcc CpuAcc
Your ArmNN library instance does not support Onnx models parser functionality.  Skipped IOnnxParser import.
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '5'.
Preferred backends: ['GpuAcc', 'CpuAcc']
IDeviceSpec { supportedBackends: [CpuAcc, CpuRef, GpuAcc]}
Optimization warnings: ()
Processing Audio Frames...
 morning this teuesidy is a election day after much ovspiritid banevigors cam paigning the time has come from americas to make importaant d ecisions ta matter nation's future and courage al americans to go tothe pols and vote lesct oin ses in brings oapvt e  s pbird of competitionn between oer bpolitical parties  in that competitionn is in a ssential ppart of a healthy doemarcracy but as the campanes come to  a clothes were publicaanis demogrants ind inddependence ccan find common ground on leaset one  point our system of reppresented democracy is one of  americaus greates strength he nied staye  was fantid on the belief that all men re greet y equal every election daay meionss of americans o ae rraces religions anmbagarouom step ind efmoting boose through outd the nation whether the a richrrd poor olver young each oof them as an eqqual shar and choosing the path an our country will take  n every baweaed they cast  is reminder that are founding principlles for alive and weal boading his wene a great brilges of ammericans citizenshipp and ris  always required braave defenders as you head of the polse next week  remember the sacriffices that had been made by generatiions of americans in uniform tto preserve eor way off life from buacker hgald o bagdad the men and women of americann on forces had been devoted guardians of hoar democracy all of au sow them in ther familes a sspecial donae ratituude on aelection dvay merica shuld alson remember the importning  example that ar eletion said throughout the woorld  youngermocracyes from georgianiu grained of nanistdane in a rack and looked the unianed states or proof this sholve ggovernment can enddoure and nationss an still af under tyyornyn oppression con fiund hope in his viration in oaur commitment de liberty for more than two centuries a americans hof demmonstrated the ability of free people to chhoese their own leaders or nationn hais flourihed becauuse of ith commentdement to tresting the wisdom off our citizenry ind 
thes yeaars election  we will see thi traditiion continued and we will bee reminded once againn that we are blessed o live in a free nationn guided by the will of the people thank you frliishning

real    0m46.379s
user    2m39.762s
sys     0m0.648s

It's the one from the tutorial:

git clone https://github.com/ARM-software/ML-zoo
Copy the model file to the example application with the following commands:

cd armnn/python/pyarmnn/examples/speech_recognition
cp -r ~/workspace/ML-zoo/models/speech_recognition/wav2letter/tflite_int8 .

That's from the tutorial. The results are hilarious, but it's great for just benchmarking like this.

From watching it run, there is about a 5 second delay with the GPU compared with the CPU for some reason. Would that make sense? Does it have to create a 'framebuffer' in memory, similar to the first load into VRAM on a GPU, except that with the Mali it is recreated every time?

StuartIanNaylor commented 2 years ago

Yeah, things are definitely not what I expected. I did a bit of a hack to the code to keep the network loaded for a second run. For some reason, with the GPU, loading the network takes about 5 seconds, unlike with the CPU. What also surprised me is how little work the GPU takes on, as the overall load is only about 5% lower on average.

# Copyright © 2021 Arm Ltd and Contributors. All rights reserved.
# SPDX-License-Identifier: MIT

"""Automatic speech recognition with PyArmNN demo for processing audio clips to text."""

import sys
import os
import numpy as np
import psutil
script_dir = os.path.dirname(__file__)
sys.path.insert(1, os.path.join(script_dir, '..', 'common'))

from argparse import ArgumentParser
from network_executor import ArmnnNetworkExecutor
from utils import prepare_input_data
from audio_capture import AudioCaptureParams, capture_audio
from audio_utils import decode_text, display_text
from wav2letter_mfcc import Wav2LetterMFCC, W2LAudioPreprocessor
from mfcc import MFCCParams
from datetime import datetime

# Model Specific Labels
labels = {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e', 5: 'f', 6: 'g', 7: 'h', 8: 'i', 9: 'j', 10: 'k', 11: 'l', 12: 'm',
          13: 'n',
          14: 'o', 15: 'p', 16: 'q', 17: 'r', 18: 's', 19: 't', 20: 'u', 21: 'v', 22: 'w', 23: 'x', 24: 'y',
          25: 'z',
          26: "'", 27: ' ', 28: '$'}

def parse_args():
    parser = ArgumentParser(description="ASR with PyArmNN")
    parser.add_argument(
        "--audio_file_path",
        required=True,
        type=str,
        help="Path to the audio file to perform ASR",
    )
    parser.add_argument(
        "--model_file_path",
        required=True,
        type=str,
        help="Path to ASR model to use",
    )
    parser.add_argument(
        "--preferred_backends",
        type=str,
        nargs="+",
        default=["GpuAcc", "CpuAcc", "CpuRef"],
        help="""List of backends in order of preference for optimizing
        subgraphs, falling back to the next backend in the list on unsupported
        layers. Defaults to [GpuAcc, CpuAcc, CpuRef]""",
    )
    return parser.parse_args()

def main(args, network):
    # Read command line args
    audio_file = args.audio_file_path
    print(datetime.now() - starttime, psutil.cpu_percent())

    print(datetime.now() - starttime, psutil.cpu_percent())
    # Specify model specific audio data requirements
    audio_capture_params = AudioCaptureParams(dtype=np.float32, overlap=31712, min_samples=47712, sampling_freq=16000,
                                              mono=True)

    buffer = capture_audio(audio_file, audio_capture_params)
    print(datetime.now() - starttime, psutil.cpu_percent())
    # Extract features and create the preprocessor

    mfcc_params = MFCCParams(sampling_freq=16000, num_fbank_bins=128, mel_lo_freq=0, mel_hi_freq=8000,
                             num_mfcc_feats=13, frame_len=512, use_htk_method=False, n_fft=512)

    print(datetime.now() - starttime, psutil.cpu_percent())
    wmfcc = Wav2LetterMFCC(mfcc_params)
    preprocessor = W2LAudioPreprocessor(wmfcc, model_input_size=296, stride=160)
    current_r_context = ""
    is_first_window = True

    print("Processing Audio Frames...")
    for audio_data in buffer:
        # Prepare the input Tensors
        input_data = prepare_input_data(audio_data, network.get_data_type(), network.get_input_quantization_scale(0),
                                        network.get_input_quantization_offset(0), preprocessor)

        # Run inference
        output_result = network.run([input_data])

        # Slice and Decode the text, and store the right context
        current_r_context, text = decode_text(is_first_window, labels, output_result)

        is_first_window = False

        display_text(text)
        print(datetime.now() - starttime, psutil.cpu_percent())

    print(current_r_context, flush=True)
    print(datetime.now() - starttime, psutil.cpu_percent())
    print("Inference End", psutil.cpu_percent())

if __name__ == "__main__":
    args = parse_args()
    print("Inference Start", psutil.cpu_percent())
    # starttime is read as a module-level global inside main()
    starttime = datetime.now()
    # Create the ArmNN inference runner (this is the network load being timed)
    network = ArmnnNetworkExecutor(args.model_file_path, args.preferred_backends)
    print(datetime.now() - starttime, psutil.cpu_percent())
    main(args, network)
    # Second run: reset the clock and reuse the already-loaded network
    starttime = datetime.now()
    print(datetime.now() - starttime, psutil.cpu_percent())
    main(args, network)
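
The same measurement can be made a little more explicit by timing the network load and each inference pass separately. A minimal sketch that assumes nothing about PyArmNN: `timed`, `fake_load` and `fake_run` are hypothetical stand-ins, and in the script above the timed calls would be the `ArmnnNetworkExecutor(...)` constructor and `main(args, network)`.

```python
import time


def timed(label, fn, *args, **kwargs):
    """Call fn, print how long it took, and return (result, seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f}s")
    return result, elapsed


def fake_load():
    """Stand-in for loading a network; sleeps instead of doing real work."""
    time.sleep(0.05)
    return "network"


def fake_run(network):
    """Stand-in for one inference pass over the audio file."""
    time.sleep(0.01)
    return "text"


if __name__ == "__main__":
    network, load_seconds = timed("load", fake_load)
    # Two passes with the same loaded network, as in the hacked script.
    for attempt in (1, 2):
        timed(f"run {attempt}", fake_run, network)
```

`time.perf_counter()` is used here because it is a monotonic clock intended for measuring intervals, unlike wall-clock timestamps.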

@FrancisMurtagh-arm

StuartIanNaylor commented 2 years ago

./test_computeinfo
 Initializing random seed to 0.
Requesting Default device based on command line for platform index 0 and device index 0
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '5'.
Compute Device Name = Mali-LODX r0p0, Compute Device Vendor = ARM, Compute Device Version = OpenCL 2.1 v1.g6p0-01eac0.efb75e2978d783a80fe78be1bfb0efc1, CL C Version = OpenCL C 2.0 v1.g6p0-01eac0.efb75e2978d783a80fe78be1bfb0efc1
Supports single precision denormals: YES
sizeof( void*) = 8  (host)
sizeof( void*) = 8  (device)
computeinfo...

clGetPlatformInfo:
------------------
        CL_PLATFORM_VERSION == "OpenCL 2.1 v1.g6p0-01eac0.efb75e2978d783a80fe78be1bfb0efc1"
        CL_PLATFORM_PROFILE == "FULL_PROFILE"
        CL_PLATFORM_NAME == "ARM Platform"
        CL_PLATFORM_VENDOR == "ARM"
        CL_PLATFORM_EXTENSIONS == "cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_icd cl_khr_egl_image cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_subgroups cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_il_program cl_khr_priority_hints cl_khr_create_command_queue cl_khr_spirv_no_integer_wrap_decoration cl_khr_extended_versioning cl_khr_device_uuid cl_arm_core_id cl_arm_printf cl_arm_non_uniform_work_group_size cl_arm_import_memory cl_arm_import_memory_dma_buf cl_arm_import_memory_host cl_arm_integer_dot_product_int8 cl_arm_integer_dot_product_accumulate_int8 cl_arm_integer_dot_product_accumulate_saturate_int8 cl_arm_scheduling_controls cl_arm_controlled_kernel_termination cl_ext_cxx_for_opencl"
        Skipped: CL_PLATFORM_EXTENSIONS_WITH_VERSION.
        Skipped: CL_PLATFORM_NUMERIC_VERSION.

Getting device IDs for CL_DEVICE_TYPE_DEFAULT devices
Found 1 CL_DEVICE_TYPE_DEFAULT devices:
CL_DEVICE_TYPE_DEFAULT Device 1 of 1 Info:
        CL_DEVICE_VERSION == "OpenCL 2.1 v1.g6p0-01eac0.efb75e2978d783a80fe78be1bfb0efc1"
        CL_DEVICE_EXTENSIONS == "cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_icd cl_khr_egl_image cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_subgroups cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_il_program cl_khr_priority_hints cl_khr_create_command_queue cl_khr_spirv_no_integer_wrap_decoration cl_khr_extended_versioning cl_khr_device_uuid cl_arm_core_id cl_arm_printf cl_arm_non_uniform_work_group_size cl_arm_import_memory cl_arm_import_memory_dma_buf cl_arm_import_memory_host cl_arm_integer_dot_product_int8 cl_arm_integer_dot_product_accumulate_int8 cl_arm_integer_dot_product_accumulate_saturate_int8 cl_arm_scheduling_controls cl_arm_controlled_kernel_termination cl_ext_cxx_for_opencl"
        CL_DEVICE_TYPE == |CL_DEVICE_TYPE_GPU||
        CL_DEVICE_VENDOR_ID == 2825322496
        CL_DEVICE_MAX_COMPUTE_UNITS == 4
        CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS == 3
        CL_DEVICE_MAX_WORK_ITEM_SIZES == 1024 1024 1024
        CL_DEVICE_MAX_WORK_GROUP_SIZE == 1024
        CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR == 16
        CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT == 8
        CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT == 4
        CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG == 2
        CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT == 4
        CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE == 0
        CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF == 8
        CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR == 4
        CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT == 2
        CL_DEVICE_NATIVE_VECTOR_WIDTH_INT == 1
        CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG == 1
        CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT == 1
        CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE == 0
        CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF == 2
        CL_DEVICE_MAX_CLOCK_FREQUENCY == 1000
        CL_DEVICE_ADDRESS_BITS == 64
        CL_DEVICE_MAX_READ_IMAGE_ARGS == 128
        CL_DEVICE_MAX_WRITE_IMAGE_ARGS == 64
        CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS == 64
        CL_DEVICE_MAX_MEM_ALLOC_SIZE == 3914739712
        CL_DEVICE_IMAGE2D_MAX_WIDTH == 65536
        CL_DEVICE_IMAGE2D_MAX_HEIGHT == 65536
        CL_DEVICE_IMAGE3D_MAX_WIDTH == 65536
        CL_DEVICE_IMAGE3D_MAX_HEIGHT == 65536
        CL_DEVICE_IMAGE3D_MAX_DEPTH == 65536
        CL_DEVICE_IMAGE_MAX_ARRAY_SIZE == 2048
        CL_DEVICE_IMAGE_MAX_BUFFER_SIZE == 65536
        CL_DEVICE_IMAGE_SUPPORT == 1
        CL_DEVICE_MAX_PARAMETER_SIZE == 1024
        CL_DEVICE_MAX_SAMPLERS == 16
        CL_DEVICE_IMAGE_PITCH_ALIGNMENT == 64
        CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT == 32
        CL_DEVICE_MEM_BASE_ADDR_ALIGN == 1024
        CL_DEVICE_SINGLE_FP_CONFIG == CL_FP_DENORM|CL_FP_INF_NAN|CL_FP_ROUND_TO_NEAREST|CL_FP_ROUND_TO_ZERO|CL_FP_ROUND_TO_INF|CL_FP_FMA|
        CL_DEVICE_DOUBLE_FP_CONFIG == ||||||
        CL_DEVICE_GLOBAL_MEM_CACHE_TYPE == CL_READ_WRITE_CACHE
        CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE == 64
        CL_DEVICE_GLOBAL_MEM_CACHE_SIZE == 1048576
        CL_DEVICE_GLOBAL_MEM_SIZE == 3914739712
        CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE == 3914739712
        CL_DEVICE_MAX_CONSTANT_ARGS == 128
        CL_DEVICE_LOCAL_MEM_TYPE == CL_GLOBAL
        CL_DEVICE_LOCAL_MEM_SIZE == 32768
        CL_DEVICE_ERROR_CORRECTION_SUPPORT == 0
        CL_DEVICE_HOST_UNIFIED_MEMORY == 1
        CL_DEVICE_PROFILING_TIMER_RESOLUTION == 1000
        CL_DEVICE_ENDIAN_LITTLE == 1
        CL_DEVICE_AVAILABLE == 1
        CL_DEVICE_COMPILER_AVAILABLE == 1
        CL_DEVICE_LINKER_AVAILABLE == 1
        CL_DEVICE_BUILT_IN_KERNELS == ""
        CL_DEVICE_PRINTF_BUFFER_SIZE == 1048576
        CL_DEVICE_PREFERRED_INTEROP_USER_SYNC == 0
        CL_DEVICE_PARENT_DEVICE == 0
        CL_DEVICE_PARTITION_MAX_SUB_DEVICES == 0
        CL_DEVICE_PARTITION_AFFINITY_DOMAIN == |||||
        CL_DEVICE_REFERENCE_COUNT == 1
        CL_DEVICE_EXECUTION_CAPABILITIES == CL_EXEC_KERNEL|
        CL_DEVICE_QUEUE_ON_HOST_PROPERTIES == CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE|CL_QUEUE_PROFILING_ENABLE
        CL_DEVICE_NAME == "Mali-LODX r0p0"
        CL_DEVICE_VENDOR == "ARM"
        CL_DRIVER_VERSION == "2.1"
        CL_DEVICE_PROFILE == "FULL_PROFILE"
        CL_DEVICE_OPENCL_C_VERSION == "OpenCL C 2.0 v1.g6p0-01eac0.efb75e2978d783a80fe78be1bfb0efc1"
        CL_DEVICE_MAX_PIPE_ARGS == 16
        CL_DEVICE_PIPE_MAX_ACTIVE_RESERVATIONS == 1
        CL_DEVICE_PIPE_MAX_PACKET_SIZE == 1024
        CL_DEVICE_MAX_GLOBAL_VARIABLE_SIZE == 65536
        CL_DEVICE_GLOBAL_VARIABLE_PREFERRED_TOTAL_SIZE == 0
        CL_DEVICE_QUEUE_ON_HOST_PROPERTIES == CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE|CL_QUEUE_PROFILING_ENABLE
        CL_DEVICE_QUEUE_ON_DEVICE_PROPERTIES == CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE|CL_QUEUE_PROFILING_ENABLE
        CL_DEVICE_QUEUE_ON_DEVICE_PREFERRED_SIZE == 2097152
        CL_DEVICE_QUEUE_ON_DEVICE_MAX_SIZE == 16777216
        CL_DEVICE_MAX_ON_DEVICE_QUEUES == 1
        CL_DEVICE_MAX_ON_DEVICE_EVENTS == 1024
        CL_DEVICE_PREFERRED_PLATFORM_ATOMIC_ALIGNMENT == 0
        CL_DEVICE_PREFERRED_GLOBAL_ATOMIC_ALIGNMENT == 0
        CL_DEVICE_PREFERRED_LOCAL_ATOMIC_ALIGNMENT == 0
        CL_DEVICE_SVM_CAPABILITIES == CL_DEVICE_SVM_COARSE_GRAIN_BUFFER|||
        CL_DEVICE_IL_VERSION == "SPIR-V_1.0"
        CL_DEVICE_MAX_NUM_SUB_GROUPS == 64
        CL_DEVICE_SUB_GROUP_INDEPENDENT_FORWARD_PROGRESS == 1
        Skipped: CL_DEVICE_ATOMIC_MEMORY_CAPABILITIES.
        Skipped: CL_DEVICE_ATOMIC_FENCE_CAPABILITIES.
        Skipped: CL_DEVICE_NON_UNIFORM_WORK_GROUP_SUPPORT.
        Skipped: CL_DEVICE_PREFERRED_WORK_GROUP_SIZE_MULTIPLE.
        Skipped: CL_DEVICE_WORK_GROUP_COLLECTIVE_FUNCTIONS_SUPPORT.
        Skipped: CL_DEVICE_GENERIC_ADDRESS_SPACE_SUPPORT.
        Skipped: CL_DEVICE_OPENCL_C_FEATURES.
        Skipped: CL_DEVICE_DEVICE_ENQUEUE_CAPABILITIES.
        Skipped: CL_DEVICE_PIPE_SUPPORT.
        Skipped: CL_DEVICE_NUMERIC_VERSION.
        Skipped: CL_DEVICE_EXTENSIONS_WITH_VERSION.
        Skipped: CL_DEVICE_OPENCL_C_ALL_VERSIONS.
        Skipped: CL_DEVICE_ILS_WITH_VERSION.
        Skipped: CL_DEVICE_BUILT_IN_KERNELS_WITH_VERSION.
        CL_DEVICE_IMAGE_PITCH_ALIGNMENT == 64
        CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT == 32
        CL_MEM_OBJECT_IMAGE1D supported formats:
                CL_MEM_READ_ONLY: 60 supported formats
                CL_MEM_WRITE_ONLY: 67 supported formats
                CL_MEM_READ_WRITE: 67 supported formats
                CL_MEM_KERNEL_READ_AND_WRITE: 67 supported formats
        CL_MEM_OBJECT_IMAGE1D_BUFFER supported formats:
                CL_MEM_READ_ONLY: 60 supported formats
                CL_MEM_WRITE_ONLY: 67 supported formats
                CL_MEM_READ_WRITE: 67 supported formats
                CL_MEM_KERNEL_READ_AND_WRITE: 67 supported formats
        CL_MEM_OBJECT_IMAGE2D supported formats:
                CL_MEM_READ_ONLY: 60 supported formats
                CL_MEM_WRITE_ONLY: 67 supported formats
                CL_MEM_READ_WRITE: 67 supported formats
                CL_MEM_KERNEL_READ_AND_WRITE: 67 supported formats
        CL_MEM_OBJECT_IMAGE3D supported formats:
                CL_MEM_READ_ONLY: 58 supported formats
                CL_MEM_WRITE_ONLY: 65 supported formats
                CL_MEM_READ_WRITE: 65 supported formats
                CL_MEM_KERNEL_READ_AND_WRITE: 65 supported formats
        CL_MEM_OBJECT_IMAGE1D_ARRAY supported formats:
                CL_MEM_READ_ONLY: 60 supported formats
                CL_MEM_WRITE_ONLY: 67 supported formats
                CL_MEM_READ_WRITE: 67 supported formats
                CL_MEM_KERNEL_READ_AND_WRITE: 67 supported formats
        CL_MEM_OBJECT_IMAGE2D_ARRAY supported formats:
                CL_MEM_READ_ONLY: 60 supported formats
                CL_MEM_WRITE_ONLY: 67 supported formats
                CL_MEM_READ_WRITE: 67 supported formats
                CL_MEM_KERNEL_READ_AND_WRITE: 67 supported formats

Getting device IDs for CL_DEVICE_TYPE_CPU devices
No devices of type CL_DEVICE_TYPE_CPU found.
Getting device IDs for CL_DEVICE_TYPE_GPU devices
Found 1 CL_DEVICE_TYPE_GPU devices:
CL_DEVICE_TYPE_GPU Device 1 of 1 Info:
        CL_DEVICE_VERSION == "OpenCL 2.1 v1.g6p0-01eac0.efb75e2978d783a80fe78be1bfb0efc1"
        CL_DEVICE_EXTENSIONS == "cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_icd cl_khr_egl_image cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_subgroups cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_il_program cl_khr_priority_hints cl_khr_create_command_queue cl_khr_spirv_no_integer_wrap_decoration cl_khr_extended_versioning cl_khr_device_uuid cl_arm_core_id cl_arm_printf cl_arm_non_uniform_work_group_size cl_arm_import_memory cl_arm_import_memory_dma_buf cl_arm_import_memory_host cl_arm_integer_dot_product_int8 cl_arm_integer_dot_product_accumulate_int8 cl_arm_integer_dot_product_accumulate_saturate_int8 cl_arm_scheduling_controls cl_arm_controlled_kernel_termination cl_ext_cxx_for_opencl"
        CL_DEVICE_TYPE == |CL_DEVICE_TYPE_GPU||
        CL_DEVICE_VENDOR_ID == 2825322496
        CL_DEVICE_MAX_COMPUTE_UNITS == 4
        CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS == 3
        CL_DEVICE_MAX_WORK_ITEM_SIZES == 1024 1024 1024
        CL_DEVICE_MAX_WORK_GROUP_SIZE == 1024
        CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR == 16
        CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT == 8
        CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT == 4
        CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG == 2
        CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT == 4
        CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE == 0
        CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF == 8
        CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR == 4
        CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT == 2
        CL_DEVICE_NATIVE_VECTOR_WIDTH_INT == 1
        CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG == 1
        CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT == 1
        CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE == 0
        CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF == 2
        CL_DEVICE_MAX_CLOCK_FREQUENCY == 1000
        CL_DEVICE_ADDRESS_BITS == 64
        CL_DEVICE_MAX_READ_IMAGE_ARGS == 128
        CL_DEVICE_MAX_WRITE_IMAGE_ARGS == 64
        CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS == 64
        CL_DEVICE_MAX_MEM_ALLOC_SIZE == 3914739712
        CL_DEVICE_IMAGE2D_MAX_WIDTH == 65536
        CL_DEVICE_IMAGE2D_MAX_HEIGHT == 65536
        CL_DEVICE_IMAGE3D_MAX_WIDTH == 65536
        CL_DEVICE_IMAGE3D_MAX_HEIGHT == 65536
        CL_DEVICE_IMAGE3D_MAX_DEPTH == 65536
        CL_DEVICE_IMAGE_MAX_ARRAY_SIZE == 2048
        CL_DEVICE_IMAGE_MAX_BUFFER_SIZE == 65536
        CL_DEVICE_IMAGE_SUPPORT == 1
        CL_DEVICE_MAX_PARAMETER_SIZE == 1024
        CL_DEVICE_MAX_SAMPLERS == 16
        CL_DEVICE_IMAGE_PITCH_ALIGNMENT == 64
        CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT == 32
        CL_DEVICE_MEM_BASE_ADDR_ALIGN == 1024
        CL_DEVICE_SINGLE_FP_CONFIG == CL_FP_DENORM|CL_FP_INF_NAN|CL_FP_ROUND_TO_NEAREST|CL_FP_ROUND_TO_ZERO|CL_FP_ROUND_TO_INF|CL_FP_FMA|
        CL_DEVICE_DOUBLE_FP_CONFIG == ||||||
        CL_DEVICE_GLOBAL_MEM_CACHE_TYPE == CL_READ_WRITE_CACHE
        CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE == 64
        CL_DEVICE_GLOBAL_MEM_CACHE_SIZE == 1048576
        CL_DEVICE_GLOBAL_MEM_SIZE == 3914739712
        CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE == 3914739712
        CL_DEVICE_MAX_CONSTANT_ARGS == 128
        CL_DEVICE_LOCAL_MEM_TYPE == CL_GLOBAL
        CL_DEVICE_LOCAL_MEM_SIZE == 32768
        CL_DEVICE_ERROR_CORRECTION_SUPPORT == 0
        CL_DEVICE_HOST_UNIFIED_MEMORY == 1
        CL_DEVICE_PROFILING_TIMER_RESOLUTION == 1000
        CL_DEVICE_ENDIAN_LITTLE == 1
        CL_DEVICE_AVAILABLE == 1
        CL_DEVICE_COMPILER_AVAILABLE == 1
        CL_DEVICE_LINKER_AVAILABLE == 1
        CL_DEVICE_BUILT_IN_KERNELS == ""
        CL_DEVICE_PRINTF_BUFFER_SIZE == 1048576
        CL_DEVICE_PREFERRED_INTEROP_USER_SYNC == 0
        CL_DEVICE_PARENT_DEVICE == 0
        CL_DEVICE_PARTITION_MAX_SUB_DEVICES == 0
        CL_DEVICE_PARTITION_AFFINITY_DOMAIN == |||||
        CL_DEVICE_REFERENCE_COUNT == 1
        CL_DEVICE_EXECUTION_CAPABILITIES == CL_EXEC_KERNEL|
        CL_DEVICE_QUEUE_ON_HOST_PROPERTIES == CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE|CL_QUEUE_PROFILING_ENABLE
        CL_DEVICE_NAME == "Mali-LODX r0p0"
        CL_DEVICE_VENDOR == "ARM"
        CL_DRIVER_VERSION == "2.1"
        CL_DEVICE_PROFILE == "FULL_PROFILE"
        CL_DEVICE_OPENCL_C_VERSION == "OpenCL C 2.0 v1.g6p0-01eac0.efb75e2978d783a80fe78be1bfb0efc1"
        CL_DEVICE_MAX_PIPE_ARGS == 16
        CL_DEVICE_PIPE_MAX_ACTIVE_RESERVATIONS == 1
        CL_DEVICE_PIPE_MAX_PACKET_SIZE == 1024
        CL_DEVICE_MAX_GLOBAL_VARIABLE_SIZE == 65536
        CL_DEVICE_GLOBAL_VARIABLE_PREFERRED_TOTAL_SIZE == 0
        CL_DEVICE_QUEUE_ON_HOST_PROPERTIES == CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE|CL_QUEUE_PROFILING_ENABLE
        CL_DEVICE_QUEUE_ON_DEVICE_PROPERTIES == CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE|CL_QUEUE_PROFILING_ENABLE
        CL_DEVICE_QUEUE_ON_DEVICE_PREFERRED_SIZE == 2097152
        CL_DEVICE_QUEUE_ON_DEVICE_MAX_SIZE == 16777216
        CL_DEVICE_MAX_ON_DEVICE_QUEUES == 1
        CL_DEVICE_MAX_ON_DEVICE_EVENTS == 1024
        CL_DEVICE_PREFERRED_PLATFORM_ATOMIC_ALIGNMENT == 0
        CL_DEVICE_PREFERRED_GLOBAL_ATOMIC_ALIGNMENT == 0
        CL_DEVICE_PREFERRED_LOCAL_ATOMIC_ALIGNMENT == 0
        CL_DEVICE_SVM_CAPABILITIES == CL_DEVICE_SVM_COARSE_GRAIN_BUFFER|||
        CL_DEVICE_IL_VERSION == "SPIR-V_1.0"
        CL_DEVICE_MAX_NUM_SUB_GROUPS == 64
        CL_DEVICE_SUB_GROUP_INDEPENDENT_FORWARD_PROGRESS == 1
        Skipped: CL_DEVICE_ATOMIC_MEMORY_CAPABILITIES.
        Skipped: CL_DEVICE_ATOMIC_FENCE_CAPABILITIES.
        Skipped: CL_DEVICE_NON_UNIFORM_WORK_GROUP_SUPPORT.
        Skipped: CL_DEVICE_PREFERRED_WORK_GROUP_SIZE_MULTIPLE.
        Skipped: CL_DEVICE_WORK_GROUP_COLLECTIVE_FUNCTIONS_SUPPORT.
        Skipped: CL_DEVICE_GENERIC_ADDRESS_SPACE_SUPPORT.
        Skipped: CL_DEVICE_OPENCL_C_FEATURES.
        Skipped: CL_DEVICE_DEVICE_ENQUEUE_CAPABILITIES.
        Skipped: CL_DEVICE_PIPE_SUPPORT.
        Skipped: CL_DEVICE_NUMERIC_VERSION.
        Skipped: CL_DEVICE_EXTENSIONS_WITH_VERSION.
        Skipped: CL_DEVICE_OPENCL_C_ALL_VERSIONS.
        Skipped: CL_DEVICE_ILS_WITH_VERSION.
        Skipped: CL_DEVICE_BUILT_IN_KERNELS_WITH_VERSION.
        CL_DEVICE_IMAGE_PITCH_ALIGNMENT == 64
        CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT == 32
        CL_MEM_OBJECT_IMAGE1D supported formats:
                CL_MEM_READ_ONLY: 60 supported formats
                CL_MEM_WRITE_ONLY: 67 supported formats
                CL_MEM_READ_WRITE: 67 supported formats
                CL_MEM_KERNEL_READ_AND_WRITE: 67 supported formats
        CL_MEM_OBJECT_IMAGE1D_BUFFER supported formats:
                CL_MEM_READ_ONLY: 60 supported formats
                CL_MEM_WRITE_ONLY: 67 supported formats
                CL_MEM_READ_WRITE: 67 supported formats
                CL_MEM_KERNEL_READ_AND_WRITE: 67 supported formats
        CL_MEM_OBJECT_IMAGE2D supported formats:
                CL_MEM_READ_ONLY: 60 supported formats
                CL_MEM_WRITE_ONLY: 67 supported formats
                CL_MEM_READ_WRITE: 67 supported formats
                CL_MEM_KERNEL_READ_AND_WRITE: 67 supported formats
        CL_MEM_OBJECT_IMAGE3D supported formats:
                CL_MEM_READ_ONLY: 58 supported formats
                CL_MEM_WRITE_ONLY: 65 supported formats
                CL_MEM_READ_WRITE: 65 supported formats
                CL_MEM_KERNEL_READ_AND_WRITE: 65 supported formats
        CL_MEM_OBJECT_IMAGE1D_ARRAY supported formats:
                CL_MEM_READ_ONLY: 60 supported formats
                CL_MEM_WRITE_ONLY: 67 supported formats
                CL_MEM_READ_WRITE: 67 supported formats
                CL_MEM_KERNEL_READ_AND_WRITE: 67 supported formats
        CL_MEM_OBJECT_IMAGE2D_ARRAY supported formats:
                CL_MEM_READ_ONLY: 60 supported formats
                CL_MEM_WRITE_ONLY: 67 supported formats
                CL_MEM_READ_WRITE: 67 supported formats
                CL_MEM_KERNEL_READ_AND_WRITE: 67 supported formats

Getting device IDs for CL_DEVICE_TYPE_ACCELERATOR devices
No devices of type CL_DEVICE_TYPE_ACCELERATOR found.
Getting device IDs for CL_DEVICE_TYPE_ALL devices
Found 1 CL_DEVICE_TYPE_ALL devices:
CL_DEVICE_TYPE_ALL Device 1 of 1 Info:
        CL_DEVICE_VERSION == "OpenCL 2.1 v1.g6p0-01eac0.efb75e2978d783a80fe78be1bfb0efc1"
        CL_DEVICE_EXTENSIONS == "cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_icd cl_khr_egl_image cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_subgroups cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_il_program cl_khr_priority_hints cl_khr_create_command_queue cl_khr_spirv_no_integer_wrap_decoration cl_khr_extended_versioning cl_khr_device_uuid cl_arm_core_id cl_arm_printf cl_arm_non_uniform_work_group_size cl_arm_import_memory cl_arm_import_memory_dma_buf cl_arm_import_memory_host cl_arm_integer_dot_product_int8 cl_arm_integer_dot_product_accumulate_int8 cl_arm_integer_dot_product_accumulate_saturate_int8 cl_arm_scheduling_controls cl_arm_controlled_kernel_termination cl_ext_cxx_for_opencl"
        CL_DEVICE_TYPE == |CL_DEVICE_TYPE_GPU||
        CL_DEVICE_VENDOR_ID == 2825322496
        CL_DEVICE_MAX_COMPUTE_UNITS == 4
        CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS == 3
        CL_DEVICE_MAX_WORK_ITEM_SIZES == 1024 1024 1024
        CL_DEVICE_MAX_WORK_GROUP_SIZE == 1024
        CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR == 16
        CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT == 8
        CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT == 4
        CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG == 2
        CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT == 4
        CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE == 0
        CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF == 8
        CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR == 4
        CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT == 2
        CL_DEVICE_NATIVE_VECTOR_WIDTH_INT == 1
        CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG == 1
        CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT == 1
        CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE == 0
        CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF == 2
        CL_DEVICE_MAX_CLOCK_FREQUENCY == 1000
        CL_DEVICE_ADDRESS_BITS == 64
        CL_DEVICE_MAX_READ_IMAGE_ARGS == 128
        CL_DEVICE_MAX_WRITE_IMAGE_ARGS == 64
        CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS == 64
        CL_DEVICE_MAX_MEM_ALLOC_SIZE == 3914739712
        CL_DEVICE_IMAGE2D_MAX_WIDTH == 65536
        CL_DEVICE_IMAGE2D_MAX_HEIGHT == 65536
        CL_DEVICE_IMAGE3D_MAX_WIDTH == 65536
        CL_DEVICE_IMAGE3D_MAX_HEIGHT == 65536
        CL_DEVICE_IMAGE3D_MAX_DEPTH == 65536
        CL_DEVICE_IMAGE_MAX_ARRAY_SIZE == 2048
        CL_DEVICE_IMAGE_MAX_BUFFER_SIZE == 65536
        CL_DEVICE_IMAGE_SUPPORT == 1
        CL_DEVICE_MAX_PARAMETER_SIZE == 1024
        CL_DEVICE_MAX_SAMPLERS == 16
        CL_DEVICE_IMAGE_PITCH_ALIGNMENT == 64
        CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT == 32
        CL_DEVICE_MEM_BASE_ADDR_ALIGN == 1024
        CL_DEVICE_SINGLE_FP_CONFIG == CL_FP_DENORM|CL_FP_INF_NAN|CL_FP_ROUND_TO_NEAREST|CL_FP_ROUND_TO_ZERO|CL_FP_ROUND_TO_INF|CL_FP_FMA|
        CL_DEVICE_DOUBLE_FP_CONFIG == ||||||
        CL_DEVICE_GLOBAL_MEM_CACHE_TYPE == CL_READ_WRITE_CACHE
        CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE == 64
        CL_DEVICE_GLOBAL_MEM_CACHE_SIZE == 1048576
        CL_DEVICE_GLOBAL_MEM_SIZE == 3914739712
        CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE == 3914739712
        CL_DEVICE_MAX_CONSTANT_ARGS == 128
        CL_DEVICE_LOCAL_MEM_TYPE == CL_GLOBAL
        CL_DEVICE_LOCAL_MEM_SIZE == 32768
        CL_DEVICE_ERROR_CORRECTION_SUPPORT == 0
        CL_DEVICE_HOST_UNIFIED_MEMORY == 1
        CL_DEVICE_PROFILING_TIMER_RESOLUTION == 1000
        CL_DEVICE_ENDIAN_LITTLE == 1
        CL_DEVICE_AVAILABLE == 1
        CL_DEVICE_COMPILER_AVAILABLE == 1
        CL_DEVICE_LINKER_AVAILABLE == 1
        CL_DEVICE_BUILT_IN_KERNELS == ""
        CL_DEVICE_PRINTF_BUFFER_SIZE == 1048576
        CL_DEVICE_PREFERRED_INTEROP_USER_SYNC == 0
        CL_DEVICE_PARENT_DEVICE == 0
        CL_DEVICE_PARTITION_MAX_SUB_DEVICES == 0
        CL_DEVICE_PARTITION_AFFINITY_DOMAIN == |||||
        CL_DEVICE_REFERENCE_COUNT == 1
        CL_DEVICE_EXECUTION_CAPABILITIES == CL_EXEC_KERNEL|
        CL_DEVICE_QUEUE_ON_HOST_PROPERTIES == CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE|CL_QUEUE_PROFILING_ENABLE
        CL_DEVICE_NAME == "Mali-LODX r0p0"
        CL_DEVICE_VENDOR == "ARM"
        CL_DRIVER_VERSION == "2.1"
        CL_DEVICE_PROFILE == "FULL_PROFILE"
        CL_DEVICE_OPENCL_C_VERSION == "OpenCL C 2.0 v1.g6p0-01eac0.efb75e2978d783a80fe78be1bfb0efc1"
        CL_DEVICE_MAX_PIPE_ARGS == 16
        CL_DEVICE_PIPE_MAX_ACTIVE_RESERVATIONS == 1
        CL_DEVICE_PIPE_MAX_PACKET_SIZE == 1024
        CL_DEVICE_MAX_GLOBAL_VARIABLE_SIZE == 65536
        CL_DEVICE_GLOBAL_VARIABLE_PREFERRED_TOTAL_SIZE == 0
        CL_DEVICE_QUEUE_ON_HOST_PROPERTIES == CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE|CL_QUEUE_PROFILING_ENABLE
        CL_DEVICE_QUEUE_ON_DEVICE_PROPERTIES == CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE|CL_QUEUE_PROFILING_ENABLE
        CL_DEVICE_QUEUE_ON_DEVICE_PREFERRED_SIZE == 2097152
        CL_DEVICE_QUEUE_ON_DEVICE_MAX_SIZE == 16777216
        CL_DEVICE_MAX_ON_DEVICE_QUEUES == 1
        CL_DEVICE_MAX_ON_DEVICE_EVENTS == 1024
        CL_DEVICE_PREFERRED_PLATFORM_ATOMIC_ALIGNMENT == 0
        CL_DEVICE_PREFERRED_GLOBAL_ATOMIC_ALIGNMENT == 0
        CL_DEVICE_PREFERRED_LOCAL_ATOMIC_ALIGNMENT == 0
        CL_DEVICE_SVM_CAPABILITIES == CL_DEVICE_SVM_COARSE_GRAIN_BUFFER|||
        CL_DEVICE_IL_VERSION == "SPIR-V_1.0"
        CL_DEVICE_MAX_NUM_SUB_GROUPS == 64
        CL_DEVICE_SUB_GROUP_INDEPENDENT_FORWARD_PROGRESS == 1
        Skipped: CL_DEVICE_ATOMIC_MEMORY_CAPABILITIES.
        Skipped: CL_DEVICE_ATOMIC_FENCE_CAPABILITIES.
        Skipped: CL_DEVICE_NON_UNIFORM_WORK_GROUP_SUPPORT.
        Skipped: CL_DEVICE_PREFERRED_WORK_GROUP_SIZE_MULTIPLE.
        Skipped: CL_DEVICE_WORK_GROUP_COLLECTIVE_FUNCTIONS_SUPPORT.
        Skipped: CL_DEVICE_GENERIC_ADDRESS_SPACE_SUPPORT.
        Skipped: CL_DEVICE_OPENCL_C_FEATURES.
        Skipped: CL_DEVICE_DEVICE_ENQUEUE_CAPABILITIES.
        Skipped: CL_DEVICE_PIPE_SUPPORT.
        Skipped: CL_DEVICE_NUMERIC_VERSION.
        Skipped: CL_DEVICE_EXTENSIONS_WITH_VERSION.
        Skipped: CL_DEVICE_OPENCL_C_ALL_VERSIONS.
        Skipped: CL_DEVICE_ILS_WITH_VERSION.
        Skipped: CL_DEVICE_BUILT_IN_KERNELS_WITH_VERSION.
        CL_DEVICE_IMAGE_PITCH_ALIGNMENT == 64
        CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT == 32
        CL_MEM_OBJECT_IMAGE1D supported formats:
                CL_MEM_READ_ONLY: 60 supported formats
                CL_MEM_WRITE_ONLY: 67 supported formats
                CL_MEM_READ_WRITE: 67 supported formats
                CL_MEM_KERNEL_READ_AND_WRITE: 67 supported formats
        CL_MEM_OBJECT_IMAGE1D_BUFFER supported formats:
                CL_MEM_READ_ONLY: 60 supported formats
                CL_MEM_WRITE_ONLY: 67 supported formats
                CL_MEM_READ_WRITE: 67 supported formats
                CL_MEM_KERNEL_READ_AND_WRITE: 67 supported formats
        CL_MEM_OBJECT_IMAGE2D supported formats:
                CL_MEM_READ_ONLY: 60 supported formats
                CL_MEM_WRITE_ONLY: 67 supported formats
                CL_MEM_READ_WRITE: 67 supported formats
                CL_MEM_KERNEL_READ_AND_WRITE: 67 supported formats
        CL_MEM_OBJECT_IMAGE3D supported formats:
                CL_MEM_READ_ONLY: 58 supported formats
                CL_MEM_WRITE_ONLY: 65 supported formats
                CL_MEM_READ_WRITE: 65 supported formats
                CL_MEM_KERNEL_READ_AND_WRITE: 65 supported formats
        CL_MEM_OBJECT_IMAGE1D_ARRAY supported formats:
                CL_MEM_READ_ONLY: 60 supported formats
                CL_MEM_WRITE_ONLY: 67 supported formats
                CL_MEM_READ_WRITE: 67 supported formats
                CL_MEM_KERNEL_READ_AND_WRITE: 67 supported formats
        CL_MEM_OBJECT_IMAGE2D_ARRAY supported formats:
                CL_MEM_READ_ONLY: 60 supported formats
                CL_MEM_WRITE_ONLY: 67 supported formats
                CL_MEM_READ_WRITE: 67 supported formats
                CL_MEM_KERNEL_READ_AND_WRITE: 67 supported formats

computeinfo passed
extended_versioning...
Platform versions:
        Matched the platform version
Platform extensions:
        Matched 35 extensions
Device versions:
        Matched the device OpenCL and OpenCL C versions
Device extensions:
        Matched 35 extensions
Device ILs:
        Matched 1 ILs
Device built-in kernels:
        Matched 0 kernels
extended_versioning passed
device_uuid...
        Device UUID: 000067a8010000000000000000000000
        Driver UUID: d9495befea917c528a438a3c2f7b49cc
        Device LUID validity is false
        Device LUID: 0000000000000000
        Node mask  : 00000000
device_uuid passed
conformance_version...
conformance_version skipped (requires at least OpenCL version 3.0, but the device reports OpenCL version 2.1)
pci_bus_info...
cl_khr_pci_bus_info not supported. Skipping test...
pci_bus_info test not supported
PASSED sub-test.
PASSED 3 of 3 tests.
StuartIanNaylor commented 2 years ago

@FrancisMurtagh-arm After playing with images, the Mali drivers we have are obviously not happy: glmark2-es2 on X11 only scores in the 600 range, and on Wayland slightly better, in the 700 range. Khadas, with their Edge2 (an RK3588S board), have managed to get a working Wayland image that produces a glmark2 score of 4000, which is roughly what I was expecting. So I don't think it's ArmNN, and I don't think the results we get will be any good until a better-performing image can be found.

FrancisMurtagh-arm commented 1 year ago

Hi @StuartIanNaylor,

Thanks for doing the analysis on this. Do you think the Mali drivers alone are the issue, or is there an issue in ArmNN that can be rectified? (Re: the 5-second delay with the GPU.)

Regards, Francis.

StuartIanNaylor commented 1 year ago

I think it's ArmNN, as it must create a frame buffer of the model; the delay only happens on load. If you apply the hacks above (preprocessing up front and running inference twice), the delay doesn't happen. It's not that much of a problem, as I guess you can run the model on an initial WAV as part of a load routine. When running, the GPU has a load of 60-70%, which is pretty good. The main problem with the RK3588 is more graphical: the CSF (Command Stream Frontend) hasn't really been implemented to act as an asynchronous arbiter between CPU and GPU IRQs, pushing the stream and queueing at that instant. With OpenCL it's synchronous, and it also isn't coordinating with the graphical side of the Mesa drivers.

The Mali G610 is impressive for ML: the CPU is no slouch for a small Arm board, and the GPU easily matches it. In fact, they make a great pair at this level.

The example code isn't that great unless you hack it, as much of it is CPU preprocessing. That can give a false impression, because the GPU doesn't really get fully loaded while it waits for the input stream. I'm not sure why the MFCC processing seems so heavy.

FrancisMurtagh-arm commented 1 year ago

Hi @StuartIanNaylor,

I've asked around about the issue, and we are aware of a slower inference time on the first run on the CL backend.

This is a known issue: the first run takes a bit of extra time because the kernels for certain layers are being compiled. This can also be seen in the android-nn-driver, where we perform a dummy run first, which does the same thing.
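The dummy-run idea can be sketched generically as follows. This is a minimal Python illustration and not ArmNN code: `FakeGpuModel`, `warm_up`, and `timed_runs` are hypothetical names, and the one-off kernel compilation is simulated with a sleep rather than a real OpenCL build.

```python
import time
from typing import Callable, List, Sequence


class FakeGpuModel:
    """Hypothetical stand-in for a model on a GPU backend.

    The first call pays a simulated one-off "kernel compilation" cost;
    later calls skip it.
    """

    def __init__(self, compile_cost_s: float = 0.05) -> None:
        self.compile_cost_s = compile_cost_s
        self.compiled = False
        self.compile_count = 0

    def __call__(self, sample: Sequence[float]) -> float:
        if not self.compiled:
            time.sleep(self.compile_cost_s)  # simulated compile step
            self.compiled = True
            self.compile_count += 1
        return sum(sample)  # placeholder for the real inference result


def warm_up(run_inference: Callable[[Sequence[float]], object],
            dummy_input: Sequence[float]) -> None:
    """Run one throwaway inference at load time to absorb the setup cost."""
    run_inference(dummy_input)


def timed_runs(run_inference: Callable[[Sequence[float]], object],
               sample: Sequence[float],
               repeats: int = 3) -> List[float]:
    """Return the wall-clock duration in seconds of each of `repeats` calls."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_inference(sample)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    dummy = [0.0] * 16

    cold_model = FakeGpuModel()
    print("without warm-up:", timed_runs(cold_model, dummy))

    warm_model = FakeGpuModel()
    warm_up(warm_model, dummy)  # done once, as part of the load routine
    print("with warm-up:   ", timed_runs(warm_model, dummy))
```

In a real application the warm-up call would sit in the model-loading routine and use an input of the same shape as real data, so the first user-visible inference does not include the compile time.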

I'll report back if there is any change.

Thanks again for the analysis, Francis.