Dear team,
Could you let me know what the root cause might be? Does the pipeline creation take more time?
I need to review our usage of ARMNN further, but if prepare takes a lot of time then I need to understand why.
Time taken by ARMNN CPU to prepare: 992 ms; ARMNN GPU: 9607 ms
Time taken by the open-source TFLite CPU plugin: 44 ms
Hi @abhajaswal
Could you please try with the latest release 22.05? There have been some improvements in the startup time since 20.02.
In general, both for CPU and GPU, the first iteration is slower because during this run ACL performs various transformations on the tensors to make sure the memory is accessed in the best way possible. All this additional work is done by the operators in their corresponding ::prepare() methods. For example, see ClGemmConv2d: https://github.com/ARM-software/ComputeLibrary/blob/main/src/gpu/cl/operators/ClGemmConv2d.cpp#L617
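As a minimal illustration (not code from this issue; the layer choice, shapes and parameters below are arbitrary), you can time the first run of a CL function against later runs: the first run() includes the one-off prepare() work, later runs do not.

```cpp
// Hedged sketch: measure how much of the first CLConvolutionLayer::run() is
// spent on one-off prepare()-style work (weight reshaping etc.) compared with
// steady-state runs. Shapes and the layer choice are made up for illustration.
#include <chrono>
#include <iostream>

#include "arm_compute/core/TensorInfo.h"
#include "arm_compute/core/TensorShape.h"
#include "arm_compute/core/Types.h"
#include "arm_compute/runtime/CL/CLScheduler.h"
#include "arm_compute/runtime/CL/CLTensor.h"
#include "arm_compute/runtime/CL/functions/CLConvolutionLayer.h"
#include "arm_compute/runtime/IFunction.h"

using namespace arm_compute;

static double time_run_ms(IFunction &fn)
{
    const auto start = std::chrono::steady_clock::now();
    fn.run();
    CLScheduler::get().sync(); // wait for the queued OpenCL work to finish
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

int main()
{
    CLScheduler::get().default_init();

    CLTensor src, weights, biases, dst;
    src.allocator()->init(TensorInfo(TensorShape(224U, 224U, 3U), 1, DataType::F32));
    weights.allocator()->init(TensorInfo(TensorShape(3U, 3U, 3U, 32U), 1, DataType::F32));
    biases.allocator()->init(TensorInfo(TensorShape(32U), 1, DataType::F32));
    dst.allocator()->init(TensorInfo(TensorShape(222U, 222U, 32U), 1, DataType::F32));

    CLConvolutionLayer conv;
    conv.configure(&src, &weights, &biases, &dst, PadStrideInfo(1, 1, 0, 0));

    src.allocator()->allocate();
    weights.allocator()->allocate();
    biases.allocator()->allocate();
    dst.allocator()->allocate();

    std::cout << "1st run (includes prepare): " << time_run_ms(conv) << " ms\n";
    std::cout << "2nd run (steady state):     " << time_run_ms(conv) << " ms\n";
    return 0;
}
```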
For the OpenCL backend you also have to add the time to compile the OpenCL kernels at runtime, which occurs during configuration. To mitigate this problem you can save the compiled kernels to disk and restore them at runtime. For more information please see the example: https://github.com/ARM-software/ComputeLibrary/blob/main/examples/cl_cache.cpp
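A minimal sketch of the pattern from that example follows; the helper names and the cache file name are taken from cl_cache.cpp, so please check them against the ACL version you are using.

```cpp
// Restore the compiled-kernel cache before building the network, then save it
// again after the kernels have been compiled. Helper names follow the linked
// cl_cache.cpp example; "cache.bin" is just a placeholder path.
#include "arm_compute/runtime/CL/CLScheduler.h"
#include "utils/Utils.h"

int main()
{
    arm_compute::CLScheduler::get().default_init();

    // Load previously compiled OpenCL kernel binaries, if the file exists.
    arm_compute::utils::restore_program_cache_from_file("cache.bin");

    // ... configure and run the graph/operators here; kernels found in the
    // cache do not need to be recompiled, which is where startup time is saved ...

    // Persist the (possibly updated) cache so the next process start is fast.
    arm_compute::utils::save_program_cache_to_file("cache.bin");
    return 0;
}
```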
Please also be aware that the use of the OpenCL tuner in ACL can affect startup time too. For more information please see: https://arm-software.github.io/ComputeLibrary/latest/architecture.xhtml#architecture_opencl_tuner
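If you do use the tuner, one way to keep its cost off the normal startup path is to tune once, save the results to a file and reuse that file on subsequent runs. A hedged sketch (the tuning-file name and the first-run switch are placeholders, not from this issue):

```cpp
// Tune the OpenCL kernels once, save the tuned parameters, and on later runs
// just load the file without tuning new kernels, so startup stays fast.
#include "arm_compute/runtime/CL/CLScheduler.h"
#include "arm_compute/runtime/CL/CLTuner.h"

int main(int argc, char **argv)
{
    const bool tuning_run = (argc > 1); // pass any argument to (re)tune

    arm_compute::CLTuner tuner;
    tuner.set_tuner_mode(arm_compute::CLTunerMode::NORMAL);
    tuner.set_tune_new_kernels(tuning_run); // only search for LWS on the tuning run
    if (!tuning_run)
    {
        tuner.load_from_file("acl_tuner.csv"); // reuse previously tuned parameters
    }

    arm_compute::CLScheduler::get().default_init(&tuner);

    // ... build and run the network ...

    if (tuning_run)
    {
        tuner.save_to_file("acl_tuner.csv");
    }
    return 0;
}
```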
It would be helpful if you could share the complete command you used to run the example.
Thanks! Using cl_cache.bin I am able to reduce the model load (init) time from 20612 ms to 1379 ms.
After restoring cl_cache.bin:
Image read time (from file or camera): Min: 11 ms, Max: 11 ms, Avg: 11 ms
Image pre-process time: Min: 1 ms, Max: 1 ms, Avg: 1 ms
Model inference time: Min: 70 ms, Max: 70 ms, Avg: 70 ms
Model init/deinit time: Init: 1379 ms
Info: Shutdown time: 61.85 ms
Initial numbers, from the first run when cl_cache.bin was saved:
------------ PERFORMANCE ------------------
Image read time (from file or camera): Min: 12 ms, Max: 12 ms, Avg: 12 ms
Image pre-process time: Min: 1 ms, Max: 1 ms, Avg: 1 ms
Model inference time: Min: 69 ms, Max: 69 ms, Avg: 69 ms
Model init/deinit time: Init: 20612 ms
Info: Shutdown time: 120.91 ms
-rwxr-xr-x 1 root root  2419612 Jan  2 18:24 armnn_clcahae.bin
-rw-r--r-- 1 root root 23018392 Jul  8  2022 od_tflite_model.tflite
I will have to generate this .bin file for N models, so won't it take up more space? Could we not reduce the load time without cl_cache?
Actually, I tried the TFLite GPU delegate and its load time is also low; I did not have to generate a cl_cache.bin for it. So I wonder how the TFLite team optimized the load time and why I had to do this with ARMNN/ACL.
Hi @abhajaswal
Glad to hear you improved the load time using prebuilt OpenCL kernels.
I will have to generate this .bin file for N models, so won't it take up more space?
Yes, you could easily implement deflating/inflating with something like zlib at runtime to reduce the size on disk if that's a concern.
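As an illustration (plain zlib, not an ArmNN/ACL API; file names are placeholders), you could gzip the cache file after saving it and inflate it again before restoring:

```cpp
// Compress a cl_cache blob with zlib's gzip helpers to reduce its size on
// disk, and inflate it back before handing it to the runtime.
#include <fstream>
#include <iterator>
#include <vector>
#include <zlib.h>

static bool gzip_file(const char *src_path, const char *dst_path)
{
    std::ifstream src(src_path, std::ios::binary);
    if (!src) return false;
    std::vector<char> data((std::istreambuf_iterator<char>(src)),
                            std::istreambuf_iterator<char>());

    gzFile dst = gzopen(dst_path, "wb9"); // "9" = best compression
    if (!dst) return false;
    const bool ok = gzwrite(dst, data.data(), static_cast<unsigned>(data.size()))
                    == static_cast<int>(data.size());
    gzclose(dst);
    return ok;
}

static bool gunzip_file(const char *src_path, const char *dst_path)
{
    gzFile src = gzopen(src_path, "rb");
    if (!src) return false;
    std::ofstream dst(dst_path, std::ios::binary);
    char buf[64 * 1024];
    int n;
    while ((n = gzread(src, buf, sizeof(buf))) > 0)
    {
        dst.write(buf, n);
    }
    gzclose(src);
    return n == 0; // 0 means clean end of file, negative means error
}

int main()
{
    gzip_file("cl_cache.bin", "cl_cache.bin.gz");   // after saving the cache
    gunzip_file("cl_cache.bin.gz", "cl_cache.bin"); // before restoring it
    return 0;
}
```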
Could we not reduce the load time without cl_cache?
Unfortunately not without a major rework of the library. At runtime the OpenCL kernels need to be compiled and that is what requires the additional time.
Hope this helps.
Hello, I am trying to use ACL 20.2.
As you know, when we run any ACL example in a loop (for example MobileNet SSD v1), the first call to graph.run() takes about 1 minute.
From the second call onwards, graph.run() takes about 92 ms.
As I understand it, on the first run ACL creates the pipeline, memory, buffers, etc., so it takes time. Is there any way I can reduce this first-time initialisation time?