ARM-software / ComputeLibrary

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.

How to implement my own runtime if I just need a few core kernels? #1054

Status: Closed (wenhyan closed 1 year ago)

wenhyan commented 1 year ago

Hi, I want to implement my own runtime for maximum performance. Is there any support for this? Thanks.

morgolock commented 1 year ago

Hi @wenhyan

I'd recommend the use of the existing operators in https://github.com/ARM-software/ComputeLibrary/tree/main/src/cpu/operators rather than reimplementing them along with the scheduler.

You could certainly implement your own runtime. We have no guides or resources to help with this, but you could use the library's own source code as a reference. For example, you can look at how CpuActivation and CPPScheduler are implemented.
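To make the kernel/scheduler split concrete, here is a minimal sketch of the idea, not Compute Library code. Every name in it (`Range`, `ReluKernel`, `schedule_serial`, `run_example`) is hypothetical and chosen for illustration only. It assumes a kernel is a stateless routine that processes a sub-range of the data, while the scheduler decides how the full range is split and where each piece runs. This version runs the chunks serially and uses only static buffers, with no heap allocation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types for illustration; none of this is Compute Library's API.

// A 1-D slice of the iteration space (end is exclusive).
struct Range {
    std::size_t begin;
    std::size_t end;
};

// "Kernel": pure compute over a sub-range. It knows nothing about threads.
struct ReluKernel {
    const float *src;
    float       *dst;

    void run(Range r) const {
        for (std::size_t i = r.begin; i < r.end; ++i) {
            dst[i] = src[i] > 0.0f ? src[i] : 0.0f;
        }
    }
};

// "Scheduler": splits the full range into at most num_chunks pieces and
// dispatches them. Here every piece runs serially on the calling core; a
// threaded runtime would hand each piece to a different worker instead.
template <typename Kernel>
void schedule_serial(const Kernel &kernel, Range full, std::size_t num_chunks) {
    assert(num_chunks > 0);
    assert(full.begin <= full.end);
    const std::size_t total = full.end - full.begin;
    const std::size_t step  = (total + num_chunks - 1) / num_chunks; // ceiling division
    for (std::size_t start = full.begin; start < full.end; start += step) {
        // Clamp the last chunk so it never runs past the end of the range.
        const std::size_t stop = (full.end - start < step) ? full.end : start + step;
        kernel.run(Range{start, stop});
    }
}

// Statically allocated buffers: no new, malloc, or make_unique involved.
static float g_src[8] = {-2.0f, -1.0f, 0.0f, 1.0f, 2.0f, -3.0f, 3.0f, -0.5f};
static float g_dst[8];

// Usage: run the ReLU kernel over all 8 elements in 3 chunks.
inline void run_example() {
    const ReluKernel kernel{g_src, g_dst};
    schedule_serial(kernel, Range{0, 8}, 3);
}
```

Keeping the kernel free of scheduling logic means the same compute code can be reused whether the runtime is single-threaded, as here, or multi-threaded.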

What's the motivation for implementing your own runtime? Are you experiencing any performance issues? Could you please share more details?

Hope this helps.

wenhyan commented 1 year ago

Hi @morgolock
Thank you for your help. I want to use Arm Compute Library on hardware that has no operating system (such as Linux) and that also does not support the C++ standard library or dynamic memory allocation (new, malloc, make_unique, ...). Do you have any ideas? Thanks.

morgolock commented 1 year ago

Hi @wenhyan

We support bare metal builds; see https://arm-software.github.io/ComputeLibrary/v23.05/how_to_build.xhtml#S1_5_bare_metal.

Compute Library requires support for -std=c++14.

Can you be more specific about the hardware?

Hope this helps.

wenhyan commented 1 year ago

Hi @morgolock The hardware is a bare-metal A57, but it doesn't support C++14. I think the graph and runtime modules require C++14 support. Does the core module also need C++14?

morgolock commented 1 year ago

Hi @wenhyan

Yes, libarm_compute.so requires -std=c++14. Please ignore libarm_compute_core.so since it's been deprecated and will be removed in the near future. See more details in https://review.mlplatform.org/c/ml/ComputeLibrary/+/9760

You can get a toolchain that supports -std=c++14, and the generated code will run on bare metal. You just need to disable OpenMP and C++ threads by setting openmp=0 cppthreads=0, as explained in the documentation.