ARM-software / ComputeLibrary

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
MIT License

How to build universal ARM Compute? #1053

Closed ilya-lavrenov closed 8 months ago

ilya-lavrenov commented 1 year ago

Hi,

We want to build the ARM Compute Library so that it runs optimally on a Raspberry Pi (armv8-a), an NVIDIA Jetson (armv8.2-a with fp16), and Amazon Graviton (with SVE instructions). What we see in the ARM Compute Library build scripts is multi_isa, but it implies armv8.2-a and higher, so it seems to cover only the last two processors. How can we build a library that is also suitable for the Raspberry Pi? What are the recommendations here? Is multi_isa supposed to produce binaries that run on the Raspberry Pi as well as the others?

zhangqicai commented 1 year ago

(Automatic vacation reply from QQ Mail.) Hello, I am currently on vacation and cannot reply to your email in person. I will get back to you as soon as possible once my vacation ends.

morgolock commented 1 year ago

Hi @ilya-lavrenov

Please see the patch below adding multi_isa support in armv8a:

https://review.mlplatform.org/c/ml/ComputeLibrary/+/9474

You should build with arch=armv8-a multi_isa=1

Hope this helps.
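For reference, a full scons invocation along those lines might look like the sketch below. Only arch=armv8-a and multi_isa=1 come from the thread; the remaining options (os, neon, opencl, examples, validation_tests) are assumptions for a typical minimal Linux build:

```shell
# Hedged sketch: building ACL with runtime ISA dispatch on a baseline
# armv8-a target. Options other than arch/multi_isa are assumptions.
scons os=linux arch=armv8-a multi_isa=1 neon=1 opencl=0 \
      examples=0 validation_tests=0 -j"$(nproc)"
```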

ilya-lavrenov commented 1 year ago

You should build with arch=armv8-a multi_isa=1

But in this case FP16 kernels will not be available, right? We want FP32 (Raspberry Pi) + FP16 (Jetson TK / TX) + SVE (Graviton) at the same time.

nSircombe commented 1 year ago

But in this case FP16 kernels will not be available. Right?

Yes, that's correct.

We want FP32 (Raspberry Pi) + FP16 (Jetson TK / TX) + SVE (Graviton) at the same time.

As it stands, that's not currently supported.

ilya-lavrenov commented 1 year ago

that's not currently supported

Is it a limitation of the build options or of the library organization itself? Could you please provide more details? If it is only a build-system limitation, we can write custom CMake scripts for it.
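In the spirit of the custom-build-scripts idea above, one packaging-side workaround is to build one ACL binary per target and pick the right one when the application starts. This is a hedged sketch, not anything ACL ships: the library file names are hypothetical, and the feature set would in practice come from the host (e.g. the Features line of /proc/cpuinfo on Linux):

```python
# Hedged sketch of a per-target packaging workaround: build one ACL binary
# per ISA level (file names here are hypothetical) and select the most
# specific one the host CPU can run.

def pick_acl_library(features):
    """Return the most specific prebuilt library for the given CPU features."""
    if "sve" in features:
        return "libarm_compute_sve.so"    # e.g. Graviton
    if "asimdhp" in features:             # FP16 arithmetic support
        return "libarm_compute_fp16.so"   # e.g. Jetson (armv8.2-a)
    return "libarm_compute_v8a.so"        # baseline, e.g. Raspberry Pi

print(pick_acl_library({"asimd", "sve"}))      # → libarm_compute_sve.so
print(pick_acl_library({"asimd"}))             # → libarm_compute_v8a.so
```

The selected file could then be loaded with the dynamic linker (dlopen / ctypes) before any ACL symbol is resolved.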

nSircombe commented 1 year ago

I believe it's beyond the scope of build-system changes.

I'd be interested to know a bit more about your use-case, how ACL's being used across these platforms, and the requirement for a 'portable' build with FP16 support? Is it more than comparative benchmarking?

ilya-lavrenov commented 1 year ago

The use case is an inference engine with plugins for different devices.

ilya-lavrenov commented 1 year ago

Hi team, could you please let us know your vision / decision on this request?

morgolock commented 1 year ago

Hi @ilya-lavrenov

Thanks for sharing more details about this request.

This feature is not present in our roadmap but we will discuss your request to see if we can add support for this.

Hope this helps.

morgolock commented 10 months ago

Hi @ilya-lavrenov

We made the changes required to support FP16 in all multi_isa builds, including armv8a. This feature is going to be present in the next release.

Many patches were required for this, and they have all been merged to the main development branch. You can try it by building the latest main.

Hope this helps.
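Building the latest main as suggested might look like the following sketch; the clone URL is the public GitHub mirror, and the scons options beyond arch/multi_isa are assumptions:

```shell
# Hedged sketch: fetch the development branch and build with the new
# multi_isa + armv8a FP16 support. Options other than arch/multi_isa
# are assumptions for a minimal Linux CPU-only build.
git clone https://github.com/ARM-software/ComputeLibrary.git
cd ComputeLibrary
scons os=linux arch=armv8a multi_isa=1 neon=1 opencl=0 -j"$(nproc)"
```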

ilya-lavrenov commented 10 months ago

Hi @morgolock Does it mean that when I use multi_isa=True arch=armv8a, all possible optimizations are turned on, including SVE / SME, but the actual kernel is selected at runtime based on the host's architecture capabilities?

I see that in the main branch of your Gerrit development repo we still have similar lines: https://github.com/ARM-software/ComputeLibrary/blob/add70ace1e57f65d1ae4d0cedaec6e4578cf87ff/filedefs.json#L4-L10 And armv8a is still without +fp16. Am I right that common files are compiled without +fp16 support, while only source files with FP16 kernels are compiled with armv8.2-a+fp16? And is the difference with multi_isa=True arch=armv8.2-a that in the armv8.2-a case all files are compiled with the +fp16 option? Is there then a performance difference between multi_isa=True arch=armv8.2-a and multi_isa=True arch=armv8a (assuming they run on the same machine with FP16 support)?

morgolock commented 9 months ago

Hi @ilya-lavrenov

Does it mean when I use multi_isa=True arch=armv8a then all possible optimizations are turned on including SVE / SME, but actual kernels is selected in runtime based on host arch capabilities?

Yes, that's correct. You will have FP16, BF16 and SVE/SVE2, but not SME. To enable SME in the multi_isa build you need to build with these options: multi_isa=1 extra_cxx_flags="-DENABLE_SME -DARM_COMPUTE_ENABLE_SME -DARM_COMPUTE_ENABLE_SME2"

And armv8a is still without +fp16. Am I right that common files are compiled without +fp16 support, while only source files with FP16 kernels are compiled with armv8.2-a+fp16? And is the difference with multi_isa=True arch=armv8.2-a that in the armv8.2-a case all files are compiled with the +fp16 option?

That's correct.

Is there a performance difference between multi_isa=True arch=armv8.2-a and multi_isa=True arch=armv8a then? (assuming they run on the same machine with FP16 support)

No, the two binaries will use the same FP16 kernels at runtime.

Hope this helps.
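The runtime selection described above can be modelled as a priority-ordered dispatch table: each kernel variant declares the ISA features it needs, and the first variant whose requirements the host satisfies wins, which is why the armv8a and armv8.2-a multi_isa builds end up on the same FP16 kernel. This is a toy model with illustrative names, not ACL's actual API:

```python
# Toy model of multi_isa runtime kernel selection. Candidates are ordered
# from most to least specialised; the first one whose required feature set
# is a subset of the host's features is chosen. Names are illustrative.

KERNEL_CANDIDATES = [
    ("gemm_sme",  {"sme"}),
    ("gemm_sve",  {"sve"}),
    ("gemm_fp16", {"fp16"}),
    ("gemm_neon", set()),          # baseline armv8-a fallback
]

def select_kernel(host_features):
    for name, required in KERNEL_CANDIDATES:
        if required <= host_features:
            return name
    raise RuntimeError("no viable kernel for this host")

print(select_kernel({"fp16", "sve"}))   # → gemm_sve
print(select_kernel(set()))             # → gemm_neon
```

On two hosts with identical features, both builds resolve to the same entry, matching the answer above that the two binaries use the same FP16 kernels at runtime.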

developer-compute commented 9 months ago

Hi Ilya,

What's the build command and toolchain you used?

I can build multi_isa + armv8a with GCC 11.3.

See the command below:

PATH=../../toolchains/gcc-linaro-11.3.1-2022.06-x86_64_aarch64-linux-gnu/bin/:$PATH scons os=linux opencl=0 neon=1 multi_isa=1 arch=armv8a debug=1 asserts=1 standalone=1 examples=0 benchmark_examples=1 validation_tests=0 extra_link_flags="-L../../toolchains/gcc-linaro-11.3.1-2022.06-x86_64_aarch64-linux-gnu/aarch64-linux-gnu/libc/usr/lib/ -static" -j9


From: Ilya Lavrenov. Sent: 08 December 2023. Subject: Re: [ARM-software/ComputeLibrary] How to build universal ARM Compute? (Issue #1053)

@morgolock I've tried the current main from your Gerrit repo and hit a compilation error:

/__w/openvino/openvino/openvino/src/plugins/intel_cpu/thirdparty/onednn/src/cpu/acl/acl_indirect_gemm_convolution.hpp:55:53: error: no matching function for call to 'arm_compute::Conv2dInfo::Conv2dInfo(const arm_compute::PadStrideInfo&, const arm_compute::Size2D&, const arm_compute::ActivationLayerInfo&, const bool&, int, , const arm_compute::WeightsInfo&)'
55 | acp.weights_info));



morgolock commented 8 months ago

Closing this, as the feature has been fully implemented and is present in release 24.01.

If you require additional support please open a new issue.