Closed: ilya-lavrenov closed this issue 8 months ago
(Automatic vacation reply from QQ Mail:) Hello, I am currently on vacation and cannot reply to your email personally. I will reply as soon as possible after my vacation ends.
Hi @ilya-lavrenov
Please see the patch below adding multi_isa support for armv8a:
https://review.mlplatform.org/c/ml/ComputeLibrary/+/9474
You should build with `arch=armv8-a multi_isa=1`.
Hope this helps.
> You should build with arch=armv8-a multi_isa=1
But in this case FP16 kernels will not be available, right? We want FP32 (Raspberry Pi) + FP16 (Jetson TK / TX) + SVE (Graviton) at the same time.
> But in this case FP16 kernels will not be available, right?
Yes, that's correct.
> We want FP32 (Raspberry Pi) + FP16 (Jetson TK / TX) + SVE (Graviton) at the same time.
As it stands, that's not currently supported.
> that's not currently supported
Is this a limitation of the build options or of the library organization itself? Could you please provide more details? If it is only a build limitation, we can write custom CMake scripts for it.
I believe it's beyond the scope of build-system changes.
I'd be interested to know a bit more about your use case: how ACL is being used across these platforms, and the requirement for a 'portable' build with FP16 support. Is it more than comparative benchmarking?
The use case is an inference engine with plugins for different devices:
Hi team, could you please let us know your vision / decision regarding this request?
Hi @ilya-lavrenov
Thanks for sharing more details about this request.
This feature is not on our roadmap, but we will discuss your request to see if we can add support for it.
Hope this helps.
Hi @ilya-lavrenov
We made the changes required to support FP16 in all multi_isa builds, including armv8a. This feature will be present in the next release.
Many patches were required for this, and they have all been merged into the main development branch. You can try this by building the latest main.
Hope this helps.
Hi @morgolock
Does it mean that when I use `multi_isa=True arch=armv8a`, all possible optimizations are turned on, including SVE / SME, but the actual kernel is selected at runtime based on the host architecture's capabilities?
I see that in the `main` branch in your Gerrit development repo, we still have similar lines:
https://github.com/ARM-software/ComputeLibrary/blob/add70ace1e57f65d1ae4d0cedaec6e4578cf87ff/filedefs.json#L4-L10
And armv8a is still without `+fp16`. Am I right that common files are compiled without `+fp16` support, while only the source files with FP16 kernels are compiled with `armv8.2-a+fp16`? And the difference with `multi_isa=True arch=armv8.2-a` is that in the armv8.2-a case all files are compiled with the `+fp16` option? Is there a performance difference between `multi_isa=True arch=armv8.2-a` and `multi_isa=True arch=armv8a` then? (assuming they run on the same machine with FP16 support)
Hi @ilya-lavrenov
> Does it mean that when I use `multi_isa=True arch=armv8a`, all possible optimizations are turned on, including SVE / SME, but the actual kernel is selected at runtime based on the host architecture's capabilities?
Yes, that's correct. You will have FP16, BF16, and SVE/SVE2, but not SME. To enable SME in the multi_isa build you need to build with these options: `multi_isa=1 extra_cxx_flags="-DENABLE_SME -DARM_COMPUTE_ENABLE_SME -DARM_COMPUTE_ENABLE_SME2"`
> And armv8a is still without `+fp16`. Am I right that common files are compiled without `+fp16` support, while only the source files with FP16 kernels are compiled with `armv8.2-a+fp16`? And the difference with `multi_isa=True arch=armv8.2-a` is that in the armv8.2-a case all files are compiled with the `+fp16` option?
That's correct.
> Is there a performance difference between `multi_isa=True arch=armv8.2-a` and `multi_isa=True arch=armv8a` then? (assuming they run on the same machine with FP16 support)
No, the two binaries will use the same FP16 kernels at runtime.
Hope this helps.
Hi Ilya,
What build command and toolchain did you use?
I can build multi_isa + armv8a with GCC 11.3.
See the command below:

```
PATH=../../toolchains/gcc-linaro-11.3.1-2022.06-x86_64_aarch64-linux-gnu/bin/:$PATH scons opencl=0 os=linux opencl=0 multi_isa=0 asserts=1 standalone=1 validation_tests=0 examples=0 neon=1 arch=armv8a benchmark_examples=1 validation_tests=0 extra_link_flags="-L../../toolchains/gcc-linaro-11.3.1-2022.06-x86_64_aarch64-linux-gnu/aarch64-linux-gnu/libc/usr/lib/ -static" multi_isa=1 arch=armv8a debug=1 -j9
```
From: Ilya Lavrenov, 08 December 2023, Re: [ARM-software/ComputeLibrary] How to build universal ARM Compute? (Issue #1053)
@morgolock I've tried the current main from your Gerrit repo and hit a compilation error:
```
2023-12-08T10:57:52.8334677Z /__w/openvino/openvino/openvino/src/plugins/intel_cpu/thirdparty/onednn/src/cpu/acl/acl_indirect_gemm_convolution.hpp:55:53: error: no matching function for call to 'arm_compute::Conv2dInfo::Conv2dInfo(const arm_compute::PadStrideInfo&, const arm_compute::Size2D&, const arm_compute::ActivationLayerInfo&, const bool&, int,
```
Closing this as the feature was fully implemented and is present in 24.01.
If you require additional support please open a new issue.
Hi,
We want to build the ARM Compute Library so that it can run optimally on Raspberry Pi (armv8-a), NVIDIA Jetson (armv8.2-a with FP16), and Amazon Graviton (with SVE instructions). What we see in the ARM Compute Library build scripts is `multi_isa`, but it implies only armv8.2-a and higher, so it seems to cover only the last two processors. How can we build a library that is also suitable for Raspberry Pi? What are the recommendations here? Are we supposed to build separate binaries for Raspberry Pi and build for the others with `multi_isa`?