Open jd-foster opened 4 months ago
IIUC these #if
conditions only evaluate to 0 if compiling for A64fx. So, the net effect of the patch would be to essentially disable specific A64fx support? Maybe I'm missing something?
Thanks for the response. I think that's basically the idea.
OK, I think I maybe see where this started and what the intent was. It looks like @xrq-phys's patch added a64fx
to the arm64
family, and that he might have been concerned that this would disable the packing kernels for 256-bit ARM SVE (e.g. Graviton3) due to BLIS_FAMILY_A64FX
being defined. However, the BLIS_FAMILY_XXX
macros for specific architectures are only enabled when that is the only architecture enabled (e.g. configure a64fx
). When configuring for generic arm64 architectures then the only family macro defined is BLIS_FAMILY_ARM64
. So, the part of the patch dealing with BLIS_FAMILY_A64FX
will have no effect.
However, the part about adding a64fx to the arm64 family could be upstreamed.
This is, assuming run-time support for checking for A64fx exists which I'm not sure about. I think A64fx also has some other optimizations relying on specific Fujitsu compiler support which could make it difficult to co-exist with other arm64 configurations.
OK, looks like you understand the reasons (definitely more than I do.) Appreciate you looking into it more deeply.
We have a multi-platform, cross-compiling build system for binaries in the Julia ecosystem ( BinaryBuilder.jl ), with the Yggdrasil repo hosting the recipes.
While the latest recipe for the shiny BLIS v1.0 is here: https://github.com/JuliaPackaging/Yggdrasil/pull/8626 , the original recipe was kindly written by BLIS contributor @xrq-phys who wrote patches for optimisations on ARM SVE architectures. The patches essentially reduce to
in a number of locations (see https://github.com/JuliaPackaging/Yggdrasil/pull/8626/files for the full picture).
So our question is: do we need to keep carrying these patches, or can these be upstreamed to BLIS in any form?
Thanks for any clarification or suggestions here.