flame / blis

BLAS-like Library Instantiation Software Framework
Other
2.27k stars 365 forks source link

Upstream BLIS patches for ARM SVE? #807

Open jd-foster opened 4 months ago

jd-foster commented 4 months ago

We have a multi-platform, cross-compiling build system for binaries in the Julia ecosystem ( BinaryBuilder.jl ), with the Yggdrasil repo hosting the recipes.

While the latest recipe for the shiny BLIS v1.0 is here: https://github.com/JuliaPackaging/Yggdrasil/pull/8626 , the original recipe was kindly written by BLIS contributor @xrq-phys who wrote patches for optimisations on ARM SVE architectures. The patches essentially reduce to

-#if !defined(BLIS_FAMILY_A64FX)
+#if 1

in a number of locations (see https://github.com/JuliaPackaging/Yggdrasil/pull/8626/files for the full picture).

So our question is: do we need to keep carrying these patches, or can these be upstreamed to BLIS in any form?

Thanks for any clarification or suggestions here.

devinamatthews commented 4 months ago

IIUC these #if conditions only evaluate to 0 if compiling for A64fx. So, the net effect of the patch would be to essentially disable specific A64fx support? Maybe I'm missing something?

jd-foster commented 4 months ago

Thanks for the response. I think that's basically the idea.

devinamatthews commented 4 months ago

OK, I think I maybe see where this started and what the intent was. It looks like @xrq-phys's patch added a64fx to the arm64 family, and that he might have been concerned that this would disable the packing kernels for 256-bit ARM SVE (e.g. Graviton3) due to BLIS_FAMILY_A64FX being defined. However, the BLIS_FAMILY_XXX macros for specific architectures are only enabled when that is the only architecture enabled (e.g. configure a64fx). When configuring for generic arm64 architectures then the only family macro defined is BLIS_FAMILY_ARM64. So, the part of the patch dealing with BLIS_FAMILY_A64FX will have no effect.

However, the part about adding a64fx to the arm64 family could be upstreamed.

devinamatthews commented 4 months ago

This is, assuming run-time support for checking for A64fx exists which I'm not sure about. I think A64fx also has some other optimizations relying on specific Fujitsu compiler support which could make it difficult to co-exist with other arm64 configurations.

jd-foster commented 4 months ago

OK, looks like you understand the reasons (definitely more than I do.) Appreciate you looking into it more deeply.