FeodorFitsner closed this 2 weeks ago.
This isn't a regression, and we're quite late in the r27 cycle (I'm supposed to be sending it to QA about now), so for now I'm triaging this to r28. If there's a safe fix available to cherry-pick, we'll consider it for r27b.
The reproducer fails at an earlier stage, so I couldn't investigate further:
/Users/feodor/ndk/r26d/toolchains/llvm/prebuilt/darwin-x86_64/lib/clang/17/include/arm_sve.h:38:9: error: unknown type name '__SVBFloat16_t'
38 | typedef __SVBFloat16_t svbfloat16_t;
| ^
/Users/feodor/ndk/r26d/toolchains/llvm/prebuilt/darwin-x86_64/lib/clang/17/include/arm_sve.h:304:44: error: cannot initialize a parameter of type '__SVBfloat16_t' with an lvalue of type 'svbfloat16_t' (aka 'int')
304 | return __builtin_sve_reinterpret_s8_bf16(op);
Can you instead add --save-temps -v to the failing command and attach the entire output and the highway_qsort.* files?
Also something is odd here:
fatal error: error in backend: Cannot select: 0x10ba8ace0: nxv4f32 = BUILD_VECTOR 0x133163a90, 0x1330f4800, 0x133108d20, 0x1248b9e60
nxv4f32 is a scalable vector type, while BUILD_VECTOR returns a fixed-width vector (https://llvm.org/doxygen/namespacellvm_1_1ISD.html#a22ea9cec080dd5f4f47ba234c2f59110aff6f73b624fecca7dbe94259f9437e32). My educated guess is that there's a bug in the NumPy code's handling of SVE intrinsics.
This does reproduce with r26. Reducing it with cvise gives the following reproducer:
$ cat highway_qsort_reduced.cpp
typedef __SVFloat32_t svfloat32_t;
__attribute__((__clang_arm_builtin_alias(__builtin_sve_svptrue_b32)))
int svptrue_b32();
__attribute__((__clang_arm_builtin_alias(__builtin_sve_svsub_f32_x)))
svfloat32_t svsub_f32_x(int, svfloat32_t, svfloat32_t);
template <int, int> struct Simd {
  using T = int;
};
template <typename, int> using CappedTag = Simd<65536, 0>;
template <class D> using TFromD = D::T;
template <int N, int kPow2> svfloat32_t Set(Simd<N, kPow2>, float);
template <class D> using VFromD = decltype(Set(D(), TFromD<D>()));
VFromD<Simd<6, 0>> Zero(Simd<65536, 0>);
void Add(svfloat32_t);
svfloat32_t Sub(svfloat32_t a, svfloat32_t b) {
  return svsub_f32_x(svptrue_b32(), a, b);
}
template <class D> using Vec = decltype(Zero(D()));
void Sort2To2(int, float *, int, float *);
template <int> void Sort16Rows(int, float *, int, float *) {
  constexpr int kLanesPerRow = 0;
  CappedTag<float, kLanesPerRow> d;
  Vec<decltype(d)> k1 = Set(d, kLanesPerRow);
  svfloat32_t __trans_tmp_2 = Sub(k1, k1);
  Add(__trans_tmp_2);
}
decltype(&Sort2To2) BaseCase_funcs = 6 ? Sort16Rows<6> : nullptr;
$ /ndks/android-ndk-r26d/toolchains/llvm/prebuilt/linux-x86_64/bin/clang "-cc1" \
    "-triple" "aarch64-unknown-linux-android24" "-emit-obj" \
    "-ffp-exception-behavior=strict" "-target-cpu" "generic" \
    "-target-feature" "+neon" "-target-feature" "+v8.2a" \
    "-target-feature" "+sve" "-target-feature" "+fullfp16" \
    "-target-abi" "aapcs" "-mllvm" "-treat-scalable-fixed-error-as-warning" \
    "-O3" "-x" "c++" "highway_qsort.cpp"
...
fatal error: error in backend: Cannot select: 0x5618380ff880: nxv4f32 = BUILD_VECTOR 0x5618380ff0a0, 0x5618380fee70, 0x5618380fee00, 0x5618380ff340
Building with ToT (top-of-tree) Clang and with r27 instead reports a mismatched function signature error, which the following diff fixes:
1a2
> typedef __SVBool_t svbool_t;
3c4
< int svptrue_b32();
---
> svbool_t svptrue_b32();
6c7
< svfloat32_t svsub_f32_x(int, svfloat32_t, svfloat32_t);
---
> svfloat32_t svsub_f32_x(svbool_t, svfloat32_t, svfloat32_t);
Even with the above fix, r26d still crashes. I am running a bisection to find the fix.
https://github.com/llvm/llvm-project/commit/1597e5e6932b944c2c382a138e76b757da56b200 is the fix.
This doesn't affect r27 and newer NDKs.
To be open about the triage here: the support window for r26 is still open for a bit longer, but we're quite late in its lifecycle, so we'll only be fixing regressions. This bug isn't a regression (I don't think SVE was in r25 at all, and if it was, it was experimental at the time). If another bug is filed that does force another release, we'll include this fix in it. I'm leaving this open so we don't forget to do that if it happens.
There's one week left in the support window, and getting an updated Clang released takes longer than that, so I'm closing this.
Description
I'm getting the following error when trying to compile NumPy 2.0.0 for Android on macOS with NDK r26d.
highway_qsort-e38186.zip
Upstream bug
No response
Commit to cherry-pick
No response
Affected versions
r26
Canary version
No response
Host OS
Mac
Host OS version
macOS 14.5 (23F79)
Affected ABIs
arm64-v8a