android / ndk

The Android Native Development Kit
1.97k stars 255 forks source link

[Bug]: fatal error: error in backend: Cannot select: 0x10ba8ace0: nxv4f32 = BUILD_VECTOR 0x133163a90, 0x1330f4800, 0x133108d20, 0x1248b9e60 #2037

Closed FeodorFitsner closed 2 weeks ago

FeodorFitsner commented 3 months ago

Description

I'm getting the following error when trying to compile Numpy 2.0.0 for Android on macOS with NDK r26d.

fatal error: error in backend: Cannot select: 0x10ba8ace0: nxv4f32 = BUILD_VECTOR 0x133163a90, 0x1330f4800, 0x133108d20, 0x1248b9e60
0x133163a90: f32,ch = strict_fadd 0x13483c4f0, 0x1230c34b0, 0x12314da30
0x1230c34b0: f32,ch = strict_fadd 0x13483c4f0, 0x12423b3b0, 0x133152940
0x12423b3b0: f32,ch = strict_uint_to_fp 0x13483c4f0, Constant:i64<4>
0x12314d670: i64 = Constant<4>
0x133152940: f32 = extract_vector_elt 0x133182290, Constant:i64<0>
0x133182290: nxv4f32 = AArch64ISD::SINT_TO_FP_MERGE_PASSTHRU 0x1330ee2f0, 0x12314e2f0, undef:nxv4f32
0x1330ee2f0: nxv4i1 = AArch64ISD::PTRUE TargetConstant:i32<31>
0x10b1f53f0: i32 = TargetConstant<31>
0x12314e2f0: nxv4i32 = step_vector TargetConstant:i32<1>
0x124055af0: i32 = TargetConstant<1>
0x1248bbfb0: nxv4f32 = undef
0x12409edb0: i64 = Constant<0>
0x12314da30: f32,ch = strict_fadd 0x13483c4f0, 0x1248cdd10, 0x1248cdd10
0x1248cdd10: f32,ch = strict_uint_to_fp 0x13483c4f0, Constant:i64<1>
0x124277300: i64 = Constant<1>
0x1248cdd10: f32,ch = strict_uint_to_fp 0x13483c4f0, Constant:i64<1>
0x124277300: i64 = Constant<1>
0x1330f4800: f32,ch = strict_fadd 0x13483c4f0, 0x12409ea30, 0x12314da30
0x12409ea30: f32,ch = strict_fadd 0x13483c4f0, 0x12423b3b0, 0x1230b5df0
0x12423b3b0: f32,ch = strict_uint_to_fp 0x13483c4f0, Constant:i64<4>
0x12314d670: i64 = Constant<4>
0x1230b5df0: f32 = extract_vector_elt 0x133182290, Constant:i64<1>
0x133182290: nxv4f32 = AArch64ISD::SINT_TO_FP_MERGE_PASSTHRU 0x1330ee2f0, 0x12314e2f0, undef:nxv4f32
0x1330ee2f0: nxv4i1 = AArch64ISD::PTRUE TargetConstant:i32<31>
0x10b1f53f0: i32 = TargetConstant<31>
0x12314e2f0: nxv4i32 = step_vector TargetConstant:i32<1>
0x124055af0: i32 = TargetConstant<1>
0x1248bbfb0: nxv4f32 = undef
0x124277300: i64 = Constant<1>
0x12314da30: f32,ch = strict_fadd 0x13483c4f0, 0x1248cdd10, 0x1248cdd10
0x1248cdd10: f32,ch = strict_uint_to_fp 0x13483c4f0, Constant:i64<1>
0x124277300: i64 = Constant<1>
0x1248cdd10: f32,ch = strict_uint_to_fp 0x13483c4f0, Constant:i64<1>
0x124277300: i64 = Constant<1>
0x133108d20: f32,ch = strict_fadd 0x13483c4f0, 0x124276ea0, 0x12314da30
0x124276ea0: f32,ch = strict_fadd 0x13483c4f0, 0x12423b3b0, 0x12423bb90
0x12423b3b0: f32,ch = strict_uint_to_fp 0x13483c4f0, Constant:i64<4>
0x12314d670: i64 = Constant<4>
0x12423bb90: f32 = extract_vector_elt 0x133182290, Constant:i64<2>
0x133182290: nxv4f32 = AArch64ISD::SINT_TO_FP_MERGE_PASSTHRU 0x1330ee2f0, 0x12314e2f0, undef:nxv4f32
0x1330ee2f0: nxv4i1 = AArch64ISD::PTRUE TargetConstant:i32<31>
0x10b1f53f0: i32 = TargetConstant<31>
0x12314e2f0: nxv4i32 = step_vector TargetConstant:i32<1>
0x124055af0: i32 = TargetConstant<1>
0x1248bbfb0: nxv4f32 = undef
0x1240a38e0: i64 = Constant<2>
0x12314da30: f32,ch = strict_fadd 0x13483c4f0, 0x1248cdd10, 0x1248cdd10
0x1248cdd10: f32,ch = strict_uint_to_fp 0x13483c4f0, Constant:i64<1>
0x124277300: i64 = Constant<1>
0x1248cdd10: f32,ch = strict_uint_to_fp 0x13483c4f0, Constant:i64<1>
0x124277300: i64 = Constant<1>
0x1248b9e60: f32,ch = strict_fadd 0x13483c4f0, 0x1240afb70, 0x12314da30
0x1240afb70: f32,ch = strict_fadd 0x13483c4f0, 0x12423b3b0, 0x13317d8b0
0x12423b3b0: f32,ch = strict_uint_to_fp 0x13483c4f0, Constant:i64<4>
0x12314d670: i64 = Constant<4>
0x13317d8b0: f32 = extract_vector_elt 0x133182290, Constant:i64<3>
0x133182290: nxv4f32 = AArch64ISD::SINT_TO_FP_MERGE_PASSTHRU 0x1330ee2f0, 0x12314e2f0, undef:nxv4f32
0x1330ee2f0: nxv4i1 = AArch64ISD::PTRUE TargetConstant:i32<31>
0x10b1f53f0: i32 = TargetConstant<31>
0x12314e2f0: nxv4i32 = step_vector TargetConstant:i32<1>
0x124055af0: i32 = TargetConstant<1>
0x1248bbfb0: nxv4f32 = undef
0x133156710: i64 = Constant<3>
0x12314da30: f32,ch = strict_fadd 0x13483c4f0, 0x1248cdd10, 0x1248cdd10
0x1248cdd10: f32,ch = strict_uint_to_fp 0x13483c4f0, Constant:i64<1>
0x124277300: i64 = Constant<1>
0x1248cdd10: f32,ch = strict_uint_to_fp 0x13483c4f0, Constant:i64<1>
0x124277300: i64 = Constant<1>
In function: _ZN3hwy5N_SVE6detail9Sort8RowsILm1ENS1_12SharedTraitsINS1_10TraitsLaneINS1_14OrderAscendingIfEEEEEEfEEvT0_PT1_mSB_
PLEASE submit a bug report to https://github.com/android-ndk/ndk/issues and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: /Users/feodor/ndk/r26d/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang++ --target=aarch64-linux-android24 -Inumpy/_core/libhighway_qsort.dispatch.h_SVE.a.p -Inumpy/_core -I../numpy/_core -Inumpy/_core/include -I../numpy/_core/include -I../numpy/_core/src/common -I../numpy/_core/src/multiarray -I../numpy/_core/src/npymath -I../numpy/_core/src/umath -I../numpy/_core/src/highway -I/Users/feodor/projects/flet-dev/python-android/install/android/arm64-v8a/python-3.12.4/include/python3.12 -I. -I./Include -I/Users/feodor/projects/flet-dev/mobile-forge/build/cp312/numpy/2.0.0/.mesonpy-fbyn59xf/meson_cpu -fcolor-diagnostics -DNDEBUG -Wall -Winvalid-pch -std=c++17 -O3 -ftrapping-math -DNPY_HAVE_CLANG_FPSTRICT -fPIC -DNPY_INTERNAL_BUILD -DHAVE_NPY_CONFIG_H -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE=1 -D_LARGEFILE64_SOURCE=1 -D__STDC_VERSION__=0 -fno-exceptions -fno-rtti -O3 -DNPY_HAVE_NEON_VFPV4 -DNPY_HAVE_NEON_FP16 -DNPY_HAVE_NEON -DNPY_HAVE_ASIMD -DNPY_HAVE_ASIMDHP -DNPY_HAVE_SVE -march=armv8.2-a+sve+fp16 -DNPY_MTARGETS_CURRENT=SVE -MD -MQ numpy/_core/libhighway_qsort.dispatch.h_SVE.a.p/src_npysort_highway_qsort.dispatch.cpp.o -MF numpy/_core/libhighway_qsort.dispatch.h_SVE.a.p/src_npysort_highway_qsort.dispatch.cpp.o.d -o numpy/_core/libhighway_qsort.dispatch.h_SVE.a.p/src_npysort_highway_qsort.dispatch.cpp.o -c ../numpy/_core/src/npysort/highway_qsort.dispatch.cpp
1.      <eof> parser at end of file
2.      Code generation
3.      Running pass 'Function Pass Manager' on module '../numpy/_core/src/npysort/highway_qsort.dispatch.cpp'.
4.      Running pass 'AArch64 Instruction Selection' on function '@_ZN3hwy5N_SVE6detail9Sort8RowsILm1ENS1_12SharedTraitsINS1_10TraitsLaneINS1_14OrderAscendingIfEEEEEEfEEvT0_PT1_mSB_'
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  clang++ 0x0000000105753c60 llvm::SmallVectorBase<unsigned long long>::set_size(unsigned long) + 315468
1  clang++ 0x0000000105752ebc llvm::SmallVectorBase<unsigned long long>::set_size(unsigned long) + 311976
2  clang++ 0x00000001056c01ec llvm::cl::opt<bool, false, llvm::cl::parser<bool>>::setCallback(std::__1::function<void (bool const&)>) + 65696
3  clang++ 0x00000001056c019c llvm::cl::opt<bool, false, llvm::cl::parser<bool>>::setCallback(std::__1::function<void (bool const&)>) + 65616
4  clang++ 0x0000000105750930 llvm::SmallVectorBase<unsigned long long>::set_size(unsigned long) + 302364
5  clang++ 0x00000001044140bc
6  clang++ 0x00000001056c6cf0 llvm::cl::opt<bool, false, llvm::cl::parser<bool>>::setCallback(std::__1::function<void (bool const&)>) + 93092
7  clang++ 0x00000001061f4740 llvm::IntervalMap<long long, std::__1::monostate, 8u, llvm::IntervalMapHalfOpenInfo<long long>>::deleteNode(llvm::IntervalMapImpl::NodeRef, unsigned int) + 1104224
8  clang++ 0x00000001061f3b4c llvm::IntervalMap<long long, std::__1::monostate, 8u, llvm::IntervalMapHalfOpenInfo<long long>>::deleteNode(llvm::IntervalMapImpl::NodeRef, unsigned int) + 1101164
9  clang++ 0x00000001044a3ad0
10 clang++ 0x00000001061ed464 llvm::IntervalMap<long long, std::__1::monostate, 8u, llvm::IntervalMapHalfOpenInfo<long long>>::deleteNode(llvm::IntervalMapImpl::NodeRef, unsigned int) + 1074820
11 clang++ 0x00000001061ecbb4 llvm::IntervalMap<long long, std::__1::monostate, 8u, llvm::IntervalMapHalfOpenInfo<long long>>::deleteNode(llvm::IntervalMapImpl::NodeRef, unsigned int) + 1072596
12 clang++ 0x00000001061ebc08 llvm::IntervalMap<long long, std::__1::monostate, 8u, llvm::IntervalMapHalfOpenInfo<long long>>::deleteNode(llvm::IntervalMapImpl::NodeRef, unsigned int) + 1068584
13 clang++ 0x00000001061ea854 llvm::IntervalMap<long long, std::__1::monostate, 8u, llvm::IntervalMapHalfOpenInfo<long long>>::deleteNode(llvm::IntervalMapImpl::NodeRef, unsigned int) + 1063540
14 clang++ 0x0000000104dc86e8 llvm::Pass* llvm::callDefaultCtor<llvm::MachineDominatorTree, true>() + 105760
15 clang++ 0x000000010511f3c0 llvm::Attribute llvm::CallBase::getFnAttrOnCalledFunction<llvm::StringRef>(llvm::StringRef) const + 121800
16 clang++ 0x0000000105124904 llvm::Attribute llvm::CallBase::getFnAttrOnCalledFunction<llvm::StringRef>(llvm::StringRef) const + 143628
17 clang++ 0x000000010511fe70 llvm::Attribute llvm::CallBase::getFnAttrOnCalledFunction<llvm::StringRef>(llvm::StringRef) const + 124536
18 clang++ 0x0000000105a006a4 void llvm::DomTreeBuilder::Calculate<llvm::DominatorTreeBase<llvm::VPBlockBase, false>>(llvm::DominatorTreeBase<llvm::VPBlockBase, false>&) + 836704
19 clang++ 0x00000001059fee1c void llvm::DomTreeBuilder::Calculate<llvm::DominatorTreeBase<llvm::VPBlockBase, false>>(llvm::DominatorTreeBase<llvm::VPBlockBase, false>&) + 830424
20 clang++ 0x0000000105c43df4 void llvm::DomTreeBuilder::Calculate<llvm::DominatorTreeBase<llvm::VPBlockBase, false>>(llvm::DominatorTreeBase<llvm::VPBlockBase, false>&) + 3210160
21 clang++ 0x0000000106a8f3bc clang::extractapi::FunctionSignature clang::extractapi::DeclarationFragmentsBuilder::getFunctionSignature<clang::ObjCMethodDecl>(clang::ObjCMethodDecl const*) + 5662560
22 clang++ 0x0000000105e5f39c llvm::Registry<clang::PluginASTAction>::begin() + 11788
23 clang++ 0x0000000105e07578 void llvm::DomTreeBuilder::Calculate<llvm::DominatorTreeBase<llvm::VPBlockBase, false>>(llvm::DominatorTreeBase<llvm::VPBlockBase, false>&) + 5059380
24 clang++ 0x0000000105eaa4e8 llvm::Registry<clang::PluginASTAction>::begin() + 319320
25 clang++ 0x0000000104413ab4
26 clang++ 0x00000001044101a0
27 clang++ 0x0000000105d2e438 void llvm::DomTreeBuilder::Calculate<llvm::DominatorTreeBase<llvm::VPBlockBase, false>>(llvm::DominatorTreeBase<llvm::VPBlockBase, false>&) + 4170228
28 clang++ 0x00000001056c0180 llvm::cl::opt<bool, false, llvm::cl::parser<bool>>::setCallback(std::__1::function<void (bool const&)>) + 65588
29 clang++ 0x0000000105d2d980 void llvm::DomTreeBuilder::Calculate<llvm::DominatorTreeBase<llvm::VPBlockBase, false>>(llvm::DominatorTreeBase<llvm::VPBlockBase, false>&) + 4167484
30 clang++ 0x0000000105d0be18 void llvm::DomTreeBuilder::Calculate<llvm::DominatorTreeBase<llvm::VPBlockBase, false>>(llvm::DominatorTreeBase<llvm::VPBlockBase, false>&) + 4029396
31 clang++ 0x0000000105d0c0c4 void llvm::DomTreeBuilder::Calculate<llvm::DominatorTreeBase<llvm::VPBlockBase, false>>(llvm::DominatorTreeBase<llvm::VPBlockBase, false>&) + 4030080
32 clang++ 0x0000000105d1aa44 void llvm::DomTreeBuilder::Calculate<llvm::DominatorTreeBase<llvm::VPBlockBase, false>>(llvm::DominatorTreeBase<llvm::VPBlockBase, false>&) + 4089856
33 clang++ 0x000000010440f03c
34 dyld    0x000000019ac8a0e0 start + 2360
clang++: error: clang frontend command failed with exit code 70 (use -v to see invocation)
Android (11349228, +pgo, +bolt, +lto, -mlgo, based on r487747e) clang version 17.0.2 (https://android.googlesource.com/toolchain/llvm-project d9f89f4d16663d5012e5c09495f3b30ece3d2362)
Target: aarch64-unknown-linux-android24
Thread model: posix
InstalledDir: /Users/feodor/ndk/r26d/toolchains/llvm/prebuilt/darwin-x86_64/bin

highway_qsort-e38186.zip

Upstream bug

No response

Commit to cherry-pick

No response

Affected versions

r26

Canary version

No response

Host OS

Mac

Host OS version

macOS 14.5 (23F79)

Affected ABIs

arm64-v8a

DanAlbert commented 2 months ago

Not a regression, and quite late in r27 (I'm supposed to be sending it to QA nowish), so for now I'm triaging to r28. If there's a safe fix available to cherry-pick, we'll consider it for r27b.

pirama-arumuga-nainar commented 1 month ago

The reproducer fails at an earlier stage so I couldn't investigate further:

/Users/feodor/ndk/r26d/toolchains/llvm/prebuilt/darwin-x86_64/lib/clang/17/include/arm_sve.h:38:9: error: unknown type name '__SVBFloat16_t'
   38 | typedef __SVBFloat16_t svbfloat16_t;
      |         ^
/Users/feodor/ndk/r26d/toolchains/llvm/prebuilt/darwin-x86_64/lib/clang/17/include/arm_sve.h:304:44: error: cannot initialize a parameter of type '__SVBfloat16_t' with an lvalue of type 'svbfloat16_t' (aka 'int')
  304 |   return __builtin_sve_reinterpret_s8_bf16(op);

Can you instead add --save-temps -v to the failing command and attach the entire output and highway_qsort.* files?

pirama-arumuga-nainar commented 1 month ago

Also something is odd here:

fatal error: error in backend: Cannot select: 0x10ba8ace0: nxv4f32 = BUILD_VECTOR 0x133163a90, 0x1330f4800, 0x133108d20, 0x1248b9e60

nxv4f32 is a scalable vector register while BUILD_VECTOR returns a fixed-width vector (https://llvm.org/doxygen/namespacellvm_1_1ISD.html#a22ea9cec080dd5f4f47ba234c2f59110aff6f73b624fecca7dbe94259f9437e32). My educated guess is that there's a bug in the numpy code's handling of sve intrinsics.

pirama-arumuga-nainar commented 1 month ago

This does reproduce in r26. Using cvise to reduce gives the following reproducer:

$ cat highway_qsort_reduced.cpp
typedef __SVFloat32_t svfloat32_t;
__attribute__((__clang_arm_builtin_alias(__builtin_sve_svptrue_b32)))
int svptrue_b32();

__attribute__((__clang_arm_builtin_alias(__builtin_sve_svsub_f32_x)))
svfloat32_t svsub_f32_x(int, svfloat32_t, svfloat32_t);

template <int, int> struct Simd {
  using T = int;
};

template <typename, int> using CappedTag = Simd<65536, 0>;
template <class D> using TFromD = D::T;
template <int N, int kPow2> svfloat32_t Set(Simd<N, kPow2>, float);
template <class D> using VFromD = decltype(Set(D(), TFromD<D>()));
VFromD<Simd<6, 0>> Zero(Simd<65536, 0>);
void Add(svfloat32_t);
svfloat32_t Sub(svfloat32_t a, svfloat32_t b) {
  return svsub_f32_x(svptrue_b32(), a, b);
}
template <class D> using Vec = decltype(Zero(D()));
void Sort2To2(int, float *, int, float *);
template <int> void Sort16Rows(int, float *, int, float *) {
  constexpr int kLanesPerRow = 0;
  CappedTag<float, kLanesPerRow> d;
  Vec<decltype(d)> k1 = Set(d, kLanesPerRow);
  svfloat32_t __trans_tmp_2 = Sub(k1, k1);
  Add(__trans_tmp_2);
}
decltype(&Sort2To2) BaseCase_funcs = 6 ? Sort16Rows<6> : nullptr;

$ /ndks/android-ndk-r26d/toolchains/llvm/prebuilt/linux-x86_64/bin/clang "-cc1" "-triple" "aarc
h64-unknown-linux-android24" "-emit-obj"  "-ffp-exception-behavior=strict" "-target-cpu" "generic" "-target-feature" "+neon" "-target-feature" "+v8.2a" "-
target-feature" "+sve" "-target-feature" "+fullfp16" "-target-abi" "aapcs" "-mllvm" "-treat-scalable-fixed-error-as-warning"  "-O3"  "-x" "c++" "highway_q
sort.cpp"                                                         
...
fatal error: error in backend: Cannot select: 0x5618380ff880: nxv4f32 = BUILD_VECTOR 0x5618380ff0a0, 0x5618380fee70, 0x5618380fee00, 0x5618380ff340

Building with ToT clang and r27 shows an error of mismatched function signatures that is fixed by the following diff:

1a2
> typedef __SVBool_t svbool_t;
3c4
< int svptrue_b32();
---
> svbool_t svptrue_b32();
6c7
< svfloat32_t svsub_f32_x(int, svfloat32_t, svfloat32_t);
---
> svfloat32_t svsub_f32_x(svbool_t, svfloat32_t, svfloat32_t);

Even with the above fix, r26d still crashes. I am running a bisection to find the fix.

pirama-arumuga-nainar commented 1 month ago

https://github.com/llvm/llvm-project/commit/1597e5e6932b944c2c382a138e76b757da56b200 is the fix.

This doesn't affect r27 and newer NDKs.

DanAlbert commented 1 month ago

To be open about the triage here: the support window for r26 is still open for a bit longer, but we're quite late in its lifecycle, so we'll only be fixing regressions. This bug isn't a regression (I don't think SVE was in r25 at all? if it was it was experimental at the time). If there's another bug filed that does cause us to do another release, we'll include the fix for this. Leaving it open so we don't forget to do that if that happens.

DanAlbert commented 2 weeks ago

One week left in the support window and it takes longer than that to get an updated Clang released, so closing.