gnuradio / volk

The Vector Optimized Library of Kernels
http://libvolk.org
GNU Lesser General Public License v3.0
547 stars 202 forks source link

Building Volk on Android with NEON Support. volk_32fc_convert_16ic_neon does not compile #589

Closed Aang23 closed 2 years ago

Aang23 commented 2 years ago

In the past, while attempting to build Volk with the Android NDK, NEON detection would not work properly, so I had to patch the CMake manually for it to work, and usually compromise with no NEON support.

Thankfully, this all appears to be fully working now (may also be due to newer NDKs not being as weird...). I was able to build Volk with full SIMD support for all 4 archs, so that's awesome news!

Though, volk_32fc_convert_16ic_neon _VOLK_ASM calls fail at compile-time, so I had to disable the NEON variant for things to work. This is the only kernel that does not build properly with the NDK.

Screenshot from 2022-08-01 14-16-15

Edit : This appears to be an Android-specific issue. Building on another armeabi-v7a machine running a standard GCC-based toolchain works as expected. Maybe just disable building this kernel if __ANDROID__ is set. Personally I'd prefer that over having to patch it.

jdemel commented 2 years ago

This is a special macro defined here: https://github.com/gnuradio/volk/blob/af69c60e908cd64b3bacd4ee6b4ce4a52b711865/kernels/volk/volk_32fc_convert_16ic.h#L166-L179

Furthermore, it is only used here:

https://github.com/gnuradio/volk/blob/af69c60e908cd64b3bacd4ee6b4ce4a52b711865/kernels/volk/volk_32fc_convert_16ic.h#L216-L217

Afterwards an https://github.com/gnuradio/volk/blob/af69c60e908cd64b3bacd4ee6b4ce4a52b711865/kernels/volk/volk_32fc_convert_16ic.h#L237 follows.

At the moment, we don't have Android CI. Thus, it'd be difficult to maintain a change.

It seems like this change got introduced in https://github.com/gnuradio/volk/commit/d5deb3a5ec2f3744ff48a9fbceb464ccf165f1ef . Before that, this macro didn't exist.

I'd like to narrow down the issue. Could you provide a bit more info?

Aang23 commented 2 years ago

If you want, I could PR an Android CI configuration. It's something I already have for other projects so it would not be too hard.

I can try and revert this change, see if that fixes it.

I'm targetting Android API 23, no specific hardware as this application is intended to runs "on anyone's phone". So I'm simply targetting armeabi-v7a, arm64-v8a, x86 and x86_64 as the NDK requires when building an APK utilizing the NDK.

The compiler being utilized is Clang, using the toolchain file provided in commandlinetools-linux-8512546. (Downloaded a few weeks ago)

Aang23 commented 2 years ago

Ok, using the implementation before https://github.com/gnuradio/volk/commit/d5deb3a5ec2f3744ff48a9fbceb464ccf165f1ef builds just fine.

jdemel commented 2 years ago

@Aang23 PRs are always welcome :smiley:

Just to make sure, only the armeabi-v7a fails? All others compile as expected, e.g. arm64-v8a?

I feel like I just found the ARM NEON intrinsics equivalent of the intel intrinsics guide.

The macro tries to mimic the vcvtaq_s32_f32 intrinsic.

The ARMv8 solution looks simple enough: https://github.com/gnuradio/volk/blob/af69c60e908cd64b3bacd4ee6b4ce4a52b711865/kernels/volk/volk_32fc_convert_16ic.h#L276-L277

More details about this change are discussed in #296 .

Aang23 commented 2 years ago

I will do it in the next few days. Won't run tests, but I will make it check it builds on all 4 archs required. (though since tests are being run on all of those in other actions, I wouldn't worry about it)

Only seems to happen on v7a.

Aang23 commented 2 years ago

So, I am considering making a PR to fix this very soon (possibly today). I have tried 2 ways to fix this :

What do you think @jdemel @michaelld ?

michaelld commented 2 years ago

@Aang23 that commit is for a single kernel: volk_32fc_convert_16ic. Is it a compile issue then, that the change from the commit break compiling? Or is it runtime, that the kernel doesn't pass test? I'm assuming then that other kernels build on Andriod v7a? It would awesome if someone could find a way to fix that kernel to work ...

Aang23 commented 2 years ago

@michaelld this kernel is the only one in all of Volk that does not compile with the Android toolchain. The changes in this commit cause the error initially mentionned in this issue. Yes... At least to me it appears it still does fine without those changes, but modiying it without extensive testing is a risk, which is why I personally just disabled it for now.

michaelld commented 2 years ago

@Aang23 guessing the issue is the macro VCVTRQ_S32_F32, since that's assembly code & thus potentially "volatile" across different NEON architectures. Some internet sleuthing turns up that the =t should really be =r (all 4) ... can you try that & see if that helps? ['t' == "VFP floating-point registers s0-s31. Used for 32 bit values." ; 'r' == "A register operand is allowed provided that it is in a general register." ... and is typically used for 32-bit signed integer values]

Aang23 commented 2 years ago

@michaelld Sure, but what =t are you referring to?

#define VCVTRQ_S32_F32(res, val)                \
    __VOLK_ASM("VCVTR.S32.F32 %[r0], %[v0]\n\t" \
               : [r0] "=w"(res[0])              \
               : [v0] "w"(val[0])               \
               :);                              \
    __VOLK_ASM("VCVTR.S32.F32 %[r1], %[v1]\n\t" \
               : [r1] "=w"(res[1])              \
               : [v1] "w"(val[1])               \
               :);                              \
    __VOLK_ASM("VCVTR.S32.F32 %[r2], %[v2]\n\t" \
               : [r2] "=w"(res[2])              \
               : [v2] "w"(val[2])               \
               :);                              \
    __VOLK_ASM("VCVTR.S32.F32 %[r3], %[v3]\n\t" : [r3] "=w"(res[3]) : [v3] "w"(val[3]) :);
michaelld commented 2 years ago

Just from the commit:

#define VCVTRQ_S32_F32(result, value)                   \
  __VOLK_ASM("VCVTR.S32.F32 %0, %1" : "=t"(result[0]) : "t"(value[0]) :); \
  __VOLK_ASM("VCVTR.S32.F32 %0, %1" : "=t"(result[1]) : "t"(value[1]) :); \
  __VOLK_ASM("VCVTR.S32.F32 %0, %1" : "=t"(result[2]) : "t"(value[2]) :); \
  __VOLK_ASM("VCVTR.S32.F32 %0, %1" : "=t"(result[3]) : "t"(value[3]) :);
michaelld commented 2 years ago

I believe should instead be:

#define VCVTRQ_S32_F32(result, value)                   \
  __VOLK_ASM("VCVTR.S32.F32 %0, %1" : "=r"(result[0]) : "t"(value[0]) :); \
  __VOLK_ASM("VCVTR.S32.F32 %0, %1" : "=r"(result[1]) : "t"(value[1]) :); \
  __VOLK_ASM("VCVTR.S32.F32 %0, %1" : "=r"(result[2]) : "t"(value[2]) :); \
  __VOLK_ASM("VCVTR.S32.F32 %0, %1" : "=r"(result[3]) : "t"(value[3]) :);
Aang23 commented 2 years ago

Ok, that's different. The code above (with no changes) compiles. https://github.com/gnuradio/volk/commit/0854b232ff473ca2f7fa99659c5ce9ee871454d4 which included changes to the Macro is the one breaking it.

michaelld commented 2 years ago

Ah ... gotcha. Well looking at the other macro code, the =w is wrong then. w == "Floating point register, Advanced SIMD vector register or SVE vector register". This is fine to the right-hand argument (w) -- though t should also work & might be the better option since it is 32-bit specific (worth trying IMHO). The left-hand argument is of type S32 (32-bit signed integer), which again should be =r. Can you try the =r first & see if that works? If so, can you also try t instead of w?

Aang23 commented 2 years ago

=r does not build, but =t does.

#define VCVTRQ_S32_F32(res, val)                \
    __VOLK_ASM("VCVTR.S32.F32 %[r0], %[v0]\n\t" \
               : [r0] "=t"(res[0])              \
               : [v0] "t"(val[0])               \
               :);                              \
    __VOLK_ASM("VCVTR.S32.F32 %[r1], %[v1]\n\t" \
               : [r1] "=t"(res[1])              \
               : [v1] "t"(val[1])               \
               :);                              \
    __VOLK_ASM("VCVTR.S32.F32 %[r2], %[v2]\n\t" \
               : [r2] "=t"(res[2])              \
               : [v2] "t"(val[2])               \
               :);                              \
    __VOLK_ASM("VCVTR.S32.F32 %[r3], %[v3]\n\t" : [r3] "=t"(res[3]) : [v3] "t"(val[3]) :);
michaelld commented 2 years ago

Interesting. Can you try =r and t respectively ... that's really the correct combination.

Aang23 commented 2 years ago

That does not compile. Interesting as it appears nearly identical to the previous, working Macro.

michaelld commented 2 years ago

Strange. If you go with =t and t respectively, it sounds like the build succeeds ... which is progress! does make test succeed? or volk_profile? I'm now curious if that code really works once compiled.

Aang23 commented 2 years ago

Took a while to test... Had to root an old Android device first (to test on real HW, and not a VM). Tests passed. volk_profile also worked. If anyone has a "better" machine running armv7a, it might be a good idea to confirm on something else.

Edit : Might have found something else to test on... Will report later.

michaelld commented 2 years ago

Well ... I'd say to create a PR with that fix & we'll get it in merged. No idea why it works, but the code makes a lot more sense with t than w given AARCH64 GCC machine constraint definitions.

Aang23 commented 2 years ago

Ok. I will do a bit more tested, then do a PR. Though perhaps, we could also go back to the initial code? That utilized the correct =r & t, and also passes tests here.

michaelld commented 2 years ago

Give them all a go & whatever works on Android armv7a I'll be also works for other aarch64 and NEON compilers

Aang23 commented 2 years ago

@michaelld see https://github.com/gnuradio/volk/pull/593

michaelld commented 2 years ago

@Aang23 merged! I think we're done with this issue now. If you agree, go ahead and close it out. Thx for your work!

Aang23 commented 2 years ago

@michaelld awesome. I can confirm - the current master branch builds and passes tests fine on Android (and other platforms) just fine now.

I will close this issue, but I'm of the opinion #594 should be worked on still. Having some sort of CI for every archs is pretty important... If the current system doesn't work, I do have some ideas - but I will carry on in the PR.

Thanks!