catid / wirehair

Wirehair : O(N) Fountain Code for Large Data
http://wirehairfec.com
BSD 3-Clause "New" or "Revised" License
264 stars 56 forks source link

ARM64 Windows Support #31

Open hanseuljun opened 3 years ago

hanseuljun commented 3 years ago

At gf256.h line 60-66, immintrin.h gets included if the build is based on MSVC and is AVX2 is supported. The issue here starts from immintrin.h not supporting build targets other than x86 or x64.

image

It is very understandable assuming Windows is for x86 and x64 machines, but unfortunately, there is already an ARM64 machine: Hololens 2... The fix I can think of using GF256_TARGET_MOBILE as a guard to not including immintrin.h, so including immintrin.h can be avoided by manually setting GF256_TARGET_MOBILE.

I will soon submit a pull request based on the above fix and let me know if you think there is a better solution.

catid commented 3 years ago

Nice fix thank you for contributing!

igorauad commented 3 years ago

Hi @hanseuljun , I'm currently looking at the intrinsics headers included at gf256.h. Just trying to understand, why is <immintrin.h> included if the build is based on MSVC? Shouldn't the condition be something like the one below:

#if defined(__AVX2__) && (!defined (_MSC_VER) || _MSC_VER >= 1900)

That is, if AVX2 is available with a compiler other than MSVC, include immintrin.h. Also, if the compiler is MSVC and the version is at least 1900, include immintrin.h.

If I understand correctly, the compiler would no set __AVX2__ for an ARM build target anyway, so we wouldn't need to check GF256_TARGET_MOBILE, as done in #32 . It seems that the || (defined (_MSC_VER) && _MSC_VER >= 1900) part on the condition was the real problem.

However, I'm only starting to look at all of this. So please let me know if I'm missing something.

hanseuljun commented 3 years ago

@igorauad What I found different in my build environment (MSVC), AVX2 was set even when the build target is ARM. I know this is weird given AVX2 is only for x86 CPUs but I'm just reporting what I have seen.

Let me know if this is not the typical behavior of MSVC or something different from your experience.

igorauad commented 3 years ago

@igorauad What I found different in my build environment (MSVC), AVX2 was set even when the build target is ARM. I know this is weird given AVX2 is only for x86 CPUs but I'm just reporting what I have seen.

Let me know if this is not the typical behavior of MSVC or something different from your experience.

Thanks, @hanseuljun That's interesting. I was only speculating. However, I've been testing with gcc only, not MSVC.

hanseuljun commented 3 years ago

@igorauad

Sorry I was taking time editing the above comment after actually looking at the code again. You are right! Sorry about spreading wrong information and causing confusion. Seems like the issue was assuming that all MSVC builds will target the x86 architecture in gf256.h, not MSVC incorrectly setting AVX2. Thanks for the correction!

igorauad commented 3 years ago

Nice, @hanseuljun Thanks for confirming the problem.

Btw, I'm experimenting with this modified version of gf256.h here: igorauad/gf256.h. What I'm trying to do is have LINUX_ARM and GF256_TARGET_MOBILE set automatically, instead of based on flags provided by the user during compilation. In the same direction, I changed the SSE3 stuff so that it is included automatically based on the __SSE3__ macro (just like how the AVX2 header is included if __AVX2__ is set). Lastly, it seems to me that SSE2 is strictly required when the build target is not a "mobile platform" (e.g., ARM), so I added a preprocessor verification for SSE2 in gf256.h. However, this is all subject to further investigation, as I need to understand the codebase better.

I'm assuming SSE2 is strictly required because many SSE2 functions are called in gf256.cpp under the condition !defined(GF256_TARGET_MOBILE). In contrast, the SSE3 and AVX2 functions are called specifically if the CpuHasSSSE3 and CpuHasAVX2 flags are set in runtime. So my understanding is that, regardless of how gf256 was compiled, if in runtime SSE3 and AVX2 are not found, gf256 will still work (although slower) as long as SSE2 is available.

@catid could you confirm this is the right direction? Also, how could I verify everything is working correctly? Anything specific to look at with the unit test application?

catid commented 3 years ago

Yeah auto-detect should be how it works for sure. Totally agreed. I think for this sort of thing, if it builds it should work because you can trust the intrinsics on each platform. There is a simple sanity check during the initialization so it should catch any obvious problems in this stuff.

yll139 commented 2 years ago

Hi, when I run Benchmark() on the Ubuntu OS platform, I find that the performance is much lower than that of the windows OS platform (one order of magnitude smaller). Could you tell me why this happened? Are there some optimization options that need to be manually turned on?

catid commented 2 years ago

Hi, when I run Benchmark() on the Ubuntu OS platform, I find that the performance is much lower than that of the windows OS platform (one order of magnitude smaller). Could you tell me why this happened? Are there some optimization options that need to be manually turned on?

Perhaps cmake -DCMAKE_BUILD_TYPE=Release .. will fix it?