Closed tmiw closed 1 year ago
@drowe67, as for why this is needed now, I suspect something changed in a recent version of Clang and/or GCC with regard to code generation.
Anyway, I'm going to probably release another test build soon with this PR and confirm that it actually helps.
@drowe67, as for why this is needed now, I suspect something changed in a recent version of Clang and/or GCC with regard to code generation.
Yes, it is strange that this has suddenly become a problem after several years :thinking:
As an aside - IIRC AVX2 is only required on one source file with AVX2 macros. This could be due to the newer compilers trying to use AVX2 in other source files with general purpose code, perhaps due to recent optimisation features. So it might be an optimisation level thing.
As an aside - IIRC AVX2 is only required on one source file with AVX2 macros. This could be due to the newer compilers trying to use AVX2 in other source files with general purpose code, perhaps due to recent optimisation features. So it might be an optimisation level thing.
Seems it's just nnet.c that uses AVX2 explicitly (via vec_avx.h), and only if AVX2 is enabled by the build system; there's AVX fallback code in there otherwise. Thus, I don't expect there to be any issues using 2020 on any systems that at least have AVX with this PR.
Anyway, so far I've tested on macOS x86_64 and Linux x86_64 and there aren't any obvious issues. I'll make sure ARM macOS still works and that Windows still builds and this should be ready for review/merge.
Can confirm that Windows still builds and that FreeDV on ARM Mac still works with this PR. 👍
It is unfortunate that the root cause hasn't been found, it does feel like a backwards step and more of a work around than a solution. It doesn't make much sense that AVX2 would suddenly stop working :thinking:
Anyhoo, re merging this PR:
It is unfortunate that the root cause hasn't been found, it does feel like a backwards step and more of a work around than a solution. It doesn't make much sense that AVX2 would suddenly stop working 🤔
On my 2019 MBP, I seem to be using clang 14.0.0. LLVM has some recent PRs around the VBROADCAST instruction that might have gotten into that release (e.g. https://reviews.llvm.org/D109434). Thus, I still suspect something changing on the compiler side as I don't see anything in the CMake files for any of the projects since the original PR that would have caused this.
Also, the AVX2 code in LPCNet still works, just...not on a surprisingly large number of Intel machines. :(
- Given that most of the end-users run Window, is just building on Windows sufficient testing? Should be beta tested by Windows users before merge?
We actually weren't even compiling in AVX2 for Windows, which explains why we never saw this issue on that platform. I think it only came up on Linux/macOS recently because of the recent increased attention on the FreeDV project.
Anyway, ensuring that "avx2" doesn't appear in the Windows build logs may be sufficient but see below.
- Have the changes resolved the issue reported by users on other OSes/machines that were reporting the issue?
I'll release another test build tonight to confirm, but I suspect this PR will in fact resolve that issue. Alan compiled FreeDV himself on his Mac (possibly from master/ms-2020c), for instance, and got to the "first time use" dialog box.
Another theory: building with AVX2 produces a different build, that exposes some other issue in the LPCNet/codec2 code base. It's quite common to have subtle C errors exposed in this way, e.g. bug shows up on Windows but not Linux. Perhaps something in the init code over writing some other part of the program (if that's possible - not sure if program/data memory is separate on modern operating systems).
Another theory: building with AVX2 produces a different build, that exposes some other issue in the LPCNet/codec2 code base. It's quite common to have subtle C errors exposed in this way, e.g. bug shows up on Windows but not Linux. Perhaps something in the init code over writing some other part of the program (if that's possible - not sure if program/data memory is separate on modern operating systems).
On modern systems, the text segment (where code resides) is read-only and would cause a fault as soon as something tried to write into it. Thus it seems unlikely that something would get written into it by buggy code that would cause the CPU to fault at some later time with "illegal instruction" when executed.
So I spent a significant amount of time last night debugging a crash on startup issue on macOS 10.13 (which turned out to be an issue with VMWare and not an issue with the build /sigh), so I wasn't able to put out a new test build. Hoping for today but we'll see.
BTW the upcoming build also has changes to how BOOTSTRAP_WXWIDGETS
works so that you don't have to run cmake twice anymore. Definitely a time savings there :) Unfortunately the build still doesn't seem to like me using any more than one core at a time but that's a lower priority for sure.
I haven't heard of any issues with the code so we're probably good to merge.
(CMake, on the other hand, seems to not like me upgrading the Docker container to Fedora 37. For some reason the Codec2 build isn't finding the LPCNet build despite ./build_linux.sh
outside the container working without issues.)
This PR does the following to resolve #47:
cmake -DAVX2=1
), even if the system supports it.Justification: AVX2 does not appear to provide much if any performance benefit over standard AVX and in fact seems to cause significant problems generating binary builds for e.g. freedv-gui. See below for benchmarks:
AVX2 Enabled:
AVX only, no AVX2:
Additionally, a further stack trace from Alan Beard after attempting to run a test build with only (2) simply causes LPCNet to crash later, indicating that AVX2 being improperly used is indeed the cause of the fault: