drowe67 / LPCNet

Experimental Neural Net speech coding for FreeDV
BSD 3-Clause "New" or "Revised" License
68 stars 24 forks source link

Disable AVX2 by default even on supported machines #48

Closed tmiw closed 1 year ago

tmiw commented 1 year ago

This PR does the following to resolve #47:

  1. Updates the processor feature detection so that AVX2 is only used if specifically enabled (e.g. cmake -DAVX2=1), even if the system supports it.
  2. Disables processor feature optimization flags for all code files brought in from Codec2.
  3. Performs general cleanup of the processor feature detection overall for maintainability (e.g. defaults all except for AVX2 to be ON and only disables if not detected).

Justification: AVX2 does not appear to provide much if any performance benefit over standard AVX and in fact seems to cause significant problems generating binary builds for e.g. freedv-gui. See below for benchmarks:

AVX2 Enabled:

HT-TM05:src mooneer$ time (sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s > /dev/null)
direct split VQ
direct split VQ
dec: 3 pred: 0.00 num_stages: 4 mbest: 5 bits_per_frame: 52 frame: 30 ms bit_rate: 1733.33 bits/s
dec: 3 pred: 0.00 num_stages: 4 mbest: 5 bits_per_frame: 52 frame: 30 ms bit_rate: 1733.33 bits/s
64 1 1 16 128 1152 160 160 160 160
ftest cols = 2002
bits_written 7280

real    0m1.245s
user    0m1.340s
sys 0m0.025s
HT-TM05:src mooneer$ time (sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s > /dev/null)
direct split VQ
direct split VQ
dec: 3 pred: 0.00 num_stages: 4 mbest: 5 bits_per_frame: 52 frame: 30 ms bit_rate: 1733.33 bits/s
dec: 3 pred: 0.00 num_stages: 4 mbest: 5 bits_per_frame: 52 frame: 30 ms bit_rate: 1733.33 bits/s
64 1 1 16 128 1152 160 160 160 160
ftest cols = 2002
bits_written 7280

real    0m1.242s
user    0m1.336s
sys 0m0.025s
HT-TM05:src mooneer$ time (sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s > /dev/null)
direct split VQ
direct split VQ
dec: 3 pred: 0.00 num_stages: 4 mbest: 5 bits_per_frame: 52 frame: 30 ms bit_rate: 1733.33 bits/s
dec: 3 pred: 0.00 num_stages: 4 mbest: 5 bits_per_frame: 52 frame: 30 ms bit_rate: 1733.33 bits/s
64 1 1 16 128 1152 160 160 160 160
ftest cols = 2002
bits_written 7280

real    0m1.231s
user    0m1.321s
sys 0m0.026s

AVX only, no AVX2:

HT-TM05:src mooneer$ time (sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s > /dev/null)
direct split VQ
dec: 3 pred: 0.00 num_stages: 4 mbest: 5 bits_per_frame: 52 frame: 30 ms bit_rate: 1733.33 bits/s
direct split VQ
dec: 3 pred: 0.00 num_stages: 4 mbest: 5 bits_per_frame: 52 frame: 30 ms bit_rate: 1733.33 bits/s
64 1 1 16 128 1152 160 160 160 160
ftest cols = 2002
bits_written 7280

real    0m1.333s
user    0m1.476s
sys 0m0.030s
HT-TM05:src mooneer$ time (sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s > /dev/null)
direct split VQ
direct split VQ
dec: 3 pred: 0.00 num_stages: 4 mbest: 5 bits_per_frame: 52 frame: 30 ms bit_rate: 1733.33 bits/s
dec: 3 pred: 0.00 num_stages: 4 mbest: 5 bits_per_frame: 52 frame: 30 ms bit_rate: 1733.33 bits/s
64 1 1 16 128 1152 160 160 160 160
ftest cols = 2002
bits_written 7280

real    0m1.316s
user    0m1.463s
sys 0m0.030s
HT-TM05:src mooneer$ time (sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s > /dev/null)
direct split VQ
direct split VQ
dec: 3 pred: 0.00 num_stages: 4 mbest: 5 bits_per_frame: 52 frame: 30 ms bit_rate: 1733.33 bits/s
dec: 3 pred: 0.00 num_stages: 4 mbest: 5 bits_per_frame: 52 frame: 30 ms bit_rate: 1733.33 bits/s
64 1 1 16 128 1152 160 160 160 160
ftest cols = 2002
bits_written 7280

real    0m1.347s
user    0m1.472s
sys 0m0.035s

Additionally, a further stack trace from Alan Beard after attempting to run a test build with only (2) simply causes LPCNet to crash later, indicating that AVX2 being improperly used is indeed the cause of the fault:

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   liblpcnetfreedv.0.4.dylib       0x00000001021202bb lpcnet_enc + 107
1   libcodec2.1.0.dylib             0x0000000101fbdd54 freedv_comptx + 324
2   org.freedv.freedv               0x000000010057ee44 MainFrame::test2020Mode_() + 772
3   org.freedv.freedv               0x000000010058655c MainFrame::MainFrame(wxWindow*) + 7116
4   org.freedv.freedv               0x000000010057eafd MainFrame::MainFrame(wxWindow*) + 29
5   org.freedv.freedv               0x000000010057e907 MainApp::OnInit() + 375
6   org.freedv.freedv               0x0000000100980327 wxApp::CallOnInit() + 359
7   org.freedv.freedv               0x0000000100bae4f8 wxEntry(int&, wchar_t**) + 104
8   org.freedv.freedv               0x000000010057e183 main + 35
9   libdyld.dylib                   0x00007fff6ba457fd start + 1
tmiw commented 1 year ago

@drowe67, as for why this is needed now, I suspect something changed in a recent version of Clang and/or GCC with regard to code generation.

Anyway, I'm going to probably release another test build soon with this PR and confirm that it actually helps.

drowe67 commented 1 year ago

@drowe67, as for why this is needed now, I suspect something changed in a recent version of Clang and/or GCC with regard to code generation.

Yes, it is strange that this has suddenly become a problem after several years :thinking:

As an aside - IIRC AVX2 is only required on one source file with AVX2 macros. This could be due to the newer compilers trying to use AVX2 in other source files with general purpose code, perhaps due to recent optimisation features. So it might be an optimisation level thing.

tmiw commented 1 year ago

As an aside - IIRC AVX2 is only required on one source file with AVX2 macros. This could be due to the newer compilers trying to use AVX2 in other source files with general purpose code, perhaps due to recent optimisation features. So it might be an optimisation level thing.

Seems it's just nnet.c that uses AVX2 explicitly (via vec_avx.h), and only if AVX2 is enabled by the build system; there's AVX fallback code in there otherwise. Thus, I don't expect there to be any issues using 2020 on any systems that at least have AVX with this PR.

Anyway, so far I've tested on macOS x86_64 and Linux x86_64 and there aren't any obvious issues. I'll make sure ARM macOS still works and that Windows still builds and this should be ready for review/merge.

tmiw commented 1 year ago

Can confirm that Windows still builds and that FreeDV on ARM Mac still works with this PR. 👍

drowe67 commented 1 year ago

It is unfortunate that the root cause hasn't been found, it does feel like a backwards step and more of a work around than a solution. It doesn't make much sense that AVX2 would suddenly stop working :thinking:

Anyhoo, re merging this PR:

  1. Given that most of the end-users run Window, is just building on Windows sufficient testing? Should be beta tested by Windows users before merge?
  2. Have the changes resolved the issue reported by users on other OSes/machines that were reporting the issue?
tmiw commented 1 year ago

It is unfortunate that the root cause hasn't been found, it does feel like a backwards step and more of a work around than a solution. It doesn't make much sense that AVX2 would suddenly stop working 🤔

On my 2019 MBP, I seem to be using clang 14.0.0. LLVM has some recent PRs around the VBROADCAST instruction that might have gotten into that release (e.g. https://reviews.llvm.org/D109434). Thus, I still suspect something changing on the compiler side as I don't see anything in the CMake files for any of the projects since the original PR that would have caused this.

Also, the AVX2 code in LPCNet still works, just...not on a surprisingly large number of Intel machines. :(

  1. Given that most of the end-users run Window, is just building on Windows sufficient testing? Should be beta tested by Windows users before merge?

We actually weren't even compiling in AVX2 for Windows, which explains why we never saw this issue on that platform. I think it only came up on Linux/macOS recently because of the recent increased attention on the FreeDV project.

Anyway, ensuring that "avx2" doesn't appear in the Windows build logs may be sufficient but see below.

  1. Have the changes resolved the issue reported by users on other OSes/machines that were reporting the issue?

I'll release another test build tonight to confirm, but I suspect this PR will in fact resolve that issue. Alan compiled FreeDV himself on his Mac (possibly from master/ms-2020c), for instance, and got to the "first time use" dialog box.

drowe67 commented 1 year ago

Another theory: building with AVX2 produces a different build, that exposes some other issue in the LPCNet/codec2 code base. It's quite common to have subtle C errors exposed in this way, e.g. bug shows up on Windows but not Linux. Perhaps something in the init code over writing some other part of the program (if that's possible - not sure if program/data memory is separate on modern operating systems).

tmiw commented 1 year ago

Another theory: building with AVX2 produces a different build, that exposes some other issue in the LPCNet/codec2 code base. It's quite common to have subtle C errors exposed in this way, e.g. bug shows up on Windows but not Linux. Perhaps something in the init code over writing some other part of the program (if that's possible - not sure if program/data memory is separate on modern operating systems).

On modern systems, the text segment (where code resides) is read-only and would cause a fault as soon as something tried to write into it. Thus it seems unlikely that something would get written into it by buggy code that would cause the CPU to fault at some later time with "illegal instruction" when executed.

tmiw commented 1 year ago

So I spent a significant amount of time last night debugging a crash on startup issue on macOS 10.13 (which turned out to be an issue with VMWare and not an issue with the build /sigh), so I wasn't able to put out a new test build. Hoping for today but we'll see.

BTW the upcoming build also has changes to how BOOTSTRAP_WXWIDGETS works so that you don't have to run cmake twice anymore. Definitely a time savings there :) Unfortunately the build still doesn't seem to like me using any more than one core at a time but that's a lower priority for sure.

tmiw commented 1 year ago

I haven't heard of any issues with the code so we're probably good to merge.

(CMake, on the other hand, seems to not like me upgrading the Docker container to Fedora 37. For some reason the Codec2 build isn't finding the LPCNet build despite ./build_linux.sh outside the container working without issues.)