drowe67 / LPCNet

Experimental Neural Net speech coding for FreeDV
BSD 3-Clause "New" or "Revised" License

[Proposal] Build multiple lpcnet libraries with and without SIMD #27

Open hobbes1069 opened 4 years ago

hobbes1069 commented 4 years ago

Ok, while trying to figure out crazy and mostly unworkable ways to make this work for both developers and distros I came up with this idea:

Build the lpcnetfreedv library multiple times with and without optimizations. Because of its size, it would be good to separate the nndata into a separate library unless you think that will hurt performance. Otherwise that's tens of MB added to each library.

Then FreeDV could check for multiple libraries and load the "best" candidate.

I'm also experimenting with a different method of downloading the nnet data by creating a custom target so you can "make download" or it will download on the fly during make instead of during configuring.

drowe67 commented 4 years ago

Couple of other ideas:

  1. All that changes between different optimizations is a couple of functions. They could be refactored into small libraries.
  2. It's possible to load the nnet data at run time, so it could be stored as a .f32 data file and distributed once.
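As a rough illustration of the second idea, a raw float file can be read into memory at startup with plain C file I/O. This is only a sketch: `load_f32_file` is a hypothetical helper, not part of LPCNet, and it assumes the file holds host-endian 32-bit floats with no header.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not an LPCNet function): read a whole file of raw
 * 32-bit floats into a malloc'd buffer. Assumes host byte order and no
 * header. Returns NULL on failure; on success *count holds the number of
 * floats read and the caller must free() the buffer. */
float *load_f32_file(const char *path, size_t *count)
{
    assert(path != NULL && count != NULL);
    *count = 0;

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    /* Find the file size so the buffer can be allocated in one go. */
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long bytes = ftell(f);
    if (bytes < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }

    size_t n = (size_t)bytes / sizeof(float);
    float *data = malloc(n ? n * sizeof(float) : 1);
    if (data == NULL) { fclose(f); return NULL; }

    /* A short read means the file changed or is damaged: treat as failure. */
    if (fread(data, sizeof(float), n, f) != n) {
        free(data);
        fclose(f);
        return NULL;
    }

    fclose(f);
    *count = n;
    return data;
}
```

With something along these lines, the weights would live in one data file shared by every build of the library, rather than being compiled into each one.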

hobbes1069 commented 4 years ago

Ok, so basically create multiple versions of the handful of functions and use runtime detection in FreeDV to know which to use?

drowe67 commented 4 years ago

Couple of ways:

  1. You could build all of the sets of SIMD functions into one executable, then select which set at run time.
  2. Build little libraries, each with a set of SIMD functions, and install just the one you want.
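Option 1 could look roughly like the sketch below: each set of functions is gathered into a table of function pointers, and one table is picked at startup. All names here (`dot_fn`, `kernel_set`, `select_kernels`) are hypothetical and not taken from LPCNet, and the "accelerated" variant is a plain-C stand-in for what would really be an SSE or AVX implementation compiled in its own file with the matching compiler flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel signature: dot product of two float vectors. */
typedef float (*dot_fn)(const float *a, const float *b, size_t n);

/* Portable fallback with no SIMD. */
static float dot_plain(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Stand-in for an accelerated variant: same maths, unrolled by four.
 * A real build would use SSE/AVX intrinsics here instead. */
static float dot_unrolled(const float *a, const float *b, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i] * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; i++)          /* leftover elements */
        s0 += a[i] * b[i];
    return (s0 + s1) + (s2 + s3);
}

enum simd_choice { SIMD_CHOICE_NONE = 0, SIMD_CHOICE_ACCEL = 1 };

/* One entry per set of functions built into the executable. */
struct kernel_set {
    const char *name;
    dot_fn dot;
};

static const struct kernel_set kernel_sets[] = {
    { "none",  dot_plain },
    { "accel", dot_unrolled },
};

/* Pick a set once at startup; callers then go through the pointers. */
const struct kernel_set *select_kernels(enum simd_choice choice)
{
    assert(choice == SIMD_CHOICE_NONE || choice == SIMD_CHOICE_ACCEL);
    return &kernel_sets[choice];
}
```

The caller would do `const struct kernel_set *k = select_kernels(choice);` once and then call `k->dot(a, b, n)`, so the rest of the code never needs to know which variant is in use.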

However as per my previous mailing list posts - just because we can doesn't mean we should...

I still have strong misgivings about pushing this out into the wild. It will be a support nightmare, as we won't know which machines will run and which won't. Think I'd feel better if we had some automation to determine if it will run in real time.

drowe67 commented 4 years ago

Also - the vast majority of users run Windows, so we need a cross-platform way to handle this.

hobbes1069 commented 4 years ago

I don't disagree, but with this being developed in the open, there are already people using it even if it's not ready for "prime time" yet. We do know there is interest, so I think trying to solve these technical issues sooner rather than later is still in our best interest.

drowe67 commented 4 years ago

Sure, happy to keep brainstorming and I'd like to see the technical issues solved too - in a way that minimises support and helps the end users.

Some suggested tasks:

  1. Modify LPCNet source code so AVX/SSE/None can be selected at run time rather than build time. This approach would mean building just one library and would be cross-platform.
  2. Create "accelerator detect" and "speed test" functions that can be used by a higher layer (like freedv-gui) to determine what accelerator technology is available and if the CPU is fast enough to run in real time.
  3. Modify freedv-gui to query the functions above for accelerator/machine performance before enabling 2020.
  4. Test on Windows/Linux/OSX/BSD etc
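The "speed test" in task 2 could be approximated by timing a batch of frames and comparing the time used with the amount of audio those frames represent. The sketch below assumes nothing about LPCNet's real API: `realtime_factor`, `frame_fn` and `count_frame` are hypothetical names, and the frame duration is a parameter rather than a known LPCNet constant.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical callback: process one frame of audio. */
typedef void (*frame_fn)(void *ctx);

/* Time `frames` calls of process_frame and return
 *     time used / audio duration represented.
 * A result below 1.0 means the work ran faster than real time.
 * Returns -1.0 if the C library cannot report processor time. */
double realtime_factor(frame_fn process_frame, void *ctx,
                       int frames, double frame_seconds)
{
    assert(process_frame != NULL && frames > 0 && frame_seconds > 0.0);

    clock_t start = clock();
    if (start == (clock_t)-1)
        return -1.0;

    for (int i = 0; i < frames; i++)
        process_frame(ctx);

    clock_t end = clock();
    if (end == (clock_t)-1)
        return -1.0;

    double used = (double)(end - start) / CLOCKS_PER_SEC;
    return used / (frames * frame_seconds);
}

/* Dummy workload for trying the function out: counts calls. */
void count_frame(void *ctx)
{
    (*(int *)ctx)++;
}
```

A higher layer such as freedv-gui could run this once with the real decoder as the callback and only offer the mode if the result leaves some headroom below 1.0; how much headroom is needed would have to be found by testing.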

drowe67 commented 4 years ago

Re:

> We do know there is interest

I've seen recent interest from package maintainers, but have you had any interest specifically from end users? I was wondering if there are records of the number of times freedv-gui packages are installed for example?

I've had several Windows users complain they can't use 2020 because of AVX.

hobbes1069 commented 4 years ago

Well, specifically we know quisk is interested in supporting the mode, which is the current driver for packaging it. As far as Windows, yes, that's unfortunate, but it seems AVX is the best baseline we've found. AVX2 provided very limited benefit.

kkofler commented 4 years ago

Please also keep in mind that one SSE version is not enough: just like AVX (AVX1) and AVX2 are two separate things, SSE (SSE1) intrinsics are not the same thing as SSE2 intrinsics nor SSE3 intrinsics nor SSE4.1 intrinsics. See my comments in #25: https://github.com/drowe67/LPCNet/pull/25#issuecomment-620915421

drowe67 commented 4 years ago

@kkofler thanks for your comments, they will be useful the next time someone works on this code. Several other tasks we need to resource too, as detailed above.

drowe67 commented 3 years ago

@hobbes1069 I wonder if it's time to revisit this? SSE support would open up FreeDV 2020 to many more people if it can be managed.

@tmiw I would be interested in your thoughts.

Key issue for me is to avoid end user problems i.e. "it doesn't work" bug reports because they are using a machine that doesn't have the CPU/SIMD power.

I am open to ideas on how we handle that :slightly_smiling_face:

tmiw commented 3 years ago

@hobbes1069, @drowe67, I'm thinking a single library would be best from a distribution perspective.

That said, how far back are we expecting to support hardware-wise? I know for the macOS version of FreeDV, for instance, we only support 64-bit Intel and ARM (and even then, we only go as far back as macOS 10.11, which AFAIK only supports Apple machines with AVX/SSE).

drowe67 commented 3 years ago

> That said, how far back are we expecting to support hardware-wise? I know for the macOS version of FreeDV, for instance, we only support 64-bit Intel and ARM (and even then, we only go as far back as macOS 10.11, which AFAIK only supports Apple machines with AVX/SSE).

That's a good question. I'd suggest not very far back, or based on what we can handle with AVX/SSE (current flavor) and None (no acceleration) based on a simple speed test and a reasonable amount of development. During 700D development I discovered quite a few Hams with very old (XP era) hardware, we can probably rule that out.

hobbes1069 commented 3 years ago

I know Fedora on x86_64 assumes SSE1 is available and is no longer distributing a 32-bit install (but 32-bit binaries and libraries are still available if needed). Of course this is all stuff we can or can't assume at build time. There are ways to dynamically test for and use the various instruction sets at program launch but the implementation is certainly beyond me.

kkofler commented 3 years ago

Fedora actually assumes SSE2 on x86 these days (even for the 32-bit multilibs). (That also implies that the older extensions, i.e., MMX and SSE1, can be assumed as well.) But SSE3 and higher (including any level of AVX) still have to be detected at runtime (or disabled entirely) in Fedora binaries.
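For the levels that still have to be detected at run time, GCC and Clang provide `__builtin_cpu_supports` on x86. The sketch below uses it to report the highest level found; the `cpu_simd` enum and both function names are hypothetical, and on other compilers or architectures it simply reports the baseline. MSVC has no equivalent builtin, so a Windows build would need its `__cpuid` intrinsic instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ordering of the levels discussed in this thread.
 * "Baseline" means whatever the build itself assumes (e.g. SSE2). */
enum cpu_simd {
    CPU_SIMD_BASELINE = 0,
    CPU_SIMD_SSE3,
    CPU_SIMD_SSE4_1,
    CPU_SIMD_AVX,
    CPU_SIMD_AVX2
};

/* Return the highest SIMD level the running CPU reports. */
enum cpu_simd detect_cpu_simd(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))   return CPU_SIMD_AVX2;
    if (__builtin_cpu_supports("avx"))    return CPU_SIMD_AVX;
    if (__builtin_cpu_supports("sse4.1")) return CPU_SIMD_SSE4_1;
    if (__builtin_cpu_supports("sse3"))   return CPU_SIMD_SSE3;
    return CPU_SIMD_BASELINE;
#else
    /* Unknown compiler or non-x86 target: assume nothing extra. */
    return CPU_SIMD_BASELINE;
#endif
}

/* Human-readable name, e.g. for a log line or an "About" dialog. */
const char *cpu_simd_name(enum cpu_simd level)
{
    static const char *const names[] = {
        "baseline", "SSE3", "SSE4.1", "AVX", "AVX2"
    };
    assert((size_t)level < sizeof names / sizeof names[0]);
    return names[level];
}
```

Checking from the highest level down means each level gets its own test, in line with the point above that SSE3, SSE4.1, AVX and AVX2 are separate things.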

tmiw commented 3 years ago

> I know Fedora on x86_64 assumes SSE1 is available and is no longer distributing a 32-bit install (but 32-bit binaries and libraries are still available if needed). Of course this is all stuff we can or can't assume at build time. There are ways to dynamically test for and use the various instruction sets at program launch but the implementation is certainly beyond me.

For reference, here's how it's currently done in freedv-gui. Granted, that's only for AVX and not SSE, but there's also this page from Microsoft that shows how to get the others.

hobbes1069 commented 3 years ago

@kkofler so my memory failed me. I wish this was better documented somewhere.