librade V2 feature requests

drowe67 commented 3 weeks ago

A place to jot down notes for features we are considering adding to V2 of the RADE API. We'll triage these before the coding starts - they may not all make it into V2.

Text channel. Required for FreeDV Reporter. There are currently 25 bit/s available for auxiliary data; in V1 these are used for acquisition. In particular, they enable us to trap any false sync states (e.g. a bad frequency offset). Allocating some of these bits to text will affect the robustness of acquisition, which has been a problematic and time-consuming area of development. It will need to be re-tested in simulation, and perhaps over the air (OTA) with a short test campaign, to make sure robustness is not affected. We also need to build API support and add some sort of protocol, e.g. a high-rate (LDPC?) code to mop up errors, a CRC, and some framing (e.g. the codec2 reliable text system). However, this is all DSP work; no ML retraining is required.
SNR estimation. This is tricky, and we haven't found a good solution to date. In legacy algorithms we measured the variance of the scatter diagram points about their expected positions. For RADAE the expected positions are time-varying. We do have a "poor man's" DSP algorithm that works from the pilot symbols; it gives good results for AWGN but is inaccurate for multipath channels.
API Doxygen support. Add comments to the code, then post-process with Doxygen.
C port of core ML - will speed up the RADAE encoder/decoder, greatly reducing CPU load (the FARGAN decoder will then dominate), and get us closer to a pure C implementation.
Cython for dsp.py - will significantly reduce CPU load, and (I think) will give us C code that moves us closer to a pure C implementation.
Further ML development - we may be able to operate several dB lower in SNR, reduce latency, and improve acquisition. R&D required.
Limit the 99% power bandwidth to approximately 1500 Hz. This needs to be done carefully so we don't lose the PAPR reduction, and tested carefully to ensure there is no performance degradation. All ctests need to be run with the BPF signal, with some effort to tweak the tests so they pass reliably. Initial investigation in #30.
doc/radae_intro_waveform suggestions: a table comparing RADE with other waveforms, e.g. legacy FreeDV; a breakdown of the encoder/decoder into ML and DSP components; a figure showing time versus frequency, the pilots, and the symbol breakdown (including the CP) of the OFDM waveform; references.
Clean-up of the radae repo, or creation of a new "release" repo specifically for production-quality code and support of target operating systems. The radae repo is David's experimental playground (and we probably need such a repo moving forward).
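
The CRC and framing layer from the text channel item could look roughly like the sketch below. The frame layout (sync byte, length byte, ASCII payload, CRC-16) and the SYNC value are hypothetical; this is not the codec2 reliable text format, and the FEC (LDPC) stage is left out entirely.

```python
import binascii

SYNC = 0xA5  # hypothetical sync byte, not part of any RADE or codec2 spec


def build_frame(text: str) -> bytes:
    """Pack a short ASCII string as sync | length | payload | CRC-16."""
    payload = text.encode("ascii")
    if len(payload) > 255:
        raise ValueError("payload too long for a 1-byte length field")
    header = bytes([SYNC, len(payload)])
    # CRC-16 with the CCITT polynomial (0x1021), initial value 0xFFFF
    crc = binascii.crc_hqx(header + payload, 0xFFFF)
    return header + payload + crc.to_bytes(2, "big")


def parse_frame(frame: bytes):
    """Return the decoded text, or None if the frame is malformed or fails the CRC."""
    if len(frame) < 4 or frame[0] != SYNC:
        return None
    length = frame[1]
    if len(frame) != 4 + length:
        return None
    body = frame[:2 + length]
    crc_rx = int.from_bytes(frame[2 + length:], "big")
    if binascii.crc_hqx(body, 0xFFFF) != crc_rx:
        return None  # corrupted: let the caller wait for the next repeat
    return body[2:].decode("ascii")
```

For scale, build_frame("N0CALL") is 10 bytes (80 bits); even with the entire 25 bit/s auxiliary channel it would take 3.2 s to send, and longer with only a share of it, before any FEC overhead.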
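
To illustrate the SNR estimation item, here is a generic textbook pilot-based estimator; it is an assumption for illustration, not the existing RADE "poor man's" algorithm. It fits one complex channel gain h over a block of known pilots by least squares and treats the residual as noise.

```python
import math
import random


def pilot_snr_db(rx, pilots):
    """Estimate SNR in dB from received pilot symbols, assuming a single complex
    channel gain h over the block: rx[k] = h * pilots[k] + noise[k]."""
    n = len(pilots)
    if n < 2 or len(rx) != n:
        raise ValueError("need at least two pilots and matching lengths")
    # Least-squares estimate of the common channel gain h
    energy = sum(abs(p) ** 2 for p in pilots)
    h = sum(r * p.conjugate() for r, p in zip(rx, pilots)) / energy
    # Anything the single-gain model can't explain is counted as noise
    noise_var = sum(abs(r - h * p) ** 2 for r, p in zip(rx, pilots)) / (n - 1)
    signal_pow = abs(h) ** 2 * energy / n
    return 10 * math.log10(signal_pow / noise_var)


if __name__ == "__main__":
    random.seed(1)
    pilots = [complex(random.choice((-1, 1)), 0) for _ in range(2000)]
    h = complex(0.8 * math.cos(0.3), 0.8 * math.sin(0.3))  # arbitrary test channel
    sigma = math.sqrt(0.1 / 2)  # per-component std for a total noise variance of 0.1
    rx = [h * p + complex(random.gauss(0, sigma), random.gauss(0, sigma)) for p in pilots]
    print(f"true {10 * math.log10(0.64 / 0.1):.2f} dB, "
          f"estimated {pilot_snr_db(rx, pilots):.2f} dB")
```

On AWGN the estimate lands close to the true value. In this sketch, a gain that varies across the pilots (as on a multipath channel) ends up in the residual and is counted as noise, so the estimate reads low.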
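
For the 1500 Hz item, the 99% power bandwidth can be read off a power spectrum estimate (for example one produced by scipy.signal.welch) by trimming 0.5% of the total power from each edge. The function name and the bin-centre convention below are assumptions; the ctests may measure this differently.

```python
def occupied_bandwidth(freqs, psd, fraction=0.99):
    """Return (f_low, f_high, bandwidth) of the band holding `fraction` of the
    total power, leaving (1 - fraction) / 2 of the power outside each edge.
    freqs are ascending bin-centre frequencies, psd the matching bin powers."""
    if len(freqs) != len(psd) or not psd:
        raise ValueError("freqs and psd must be non-empty and the same length")
    total = float(sum(psd))
    if total <= 0:
        raise ValueError("total power must be positive")
    tail = (1.0 - fraction) / 2.0 * total

    def first_bin_past_tail(indices):
        # Accumulate power from one edge until the tail allowance is exceeded
        acc = 0.0
        for i in indices:
            acc += psd[i]
            if acc > tail:
                return i
        return indices[-1]

    lo = first_bin_past_tail(range(len(psd)))
    hi = first_bin_past_tail(range(len(psd) - 1, -1, -1))
    return freqs[lo], freqs[hi], freqs[hi] - freqs[lo]
```

The resolution is one bin, so the PSD bin spacing should be well below the margin being checked against the 1500 Hz target.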

tmiw commented 1 week ago

I finally got RADE working on my MacBook Pro in freedv-gui. A few things that I noticed so far:

Apparently I need to use NumPy 1.x, not 2.x. This is different from Windows for some unknown reason. Is there a reason why I shouldn't try to make Windows use 1.x as well (or spend additional time making 2.x work on macOS)? Should this be detailed in the documentation if it's not already there?
freedv-gui is happy to use every single core available on my laptop. This makes the rest of the system less responsive. Is there some number of threads that will work reliably for most users that we can pass to torch.set_num_threads() during initialization? (On that note, because of this behavior I can't use RADE on my M1 Mac Mini: it makes SmartSDR unresponsive, so no audio is passed in from the radio.)
It looks like I need to apply approximately -4 dB gain to the TX audio or else it's just 3 kHz of noise (at least according to the TX waterfall in SmartSDR). Not sure if it's a configuration issue on my end or a potential RADE bug.
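
Based only on the macOS observation above (NumPy 2.x not working), a startup check could fail early with a clear message rather than an obscure error later. The one-major-version limit is an assumption drawn from this report, not a documented RADE requirement; pinning numpy<2 in the Python requirements would be the packaging-side equivalent.

```python
from importlib.metadata import PackageNotFoundError, version


def numpy_major(ver: str) -> int:
    """Return the major version number from a string such as '1.26.4'."""
    return int(ver.split(".")[0])


def check_numpy(max_major: int = 1) -> str:
    """Raise if the installed NumPy is newer than max_major.x; return its version.
    The max_major=1 default reflects the macOS report, and is an assumption."""
    try:
        ver = version("numpy")
    except PackageNotFoundError:
        raise RuntimeError("NumPy is not installed") from None
    if numpy_major(ver) > max_major:
        raise RuntimeError(f"NumPy {ver} found; NumPy {max_major}.x is expected")
    return ver
```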
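
For reference, the -4 dB TX gain mentioned above is an amplitude scale of about 0.63. A minimal sketch, assuming float audio normalised to a full scale of ±1.0 (the actual sample format in freedv-gui isn't stated here):

```python
def db_to_amplitude(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor (20*log10 convention)."""
    return 10 ** (gain_db / 20)


def apply_gain_db(samples, gain_db):
    """Scale float audio samples by gain_db, clipping to the [-1.0, 1.0] range."""
    scale = db_to_amplitude(gain_db)
    return [max(-1.0, min(1.0, s * scale)) for s in samples]
```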

Anyway, I'm going to fix the bugs Walter et al. reported and see if they have additional feedback too.

drowe67 commented 1 week ago

I'd be inclined to stabilise Windows first, then come back to this. I'm nervous about focusing on Mac-specific issues when the number of users is so small and we are so close to a working Windows version.

tmiw commented 1 week ago

I could see it causing issues on Windows machines as well, depending on the machine. We wouldn't need to make this user-configurable, either; maybe something like this when rade_initialize() gets called:

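import torch

# Use about 3/4 of torch's default intra-op thread count, leaving headroom for
# the rest of the system (always keep at least one thread)
cur_num_threads = torch.get_num_threads()
new_num_threads = max(1, cur_num_threads * 3 // 4)
torch.set_num_threads(new_num_threads)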

That all said, we could wait and see how the first few released builds work out before doing such a change.

drowe67 commented 1 week ago

I suspect this is more a symptom of an overloaded system. A modern OS shouldn't need manual control of cores to maintain smooth operation. I suggest we're better off profiling and then reducing total CPU load as planned, with a C port of selected parts of the Python code, as we discussed at PLT.
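
The profiling step suggested above could start with Python's built-in cProfile. This is a minimal sketch; the helper name is hypothetical, and in practice it would wrap whatever per-frame encode/decode call is being measured (running a script under `python -m cProfile -s cumtime` gives a similar report). Note that time spent inside torch or NumPy native code is attributed to the Python function that called it.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func(*args, **kwargs) under cProfile and return (result, report),
    where report lists the `top` entries sorted by cumulative time."""
    prof = cProfile.Profile()
    result = prof.runcall(func, *args, **kwargs)
    buf = io.StringIO()
    pstats.Stats(prof, stream=buf).sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()
```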

We'll also need to address end-user expectations - not everyone will be able to run the first release. This should improve over the next few months as optimisation progresses.

tmiw commented 6 days ago

Apparently allowing OpenBLAS free rein over the number of threads it can use wreaks havoc if your application also uses threads: https://github.com/OpenMathLib/OpenBLAS/blob/develop/USAGE.md. With this in mind, I updated freedv-gui to force OPENBLAS_NUM_THREADS to 1, and that significantly reduced CPU usage, especially on TX. (This is on both macOS and Windows, per testing by Walter and others.) That change doesn't appear to have affected the ability to decode in real time, either.
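
The comment doesn't say how freedv-gui sets the variable; from a Python entry point it could look like the sketch below (the helper name is hypothetical). As far as we know, OpenBLAS reads OPENBLAS_NUM_THREADS when the library is loaded, so it has to be set before NumPy, or anything else linked against OpenBLAS, is first imported.

```python
import os
import sys
import warnings


def limit_openblas_threads(n: int = 1) -> None:
    """Cap the OpenBLAS thread pool by setting OPENBLAS_NUM_THREADS.
    Must run before the first import of numpy (or other OpenBLAS users)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if "numpy" in sys.modules:
        warnings.warn("numpy is already imported; OPENBLAS_NUM_THREADS may be ignored")
    os.environ["OPENBLAS_NUM_THREADS"] = str(n)


limit_openblas_threads(1)
# import numpy as np  # import BLAS-backed modules only after the limit is set
```

If the limit ever needs to change after import, the third-party threadpoolctl package provides threadpool_limits for adjusting BLAS thread counts at runtime.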

Anyway, no need to make changes on the library side. I just figured I'd put that info out there, since it did seem kinda odd that TX was using so much CPU even though, per the RADE paper, it supposedly shouldn't.

drowe67 / radae

librade V2 feature requests #28