Open drowe67 opened 3 weeks ago
I finally got RADE working on my MacBook Pro in freedv-gui. A few things that I noticed so far:
torch.set_num_threads() during initialization? (On that note, because of this behavior I can't use RADE on my M1 Mac Mini since it makes SmartSDR unresponsive, thus no audio being passed in from the radio.)
Anyway, I'm going to fix the bugs Walter et al reported and see if they have additional feedback too.
I'd be inclined to stabilise Windows first, then come back to this. I'm nervous about a focus on Mac-specific issues when the number of users is so small and we are so close to a working Windows version.
I could see it causing issues on Windows machines as well, depending on the machine. We wouldn't need to make this user-configurable, either; maybe something like this when rade_initialize() gets called:
# Cap PyTorch at roughly 3/4 of its default thread count, never below 1.
cur_num_cores = torch.get_num_threads()
new_num_cores = cur_num_cores * 3 // 4
if new_num_cores < 1:
    new_num_cores = 1
torch.set_num_threads(new_num_cores)
That all said, we could wait and see how the first few released builds work out before making such a change.
Suspect this is more a symptom of an overloaded system. A modern OS shouldn't need manual control of cores to maintain smooth operation. Suggest we're better off profiling and then addressing total CPU load with the planned C port of selected parts of the Python code, as we discussed at PLT.
We'll also need to address end user expectations - not everyone will be able to run the first release. This should improve over the next few months while we progress optimisation.
Apparently allowing OpenBLAS free rein over the number of threads it can use wreaks havoc if your application also uses threads: https://github.com/OpenMathLib/OpenBLAS/blob/develop/USAGE.md. With this in mind, I updated freedv-gui to force OPENBLAS_NUM_THREADS to 1, and that significantly reduced CPU usage, especially on TX. (This was on both macOS and Windows, per testing from Walter and others.) That change doesn't appear to have affected the ability to decode in real time, either.
Anyway, no need to make changes on the library side. Just figured I'd put that info out there, since it did seem kinda odd that TX was using so much CPU when, per the RADE paper, it supposedly shouldn't.
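For anyone driving the Python code directly rather than through freedv-gui, a minimal sketch of the same cap might look like the following. It assumes the variable has to be in the environment before NumPy or PyTorch is first imported, since OpenBLAS reads it when the library initialises. The helper name `limit_blas_threads` is made up for illustration and is not part of freedv-gui or RADE.

```python
import os


def limit_blas_threads(num_threads: int = 1) -> None:
    """Cap the OpenBLAS thread pool via OPENBLAS_NUM_THREADS.

    Hypothetical helper: call it before the first import of numpy or torch,
    otherwise OpenBLAS may already have sized its thread pool.
    """
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    os.environ["OPENBLAS_NUM_THREADS"] = str(num_threads)


limit_blas_threads(1)
# Import numpy / torch only after this point.
```

Setting the variable in the launching shell or in the host application's environment should have the same effect; the helper only keeps the ordering requirement visible in code.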
A place to jot down notes for features we are considering adding to V2 of the RADE API. We'll triage these before the coding starts - they may not all make it into V2.
Text channel. Required for FreeDV Reporter. There is currently 25 bit/s available for auxiliary data, which is used for acquisition in V1; in particular, it lets us trap any false sync states (e.g. a bad freq offset). Allocating some of these bits to text will affect the robustness of acquisition, which has been a problematic and time-consuming area of development. It will need to be re-tested in simulation, and perhaps OTA with a short test campaign, to make sure robustness is not affected. We also need to build API support and add some sort of protocol, e.g. a high rate (LDPC?) code to mop up errors, a CRC, and some framing (e.g. the codec2 reliable text system). However, this is all DSP work; no ML re-training is required.
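To make the CRC and framing idea a little more concrete, here is a rough sketch of one possible frame layout: a length byte, the text payload, and a CRC-16. None of this is the RADE V2 protocol. The layout, the function names, and the choice of CRC-16-CCITT (via Python's `binascii.crc_hqx`) are assumptions for illustration, and there is no error-correcting code in it. The airtime figure assumes every one of the 25 bit/s were given to text, which is an upper bound given that some bits are needed for acquisition.

```python
import binascii

AUX_BIT_RATE = 25.0  # bit/s of auxiliary data quoted in the note above


def build_text_frame(text: str) -> bytes:
    """Hypothetical frame: [length byte][ASCII payload][CRC-16, big endian]."""
    payload = text.encode("ascii")
    if len(payload) > 255:
        raise ValueError("payload too long for a one-byte length field")
    body = bytes([len(payload)]) + payload
    crc = binascii.crc_hqx(body, 0xFFFF)
    return body + crc.to_bytes(2, "big")


def parse_text_frame(frame: bytes):
    """Return the text, or None if the frame is truncated or fails the CRC."""
    if len(frame) < 3:
        return None
    body, rx_crc = frame[:-2], int.from_bytes(frame[-2:], "big")
    if len(body) != body[0] + 1:
        return None
    if binascii.crc_hqx(body, 0xFFFF) != rx_crc:
        return None
    return body[1:].decode("ascii")


def airtime_seconds(frame: bytes, bit_rate: float = AUX_BIT_RATE) -> float:
    """Time to send the frame if the whole auxiliary bit rate carried text."""
    return len(frame) * 8 / bit_rate


frame = build_text_frame("N0CALL")  # placeholder callsign
print(len(frame), "bytes,", airtime_seconds(frame), "s")
```

Even a short callsign takes a few seconds at this rate, so the framing overhead and any FEC rate would have to be chosen with that budget in mind.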
SNR estimation. This is a tricky one, and we haven't found a good solution to date. In legacy algorithms we measured the variance of the scatter diagram dots about their expected positions. For RADAE the expected positions are time varying. We do have a "poor man's" DSP algorithm that works from the pilot symbols; it gives good results for AWGN but is inaccurate for multipath channels.
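For reference, the legacy scatter-diagram approach can be sketched as below for a fixed QPSK constellation on an AWGN channel. This is only an illustration of the idea, not the codec2 or RADE code: the function names are hypothetical, the receive gain is assumed to be normalised so that signal power is 1, and it relies on fixed expected positions, which is exactly what RADAE lacks.

```python
import cmath
import math
import random

# Unit-power QPSK constellation: the "expected positions" of the scatter dots.
QPSK = [cmath.exp(1j * (math.pi / 4 + k * math.pi / 2)) for k in range(4)]


def estimate_snr_db(rx_symbols):
    """Estimate SNR from the spread of symbols about the nearest QPSK point."""
    noise_power = 0.0
    for r in rx_symbols:
        nearest = min(QPSK, key=lambda p: abs(r - p))
        noise_power += abs(r - nearest) ** 2
    noise_power /= len(rx_symbols)
    if noise_power == 0.0:
        return float("inf")
    signal_power = 1.0  # assumption: constellation normalised to unit power
    return 10 * math.log10(signal_power / noise_power)


def simulate_awgn_qpsk(num_symbols, snr_db, seed=1):
    """Random QPSK symbols plus complex Gaussian noise at the given SNR."""
    rng = random.Random(seed)
    sigma = math.sqrt(10 ** (-snr_db / 10) / 2)  # per-dimension std dev
    return [
        rng.choice(QPSK) + complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
        for _ in range(num_symbols)
    ]


rx = simulate_awgn_qpsk(5000, snr_db=15.0)
print(f"estimated SNR: {estimate_snr_db(rx):.1f} dB")
```

Because the nearest point is picked by hard decision, an estimator like this tends to read high at low SNR, where wrong decisions make the dots look closer to a constellation point than they really are.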
API Doxygen support. Add comments to code, Doxygen post processing.
C port of core ML - will speed up RADAE enc/dec, greatly reducing CPU (FARGAN decoder will then dominate), and getting us closer to a pure C implementation.
Cython for dsp.py - will significantly reduce CPU load, and (I think) gives us C code to move us closer to a pure C implementation.
Further ML development - we may be able to get several dB lower, reduce latency, and improve acquisition. R&D required.
Limit 99% power bandwidth to approximately 1500 Hz. Needs to be done carefully to avoid PAPR reduction and carefully tested to ensure no performance degradation. All ctests need to be run with the BPF signal, and some effort will be needed to tweak tests for reliable passes. Initial investigation in #30
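As a reminder of what the 99% figure measures, here is a small sketch that computes occupied bandwidth from a power spectrum by trimming 0.5% of the total power from each edge. It assumes the spectrum is already available as evenly spaced, linear (not dB) per-bin powers, for example from an averaged periodogram of the TX signal; the function name and the flat test spectrum are made up for illustration.

```python
def occupied_bandwidth_hz(psd, bin_width_hz, fraction=0.99):
    """Width of the band holding `fraction` of the total power.

    psd: per-bin power values (linear), evenly spaced in frequency.
    Trims (1 - fraction) / 2 of the power from each edge, interpolating
    linearly inside the bin where the threshold is crossed.
    """
    total = sum(psd)
    if total <= 0:
        raise ValueError("spectrum has no power")
    tail = (1.0 - fraction) / 2.0 * total

    def edge(values):
        # Position, in bins from the start, where cumulative power hits `tail`.
        acc = 0.0
        for i, p in enumerate(values):
            if acc + p >= tail:
                return i + (tail - acc) / p if p > 0 else float(i)
            acc += p
        return float(len(values))

    lower = edge(psd)
    upper = len(psd) - edge(psd[::-1])
    return (upper - lower) * bin_width_hz


# Hypothetical flat spectrum: 75 bins of 20 Hz, i.e. 1500 Hz wide in total.
flat = [1.0] * 75
print(occupied_bandwidth_hz(flat, bin_width_hz=20.0))
```

A check along these lines could be run on the band-pass filtered signal to confirm the limit is actually met.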
doc/radae_intro_waveform suggestions: table comparing other waveforms, e.g. legacy FreeDV; break down enc/dec to include ML and DSP components; figure showing time versus freq, pilots, and symbol breakdown to CP for the OFDM waveform. References.
Clean up of the radae repo, or creation of a new "release" repo specifically for production quality code, support of target operating systems. The radae repo is David's experimental playground (and we probably need such a repo moving fwd).