f4exb / sdrangel

SDR Rx/Tx software for Airspy, Airspy HF+, BladeRF, HackRF, LimeSDR, PlutoSDR, RTL-SDR, SDRplay and FunCube
GNU General Public License v3.0

[question] would a CW demodulator be possible? #603

Closed febs closed 3 years ago

febs commented 4 years ago

I think I stumbled upon some Morse code transmissions while listening with SDRangel (it was around 7 MHz) and I thought it would be super cool if the software could automatically translate them into text for me.

Thanks,

f4exb commented 4 years ago

I have thought about it, but to do this properly and go beyond the traditional threshold approach, which does not work very well unless you have a calibrated strong signal to process, you have to resort to machine learning, and this is not a small undertaking. I think I have come across some work that has been done in this area but I don't have the pointers right now (to be added later). One could also dig into speech recognition, which could be used in a similar approach.
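For readers unfamiliar with the "traditional threshold approach" mentioned above, here is a minimal sketch (my own illustration, not SDRangel code; the cutoff and threshold values are illustrative assumptions). The fixed threshold is exactly what fails on weak or fading signals:

```python
# Naive threshold CW detector: rectify the audio, low-pass the envelope,
# then compare against a fixed threshold to get key-down/key-up.
import numpy as np
from scipy.signal import butter, filtfilt

def threshold_keying(audio: np.ndarray, fs: float, threshold: float = 0.1):
    """Return a boolean key-down stream from CW audio."""
    envelope = np.abs(audio)                 # rectify
    b, a = butter(4, 50.0 / (fs / 2))        # ~50 Hz envelope low-pass (assumed)
    envelope = filtfilt(b, a, envelope)
    return envelope > threshold              # fixed threshold: fragile unless the
                                             # signal is strong and calibrated
```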

f4exb commented 4 years ago

The work I came across was Mauri Niininen's (AG1LE). I don't have the original pointers but there seem to be more recent ones:

hiperiondev commented 3 years ago

Maybe this can be useful

http://ag1le.blogspot.com/2013/09/new-morse-decoder-part-1.html

and more recently: http://ag1le.blogspot.com/2020/04/new-real-time-deep-learning-morse.html

f4exb commented 3 years ago

The proposed technique is derived from Handwritten Text Recognition (HTR). In recent years Transformers have superseded LSTMs.

In the latter article:

> Moreover, the proposed transformer approach is designed to work at character level, instead of at the commonly used word level in translation or speech recognition applications. By using such design we are not restricted to any predefined fixed vocabulary, so we are able to recognize out-of-vocabulary (OOV) words, i.e. never seen during training.

This is very interesting for Morse code decoding.

f4exb commented 3 years ago

I have explored the subject and Mauri's (AG1LE) work a bit more. Actually I found his older work quite interesting: https://github.com/ag1le/RNN-Morse. Although less elaborate than the more recent developments, it seems to take into account the "music" of the Morse code, that is the "dit" and "dah" lengths and the various forms of silence. The Morse alphabet is not quite like handwriting so I am not sure the best approach is to reapply HTR methods directly to it. Any proficient human CW "decoder" will tell you that the "music of the code" is very important, so not taking this into account at the very first step might be detrimental to the rest of the chain. The "dit", "dah" and various silence separators could then be taken as stimuli for the next network. It has probably more to do with automatic score transcription of music than with handwritten character recognition. As a first step these stimuli lines could be used to make a powerful "intelligent" CW "filter" that reconstructs the modulation.
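To make the "music of the code" concrete, here is a small sketch (my own illustration) that turns text into the five keying classes using the standard Morse timing ratios (dit = 1 unit, dah = 3, element gap = 1, character gap = 3, word gap = 7). A model fed this symbol stream sees the rhythm before any alphabet:

```python
# Standard Morse timing, expressed as (label, duration_in_dits) pairs.
MORSE = {"E": ".", "T": "-", "A": ".-", "N": "-."}  # tiny illustrative table

def keying_labels(text: str) -> list[tuple[str, int]]:
    """Return (label, duration_in_dits) pairs for a text string."""
    out = []
    for word in text.upper().split():
        for char in word:
            for element in MORSE[char]:
                out.append(("dit", 1) if element == "." else ("dah", 3))
                out.append(("element_gap", 1))
            out[-1] = ("char_gap", 3)    # promote last element gap
        out[-1] = ("word_gap", 7)        # promote last char gap
    return out

# keying_labels("ET A") ->
# [('dit', 1), ('char_gap', 3), ('dah', 3), ('word_gap', 7),
#  ('dit', 1), ('element_gap', 1), ('dah', 3), ('word_gap', 7)]
```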

Edit: Transformers are a no-go: there are too few examples of their use on time series, and indeed they seem more adapted to Natural Language Processing (NLP) with a definite number of discrete input values.

f4exb commented 3 years ago

After 10 days of intensive work there is no progress, hence this issue is closed.

f4exb commented 3 years ago

Reopening as some progress was made. Soon I will publish some research Jupyter notebooks in a new repo. This could be the base of some "production grade" audio based utility that would then make its way into SDRangel as a feature plugin. But this is a long, long way off...

My conclusions at this point are:

  1. You need good DSP pre-processing. The approach of heterodyning at the peak frequency and filtering is not good because it creates temporal ripples in the envelope signal. This introduces artificial noise on the envelope that we surely do not need. Instead an FFT with overlap can be used, configured to fit an "empirically optimal" (!) number of samples per dit (5.77 was found to be good enough). To obtain the proper FFT configuration one has to provide the estimated WPM keying rate; at least in a first approach it will not attempt to guess the WPM automatically. Also at this point we will take the envelope of the signal as 1D data, possibly collating adjacent FFT bins. It has been found that using a comb of adjacent FFT bins does not provide significant added value and just increases DNN complexity and training times. (A sketch of this pre-processing follows after this list.)
  2. One should not attempt to adapt the data to an existing model. Morse keying and code are specific enough to deserve their own model. The approach of re-using existing HTR or NLP models is at best non-optimal and would probably fail at decoding noisy signals. This is in general good ML practice: first know your data, then try to build the model from it using the basic bricks. Decoding Morse from audio is basically a time series problem, so indeed it should start from an RNN, and an LSTM layer seems appropriate (Transformers are interesting but there is not a lot of literature concerning their application to time series). But jumping right into character recognition misses the very fabric of Morse code, which is composed of dits, dahs and various kinds of silences. So to get one's feet wet I would start by designing a model that can recognize the essential keying features (dits, dahs, and element, character and word separators). This was the idea in the RNN-Morse project cited above. This can already yield a useful model for de-noising in an encoder-decoder configuration quite common for audio or image de-noisers. It also gives a sense of the number of essential features to retain. Only then could we move on to character recognition.
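The pre-processing of point 1 could look roughly like this (a sketch under assumptions: 8 kHz audio, an `nfft` of 256 and a `peak_bin` picked by energy are my guesses; the 5.77 samples/dit target is from the comment above):

```python
# Overlapped FFT whose hop is derived from the estimated WPM, so the
# resulting envelope has a fixed number of samples per dit.
import numpy as np
from scipy.signal import stft

def cw_envelope(audio, fs=8000.0, wpm=13.0, samples_per_dit=5.77, nfft=256):
    dit_s = 1.2 / wpm                              # PARIS convention: dit length in seconds
    hop = int(round(fs * dit_s / samples_per_dit)) # FFT hop fitting the target rate
    f, t, Z = stft(audio, fs=fs, nperseg=nfft, noverlap=nfft - hop)
    peak_bin = np.argmax(np.abs(Z).sum(axis=1))    # strongest CW tone
    return np.abs(Z[peak_bin])                     # 1D envelope, ~5.77 samples/dit
```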

Note on typical machine learning acronyms: ML = Machine Learning, DNN = Deep Neural Network, RNN = Recurrent Neural Network, LSTM = Long Short-Term Memory, NLP = Natural Language Processing, HTR = Handwritten Text Recognition, OOV = Out Of Vocabulary.

f4exb commented 3 years ago

Some food for thought here: https://github.com/f4exb/morseangel This is only Jupyter notebooks so still at the research stage. The RNN-Morse-chars_dual notebook looks promising but I still lack the knowledge to properly extract the final data.
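The kind of model discussed in this thread, an LSTM over the 1D envelope emitting a keying class per time step, can be sketched in a few lines of PyTorch. This is an illustration only, not the code of the morseangel notebooks; the layer sizes are guesses:

```python
# Per-time-step classifier over the envelope: dit, dah, and the three silences.
import torch
import torch.nn as nn

class KeyingLSTM(nn.Module):
    def __init__(self, hidden=64, n_classes=5):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, envelope):       # envelope: (batch, time, 1)
        out, _ = self.lstm(envelope)
        return self.head(out)          # (batch, time, n_classes) logits

# Trained against the dit/dah/silence label stream, e.g. with
# nn.CrossEntropyLoss()(logits.transpose(1, 2), labels)  # labels: (batch, time)
```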

f4exb commented 3 years ago

Eventually something that seems to work fairly well: https://github.com/f4exb/morseangel/blob/main/notebooks/RNN-Morse-chars_single-ord36e96.ipynb Training needs to be done progressively and with a rather low learning rate from the start. This model is fairly hard to train because its sweet spot seems to be very narrow. The effective number of samples per dit is increased to 7.69 (decimation 96), which allows some more tolerance.
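For the curious, the 5.77 and 7.69 figures are mutually consistent if one assumes an 8 kHz audio sample rate and about 13 WPM keying (my reconstruction, not stated in the thread):

```python
# Samples per dit = dit duration (s) x decimated sample rate.
fs = 8000.0                 # assumed audio sample rate
wpm = 13.0                  # assumed keying speed
dit_s = 1.2 / wpm           # PARIS convention: ~0.0923 s per dit
print(dit_s * fs / 128)     # ~5.77 samples/dit (decimation 128)
print(dit_s * fs / 96)      # ~7.69 samples/dit (decimation 96)
```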

f4exb commented 3 years ago

MorseAngel is now an autonomous audio-based application. It uses PyQt5 for the GUI and has the same theme as SDRangel. I think there is no point in integrating it with SDRangel: it is very easy to pipe in the audio coming out of SDRangel; at best a mention can be made in the Wiki. At the moment this is not working satisfactorily but it can show some intermittent successful decodes. Development will continue (contributors welcome!) in the MorseAngel project.

Edit: Wiki page here

mfalkvidd commented 1 year ago

New link (I guess the wiki page was renamed which gave it a new URL): https://github.com/f4exb/sdrangel/wiki/Decoding-Morse-code-from-audio