drowe67 / LPCNet

Experimental Neural Net speech coding for FreeDV
BSD 3-Clause "New" or "Revised" License

Experimental index optimised direct VQ #42

Closed drowe67 closed 2 years ago

drowe67 commented 2 years ago

Part of the experimental 2020A mode discussed in #217; the codec2 side of this work is in https://github.com/drowe67/codec2/pull/274.

  1. Design tweaks to select between "direct VQ" (as currently used for FreeDV 2020) and "index optimised direct VQ".
  2. Use binary switch algorithm tool to optimise each of the 4 stage VQs, for example:
    ~/codec2/build_linux/misc/vq_binary_switch -d 18 -m 1000 split_stage1.f32 split_stage1_bs.f32
  3. Listen to a few samples using lpcnet_enc/lpcnet_dec and see if it improves speech quality in channels with random errors (lpcnet_dec can introduce errors).
  4. If effective integrate with modems and test resulting FreeDV mode in simulation and OTA.
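
To illustrate the idea behind step 2, here is a minimal sketch of index optimisation by pairwise switching. It is not the `vq_binary_switch` implementation: the function names `single_bit_error_cost` and `optimise_indexes` are hypothetical, the random codebook is made up, and it assumes all vectors are equally likely and only single-bit index errors matter. The goal is to reorder the codebook so that a one-bit error in the index decodes to a nearby vector.

```python
import random


def single_bit_error_cost(codebook, nbits):
    """Sum of squared distances between each vector and the vectors
    reached by flipping exactly one bit of its index."""
    total = 0.0
    for i, vec in enumerate(codebook):
        for b in range(nbits):
            neighbour = codebook[i ^ (1 << b)]
            total += sum((x - y) ** 2 for x, y in zip(vec, neighbour))
    return total


def optimise_indexes(codebook, nbits, max_passes=10):
    """Greedy pairwise switching (a simplification, assumed for illustration):
    swap two rows and keep the swap only if the total cost drops."""
    cb = [list(v) for v in codebook]
    best = single_bit_error_cost(cb, nbits)
    for _ in range(max_passes):
        improved = False
        for i in range(len(cb)):
            for j in range(i + 1, len(cb)):
                cb[i], cb[j] = cb[j], cb[i]
                cost = single_bit_error_cost(cb, nbits)
                if cost < best - 1e-12:
                    best = cost
                    improved = True
                else:
                    cb[i], cb[j] = cb[j], cb[i]  # undo the swap
        if not improved:
            break
    return cb, best


# Hypothetical 4-bit codebook of random 2-D vectors, for demonstration only.
rng = random.Random(0)
nbits = 4
cb = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(1 << nbits)]
before = single_bit_error_cost(cb, nbits)
opt, after = optimise_indexes(cb, nbits)
print(f"cost before: {before:.2f}, after: {after:.2f}")
```

The reordered codebook contains exactly the same vectors, so quantiser distortion without bit errors is unchanged; only the mapping from index to vector differs.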

Conclusions

April 2022.

This work was written up in a blog post. Conclusions:

  1. Index optimisation doesn't provide a significant improvement with this codec.
  2. On the samples tested, unequal error protection (in FreeDV 2020A and B) didn't perform significantly better than full error protection of all bits.
  3. We have decided not to keep 2020A, but keep 2020B as it allows operation on fast fading channels.

I've decided to merge this PR anyway, because:

  1. The index optimisation does make a slight improvement, so we might as well use it with the new 2020B mode. It won't be used with 2020 as it would break compatibility.
  2. The PR also fixed a few other small issues, which are worth keeping.

drowe67 commented 2 years ago

An initial test with just stage1 optimised didn't improve the quality. The -x option selects the index optimised VQ. The MSE on the stage1 VQ was reduced to 0.37 after 1000 iterations.

  1. Direct VQ, 1% BER: sox ../wav/peter.wav -t raw -r 16000 - | ./src/lpcnet_enc -s | ./src/lpcnet_dec -s -b 0.01 | aplay -f S16_LE -r 16000
  2. Index Optimised Direct VQ, 1% BER: sox ../wav/peter.wav -t raw -r 16000 - | ./src/lpcnet_enc -x | ./src/lpcnet_dec -x -b 0.01 | aplay -f S16_LE -r 16000

drowe67 commented 2 years ago

I have used the codec2 binary switch tool to develop some index optimised direct quantisers. lpcnet_quant.c has been hacked to introduce bit errors, so we can test the effect of index optimisation.

Typical command line for test:

sox ../wav/all.wav -t raw - | ./src/dump_data --c2pitch --test - - | ./src/quant_feat -i --mbest 5 -p 0 -d 3 -q split_stage1_indopt.f32,split_stage2_indopt.f32,split_stage3_indopt.f32,split_stage4_indopt.f32 > /dev/null

| VQ | BER | SD (dB) |
| --- | --- | --- |
| Original direct split | 0.0 | 2.2 |
| Index optimised direct split | 0.0 | 2.2 |
| Original direct split | 0.01 | 77.3 |
| Index optimised direct split | 0.01 | 31.0 |

When I listen to short files (wia.wav and peter.wav), index opt sounds a little better. In general this codec is pretty noisy at 1% BER; I suspect it is more sensitive to bit errors than the lower rate codecs.

When we integrate the index optimised tables, we can compare the results of direct:

sox ../wav/peter.wav -t raw -r 16000 - | ./src/lpcnet_enc -s | ./src/lpcnet_dec -s -b 0.02 | aplay -f S16_LE -r 16000

versus direct index optimised:

sox ../wav/peter.wav -t raw -r 16000 - | ./src/lpcnet_enc -x | ./src/lpcnet_dec -x -b 0.02 | aplay -f S16_LE -r 16000

The latter is a little better (less extreme clicks and pops), but still plenty of errors can be heard. It's not a huge difference, hard to say if it is significant in terms of on air 2020 operation :thinking:

This plot also shows not much difference in extreme excursions:

Screenshot from 2022-01-01 08-37-40

This could be a bug, or perhaps the pitch and voicing bits are super sensitive to bit errors.

Looking at the probabilities, at BER=0.02, there is a 1-binocdf(1,11,0.02)=0.019513 probability of >1 error in the 11 bits of the first (stage 1) quantiser, which encodes most of the frame energy. Index optimisation gives us some protection against 1 bit errors, but not necessarily against multiple errors in the same index. So with a 30 ms frame period we would expect such an error about every 0.03/0.0195=1.5 seconds, which looks similar to the plots above.
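
The binomial calculation above can be reproduced without Octave. This is a small sketch using only the Python standard library; `prob_more_than` is a hypothetical helper name, and the 11 bits, 2% BER and 30 ms frame period are the values quoted in the comment.

```python
from math import comb


def prob_more_than(k, n, p):
    """P(X > k) for X ~ Binomial(n, p), i.e. 1 - binocdf(k, n, p)."""
    cdf = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))
    return 1.0 - cdf


ber = 0.02             # channel bit error rate
stage1_bits = 11       # bits in the stage 1 VQ index
frame_period_s = 0.03  # 30 ms frames

p_multi = prob_more_than(1, stage1_bits, ber)
seconds_between = frame_period_s / p_multi
print(f"P(>1 error in stage 1 index) = {p_multi:.6f}")
print(f"mean time between such frames = {seconds_between:.2f} s")
```

Changing `ber` shows how quickly these multi-bit events become frequent as the channel degrades.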

There might be some argument for unequal error protection, e.g. a small LDPC codeword covering just the stage 1 VQ index and pitch bits.

drowe67 commented 2 years ago

OK it looks like the first stage VQ is super sensitive. If we simulate with the first stage VQ (bits 0..10) error free, and add errors to all other bits:

sox ../wav/wia.wav -t raw -r 16000 - | ./src/lpcnet_enc -s | ./src/lpcnet_dec -s -b 0.04 --ber_st 11 | aplay -f S16_LE -r 16000

... it sounds pretty good. The quality decreases gradually with increasing BER (like SSB), but doesn't break down and there are no loud clicks and pops.
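
For readers who want to experiment with the same idea outside the decoder, the sketch below flips bits at a given BER while leaving the first bits of each frame untouched. It is an assumption about what the test does (errors applied only from bit 11 onwards), not the code in lpcnet_dec; `add_bit_errors` and its parameters are hypothetical names, and the all-zero 52-bit frame is a stand-in for real codec bits.

```python
import random


def add_bit_errors(bits, ber, first_error_bit=0, rng=random):
    """Flip each bit at index >= first_error_bit with probability ber."""
    out = list(bits)
    for i in range(first_error_bit, len(out)):
        if rng.random() < ber:
            out[i] ^= 1
    return out


rng = random.Random(1)
frame = [0] * 52      # one 52-bit frame, all zeros so flipped bits are easy to count
n_frames = 2000
protected_errors = 0  # errors landing in bits 0..10 (stage 1 VQ index)
other_errors = 0      # errors landing in bits 11..51

for _ in range(n_frames):
    rx = add_bit_errors(frame, 0.04, first_error_bit=11, rng=rng)
    protected_errors += sum(rx[:11])
    other_errors += sum(rx[11:])

measured_ber = other_errors / (n_frames * 41)
print(f"errors in bits 0..10: {protected_errors}")
print(f"measured BER in bits 11..51: {measured_ber:.4f}")
```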

Screenshot from 2022-01-01 10-38-06

This suggests the following waveform design:

  1. Really strong protection of just bits [0..10] (the stage 1 VQ index). For example a rate 0.5 LDPC code on just the first 11 bits. This would mean (n,k) = (52+11,52) = (63,52), rate 52/63 ≈ 0.83 overall, so a modest 21% (11/52) increase in RF bandwidth over sending the 52 bits unprotected. This could be a redesign of the current 2020 FEC (which covers all bits), or added to the prototype 2020A waveform (if it will fit in terms of RF bandwidth). Or maybe both.
  2. From listening tests it's unclear if the index optimisation is worth it. It may provide additional protection if combined with FEC on the first vector, as per previous index optimisation results on a prototype lower bit rate codec. The objective SD results above suggest it is useful.
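
The code-rate arithmetic in item 1 can be checked with a few lines. This is only a worked example under the numbers given there (52 payload bits, a rate 0.5 code over the first 11 bits); the variable names are illustrative. It reports the overhead both as a share of transmitted bits and relative to the 52 payload bits, since the two figures differ.

```python
payload_bits = 52     # codec bits per frame
protected_bits = 11   # stage 1 VQ index, bits 0..10
inner_rate = 0.5      # rate of the code applied to the protected bits only

# A rate 0.5 code on 11 bits produces 22 coded bits, i.e. 11 parity bits.
parity_bits = round(protected_bits / inner_rate) - protected_bits
n = payload_bits + parity_bits

overall_rate = payload_bits / n             # k / n
parity_fraction = parity_bits / n           # share of transmitted bits that are parity
bandwidth_increase = n / payload_bits - 1   # extra bits relative to uncoded payload

print(f"(n,k) = ({n},{payload_bits})")
print(f"overall rate = {overall_rate:.3f}")
print(f"parity share of transmitted bits = {parity_fraction:.1%}")
print(f"increase over uncoded payload = {bandwidth_increase:.1%}")
```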