Closed drowe67 closed 2 years ago
Initial test with just stage1 optimised didn't improve the quality. The `-x` option selects the index optimised VQ. The MSE on the stage1 VQ was reduced to 0.37 after 1000 iterations.
sox ../wav/peter.wav -t raw -r 16000 - | ./src/lpcnet_enc -s | ./src/lpcnet_dec -s -b 0.01 | aplay -f S16_LE -r 16000
sox ../wav/peter.wav -t raw -r 16000 - | ./src/lpcnet_enc -x | ./src/lpcnet_dec -x -b 0.01 | aplay -f S16_LE -r 16000
I have used the `codec2` binary switch tool to develop some index optimised direct quantisers. `lpcnet_quant.c` has been hacked to introduce bit errors, so we can test the effect of index optimisation.
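To make the idea of index optimisation concrete, here is a minimal sketch of a pairwise-swap search in the spirit of binary switching. It is not the codec2 tool: the function names are hypothetical, every codebook entry is assumed equally likely, and the cost only considers single bit errors. The aim is that codebook vectors whose indexes differ by one bit end up close together, so a single bit error decodes to a similar vector.

```python
import itertools


def single_bit_error_cost(codebook, nbits):
    """Mean squared distance between each vector and the vectors reached
    by flipping exactly one bit of its index (all entries weighted equally)."""
    total = 0.0
    for i, v in enumerate(codebook):
        for b in range(nbits):
            w = codebook[i ^ (1 << b)]
            total += sum((x - y) ** 2 for x, y in zip(v, w))
    return total / (len(codebook) * nbits)


def optimise_indexes(codebook, nbits, passes=10):
    """Greedy search: try swapping every pair of entries and keep a swap
    only if it lowers the single-bit-error cost. Returns (codebook, cost)."""
    cb = list(codebook)
    best = single_bit_error_cost(cb, nbits)
    for _ in range(passes):
        improved = False
        for i, j in itertools.combinations(range(len(cb)), 2):
            cb[i], cb[j] = cb[j], cb[i]
            cost = single_bit_error_cost(cb, nbits)
            if cost < best - 1e-12:
                best = cost
                improved = True
            else:
                cb[i], cb[j] = cb[j], cb[i]  # undo the swap
        if not improved:
            break
    return cb, best


# Toy 3-bit scalar codebook in a deliberately scrambled order.
toy = [(0.0,), (7.0,), (1.0,), (6.0,), (2.0,), (5.0,), (3.0,), (4.0,)]
before = single_bit_error_cost(toy, 3)
optimised, after = optimise_indexes(toy, 3)
print(f"cost before: {before:.3f}, after: {after:.3f}")
```

The search only reorders entries, so quantiser performance with no bit errors is unchanged, which is consistent with the identical SD at BER 0.0 in the table below.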
Typical command line for test:
sox ../wav/all.wav -t raw - | ./src/dump_data --c2pitch --test - - | ./src/quant_feat -i --mbest 5 -p 0 -d 3 -q split_stage1_indopt.f32,split_stage2_indopt.f32,split_stage3_indopt.f32,split_stage4_indopt.f32 > /dev/null
VQ | BER | SD (dB) |
---|---|---|
Original direct split | 0.0 | 2.2 |
Index optimised direct split | 0.0 | 2.2 |
Original direct split | 0.01 | 77.3 |
Index optimised direct split | 0.01 | 31.0 |
When I listen to short files (`wia.wav` and `peter.wav`) index opt sounds a little better. In general this codec is pretty noisy at 1% BER; I suspect this codec is more sensitive to bit errors than the lower rate codecs.
When we integrate the index optimised tables, we can compare the results of direct:
sox ../wav/peter.wav -t raw -r 16000 - | ./src/lpcnet_enc -s | ./src/lpcnet_dec -s -b 0.02 | aplay -f S16_LE -r 16000
versus direct index optimised:
sox ../wav/peter.wav -t raw -r 16000 - | ./src/lpcnet_enc -x | ./src/lpcnet_dec -x -b 0.02 | aplay -f S16_LE -r 16000
The latter is a little better (less extreme clicks and pops), but still plenty of errors can be heard. It's not a huge difference, hard to say if it is significant in terms of on air 2020 operation :thinking:
This plot also shows not much difference in extreme excursions:
This could be a bug, or perhaps the pitch and voicing bits are super sensitive to bit errors.
Looking at the probabilities, at BER=0.02 there is a `1-binocdf(1,11,0.02)=0.019513` probability of >1 error in the first (stage 1) quantiser, which encodes most of the frame energy. Index optimisation gives us some protection against 1 bit errors. So at a 30ms frame rate we would expect a frame with more than one stage 1 bit error every 0.03/0.0195=1.5 seconds, which looks similar to the plots above.
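The arithmetic above can be checked without Octave. This is a small sketch using only the Python standard library; the function name is hypothetical, and the 11 bit stage 1 index, BER of 0.02 and 30ms frame period are taken from the text.

```python
from math import comb


def prob_more_than_one_error(nbits, ber):
    """P(more than one bit error in nbits independent bits),
    i.e. 1 - binocdf(1, nbits, ber)."""
    p0 = (1 - ber) ** nbits                                  # no errors
    p1 = comb(nbits, 1) * ber * (1 - ber) ** (nbits - 1)     # exactly one
    return 1 - (p0 + p1)


p = prob_more_than_one_error(11, 0.02)
frame_period_s = 0.03
mean_interval_s = frame_period_s / p   # average time between such frames
print(f"P(>1 error) = {p:.6f}, mean interval = {mean_interval_s:.2f} s")
```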
There might be some argument for unequal error protection, e.g. a small LDPC codeword covering just the stage 1 VQ index and pitch bits.
OK it looks like the first stage VQ is super sensitive. If we simulate with the first stage VQ (bits 0..10) error free, and add errors to all other bits:
sox ../wav/wia.wav -t raw -r 16000 - | ./src/lpcnet_enc -s | ./src/lpcnet_dec -s -b 0.04 --ber_st 11 | aplay -f S16_LE -r 16000
... it sounds pretty good. The quality decreases gradually with increasing BER (like SSB), but doesn't break down and there are no loud clicks and pops.
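As a rough illustration of the experiment above, the sketch below flips bits at a given BER while leaving the first `ber_st` bits of each frame untouched. This is an assumption about what `-b` and `--ber_st` do, inferred from the description here, not the code in `lpcnet_dec`; the function name and the 52 bit frame length are hypothetical.

```python
import random


def add_bit_errors(frame_bits, ber, ber_st=0, rng=random):
    """Return a copy of frame_bits with each bit from index ber_st onwards
    flipped with probability ber. Bits 0..ber_st-1 are left error free."""
    out = list(frame_bits)
    for i in range(ber_st, len(out)):
        if rng.random() < ber:
            out[i] ^= 1
    return out


rng = random.Random(1)            # fixed seed so runs are repeatable
frame = [0] * 52                  # assumed frame length, for illustration only
rx = add_bit_errors(frame, ber=0.04, ber_st=11, rng=rng)
print("stage 1 bits intact:", rx[:11] == frame[:11])
print("errors in remaining bits:", sum(rx[11:]))
```

Protecting only the first 11 bits this way mimics what unequal error protection would aim for: the stage 1 index arrives clean while the rest of the frame sees the raw channel BER.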
This suggests the following waveform design:
Part of experimental 2020A mode as discussed in #217, `codec2` side of this work in https://github.com/drowe67/codec2/pull/274.

`lpcnet_enc`/`lpcnet_dec` and see if it improves speech quality in channels with random errors (`lpcnet_dec` can introduce errors).

Conclusions
April 2022.
This work was written up in a blog post. Conclusions:
I've decided to merge this PR anyway, because: