drowe67 / LPCNet

Experimental Neural Net speech coding for FreeDV
BSD 3-Clause "New" or "Revised" License

automated tests for LPCNet states & new network for peter and david #10

Closed: drowe67 closed this 4 years ago

drowe67 commented 4 years ago

Another attempt to debug run-time loading of nets, using some tools I wrote to debug the Linux <-> Windows port. Also trying to fix the problems LPCNet has with rough speech on a couple of speakers.

  1. Can now load NNs at run time, and have ctests with Travis support. This gives much stronger testing to make sure nothing gets broken during NN experiments, and lets us move between known-working NNs easily.

  2. tinytrain is working with high quality speech (a single male and a single female sample) on log magnitude nets, and there is an automated test to make sure it keeps working.

  3. tinytrain is working with high quality speech (a single male and a single female sample) on log magnitude nets using the codec 2 voicing estimator.

  4. full_train.sh is being used to experiment with getting reasonable speech from the "peter" and "cq_16kHz" samples. Training has turned out not to be deterministic, so different results are possible on exactly the same training material (see the seeding sketch after this list).

  5. Developed the 190924a NN, which gives reasonable results on "peter" and "cq_16kHz", and OK results on the "all.wav" samples except "canadian", which is a little down in quality. Made 190924a the default.

  6. 190924a is very rough when quantised compared to the 190215 or 20h networks. This can be reproduced using decimation/interpolation (sketched after this list), e.g.

    sox ../../wav/birch.wav -t raw - | ./dump_data --c2pitch --test - - | ./quant_feat -d 3 | ./test_lpcnet  - - | aplay -f S16_LE -r 16000

    is much rougher than:

    sox ../../wav/birch.wav -t raw - | ./dump_data --c2pitch --test - - | ./quant_feat -d 3 | ./test_lpcnet -n ../../unittest/lpcnet_190215.f32 - - | aplay -f S16_LE -r 16000

    Many tests were performed, and it appears the only difference is the NN. It is not clear why one NN responds better to decimation/interpolation than another.

  7. OK, the interpolation bug was traced to an error in the use of the codec 2 pitch estimator. We were training on a discrete PDF of pitch values, which broke the pitch embedding when pitch values were interpolated (see the embedding sketch after this list). A new network, 191005, has been trained that works well on peter, cq_16kHz, and all, and also works well with quant_feat -d 3 and lpcnet_enc/lpcnet_dec (quantisation). The solution was to use the codec 2 pitch estimate after refinement.
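
On the non-determinism noted in item 4: one common source is unseeded RNGs in the Python/Keras training stack, and GPU/cuDNN kernels add their own variation that seeding alone does not remove. The following is only an illustrative sketch of pinning the seeds, assuming a TensorFlow/Keras training script; it is not the project's training code and the seed value is arbitrary.

    # Illustrative seeding sketch (not the project's training script).
    # Pinning these seeds removes one source of run-to-run variation;
    # GPU/cuDNN kernels can still be non-deterministic on their own.
    import os
    import random

    import numpy as np
    import tensorflow as tf

    SEED = 1234                       # arbitrary, illustrative value

    os.environ["PYTHONHASHSEED"] = str(SEED)
    random.seed(SEED)                 # Python's built-in RNG
    np.random.seed(SEED)              # NumPy RNG (data shuffling etc.)
    tf.random.set_seed(SEED)          # TF 2.x; TF 1.x uses tf.set_random_seed(SEED)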
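
For item 6, the decimation/interpolation applied by quant_feat -d 3 can be pictured as keeping every third frame of feature vectors and linearly interpolating the frames in between. The sketch below is a rough Python illustration of that idea only; the real tool is the project's C code and its exact behaviour may differ.

    import numpy as np

    # Rough sketch of decimation/interpolation of frame-rate feature
    # vectors: keep every d-th frame, linearly interpolate the rest.
    # This illustrates the idea only; it is not quant_feat itself.
    def decimate_interpolate(features, d=3):
        n_frames = features.shape[0]
        kept = np.arange(0, n_frames, d)          # frames actually kept
        out = np.empty_like(features)
        for i in range(len(kept) - 1):
            a, b = kept[i], kept[i + 1]
            for j in range(a, b):
                w = (j - a) / (b - a)             # interpolation weight
                out[j] = (1 - w) * features[a] + w * features[b]
        out[kept[-1]:] = features[kept[-1]]       # hold the last kept frame
        return out

Note that any feature which only takes a sparse set of discrete values, such as a coarse pitch estimate, ends up with interpolated values that were never seen in training, which is exactly the failure described in item 7.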
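
The pitch embedding failure in item 7 can be seen with a toy lookup table: LPCNet feeds an integer pitch index into an embedding layer, so rows that never occur in the training data are never updated, and interpolated pitch values at decode time land on those untrained rows. The numpy sketch below is illustrative only; the table size matches LPCNet's 256-entry pitch embedding, but the indices and values are made up.

    import numpy as np

    # Toy stand-in for a 256 x 64 pitch embedding table.
    rng = np.random.default_rng(0)
    embed = rng.normal(size=(256, 64))            # random initial weights

    # Suppose training only ever presents a sparse, discrete set of pitch
    # indices (a "discrete PDF" of pitch values), so only these rows are
    # ever updated by gradient descent:
    trained = {40, 80, 120, 160}                  # illustrative indices
    for idx in trained:
        embed[idx] = 0.01 * idx                   # stand-in for learned rows

    # Interpolating the decoded pitch track produces values in between,
    # e.g. halfway between two trained indices:
    interp_pitch = (80 + 120) // 2                # 100, never seen in training
    print(interp_pitch in trained)                # False -> lookup hits an
                                                  # untrained, random row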