cookcodes / Percepnet-Keras

percepnet implemented using Keras, still need to be optimized and tuned.
BSD 3-Clause "New" or "Revised" License
38 stars 21 forks source link

PercepNet (Still need to be tuned)

Unofficial implementation of PercepNet : A Perceptually-Motivated Approach for Low-Complexity, Real-Time Enhancement of Fullband Speech

https://www.researchgate.net/publication/343568932_A_Perceptually-Motivated_Approach_for_Low-Complexity_Real-Time_Enhancement_of_Fullband_Speech

Compared with https://github.com/jzi040941/PercepNet , this version is implemented using Keras.


Due to github file size limit is 100M, rnn_data.c is compressed to rnn_data.c.tgz. This file need to be extracted before furthur compileing. % cd src % tar -xzvf rnn_data.c.tgz % cd ..

To compile, just type: % ./autogen.sh % ./configure % make

A simple command-line tool is provided as an example. It operates on RAW 16-bit (machine endian) mono PCM files sampled at 48 kHz. It can be used as:

./examples/rnnoise_demo

The output is also a 16-bit raw PCM file.


How to train: (change to src subdirectory, assumed the clean and noise files's directory are in ~/DNS-Challenge/datasets/rnnoise3/) cd ~/percepnet/src ./denoise_training ~/DNS-Challenge/datasets/rnnoise3/clean ~/DNS-Challenge/datasets/rnnoise3/noise 80000000 training.f32

(change to training subdirectory) cd ../training python bin2hdf5.py …/src/training.f32 80000000 138 training.h5 python rnn_train.py python dump_rnn_float.py weights.hdf5 rnn_data.c rnn_data.h orig cp rnn_data.c ../src/

(change to percepnet directory) cd ~/percepnet/ make clean make

(change to example subdirectory) cd examples ./rnnoise_demo test2.raw test2_denoised.raw


More: The performance of this version needs furthur optimization and tuning. In some cases, it is worse than Rnnoise. Any comments on how to optimize/tune are welcome.

test_gr in src/ is used to test the classical processing, using computed g, r (not from deep learning model) directly to check whether g,r work not not.

The overall framework is based on https://github.com/xiph/rnnoise And the speech signal processing codes are from https://github.com/jzi040941/PercepNet Compared with Rnnoise, the training data is standardized, and use float(not quantized) when conver to rnn_data.c. And during training, clip_norm is set to 0.1, or loss will be NAN.

The training data are from: https://github.com/microsoft/DNS-Challenge

Wavfiles processing codes(wav.h, wav.c) are from https://faculty.fiu.edu/~wgillam/wavfiles.html