GreenWaves-Technologies / tiny_denoiser


TinyDenoiser on GAP9

This project demonstrates a Recurrent Neural Network (RNN) based method for Speech Enhancement on GAP9.
The main loop of the application continuously samples data from the microphone at 16 kHz, applies the RNN filter, and reconstructs the cleaned signal via overlap-and-add. As depicted in the Figure below, the noisy signal is windowed (frame size of 25 ms with a hop length of 6.25 ms and Hanning windowing) and the STFT is computed. The RNN is fed with the magnitude of the STFT components and returns a suppression mask. After the STFT components are weighted by this mask, the inverse STFT returns a cleaned audio clip.

[Figure: denoising pipeline]
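
The processing chain described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `mask_fn` is a hypothetical stand-in for the RNN, the function and constant names are invented here, and normalizing the overlap-add by the summed window is an assumption about how the frames are recombined.

```python
import numpy as np

SAMPLE_RATE = 16000                # Hz, as stated above
FRAME_LEN = SAMPLE_RATE // 40      # 25 ms    -> 400 samples
HOP_LEN = SAMPLE_RATE // 160       # 6.25 ms  -> 100 samples


def denoise(noisy, mask_fn):
    """Window, STFT, mask, inverse STFT, overlap-add.

    mask_fn (hypothetical) maps the STFT magnitude of one frame to a
    suppression mask of the same shape; on GAP9 this role is played by the RNN.
    """
    noisy = np.asarray(noisy, dtype=np.float64)
    window = np.hanning(FRAME_LEN)
    n_frames = 1 + (len(noisy) - FRAME_LEN) // HOP_LEN

    out = np.zeros(len(noisy))
    norm = np.zeros(len(noisy))    # summed window, used to undo the windowing

    for i in range(n_frames):
        start = i * HOP_LEN
        stop = start + FRAME_LEN
        spectrum = np.fft.rfft(noisy[start:stop] * window)
        mask = mask_fn(np.abs(spectrum))          # suppression mask per bin
        cleaned = np.fft.irfft(spectrum * mask, n=FRAME_LEN)
        out[start:stop] += cleaned                # overlap-and-add
        norm[start:stop] += window

    covered = norm > 1e-8          # skip samples where the window sum is ~0
    out[covered] /= norm[covered]
    return out
```

With an all-ones mask the function returns the input signal unchanged (apart from the first and last sample, where the Hanning window is zero), which is a convenient sanity check before plugging in a real model.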

Demo Getting Started

The demo runs on the GAP9 Audio EVK, using the microphone of the GAPmod board.

cmake -B build
cmake --build build --target run

It can also run on GVSoC. See the GVSoC - gvcontrol section for details on how this works.

cmake -B build
cmake --build build --target menuconfig # Select GVSoC in the menu. In gvsoc_option, enable "GVSOC proxy mode."
cmake --build build --target run
./gvcontrol --port 30000 # in another terminal

Optionally, the application can run on GVSoC (or on the board) to denoise a custom audio file (.wav).

cmake -B build
cmake --build build --target menuconfig # Select the DenoiseWav option in the DENOISER APP -> Application mode menu
cmake --build build --target run

The output WAV file is written to test_gap.wav inside the project folder.

Project Structure

NN Quantization Settings

Post-training quantization of the RNN model is performed by GAPflow. Both LSTM and GRU models can be quantized using one of several options:

Application Mode Configuration

In addition to individual settings, some application modes are available to simplify the application code configuration. A mode is selected by setting the Application Mode in the DENOISER APP menu of menuconfig.

Demo Setting (Application Mode DEMO or DenoiserWav)

The code runs inference using the denoiser_dns.onnx model with FP16MIXED quantization. More accurate results, at a higher energy cost, can be obtained with FP16 quantization by changing the nntool_script_demo.

Python Utilities

The test_accuracy/test_GAP.py file provides routines for testing the NN inference model using the NNtool API. The script can be used to run tests on entire datasets (--mode test) or to denoise individual audio files (--mode sample). Some examples are provided below.

To denoise a wav file

python test_accuracy/test_GAP.py --mode sample --pad_input 300 --sample_rate 16000 --wav_input /<path_to_audio_file>/<file_name>.wav
python test_accuracy/test_GAP.py --mode sample --pad_input 300 --sample_rate 16000 --wav_input samples/dataset/noisy/p232_050.wav --quant fp16mixed

The output is saved in a file called test_gap.wav in the root of the repository.
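
Since the commands above pass --sample_rate 16000, it can be useful to check the header of a WAV file before or after denoising. The snippet below is a small sketch using only Python's standard `wave` module; the helper name `wav_info` is hypothetical and is not part of test_GAP.py.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_s) for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        duration = wf.getnframes() / rate
    return rate, channels, duration


# Example usage (assumes the file exists in the current directory):
# rate, channels, duration = wav_info("test_gap.wav")
```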

To test on a dataset

python test_accuracy/test_GAP.py --mode test --pad_input 300 --noisy_dataset_path ./<path_to_noisy_audio_dataset>/ --clean_dataset_path ./<path_to_clean_audio_dataset>/

GVSoC - gvcontrol

To run the Demo mode on GVSoC, you can use the gvcontrol file. gvcontrol is used to send data to and read data from the I2S interface of the simulated GAP9 in GVSoC. You can choose the noisy input wav file you want to process. The execution can be long (up to 5 minutes for 3 seconds of simulation). Since GAP9 expects PDM data, the PCM/PDM conversion module of GVSoC is used. To learn more about this, please refer to the following example in the SDK: basic/interface/sai/pcm2pdm_pdm2pcm.