drethage / speech-denoising-wavenet

A neural network for end-to-end speech denoising
MIT License

Denoised audio has 0db? #13

Open wuweijia1994 opened 6 years ago

wuweijia1994 commented 6 years ago

All of the audio is almost silent after denoising, and I do not know why. This is my config:

    {
      "dataset": {
        "extract_voice": true,
        "in_memory_percentage": 1,
        "noise_only_percent": 0.02,
        "num_condition_classes": 29,
        "path": "data/ShowerNoise/",
        "regain": 0.06,
        "sample_rate": 16000,
        "type": "nsdtsea"
      },
      "model": {
        "condition_encoding": "binary",
        "dilations": 7,
        "filters": {
          "lengths": { "res": 3, "final": [3, 3], "skip": 1 },
          "depths": { "res": 128, "skip": 128, "final": [2048, 256] }
        },
        "num_stacks": 3,
        "target_field_length": 1601,
        "target_padding": 1
      },
      "optimizer": {
        "decay": 0.0,
        "epsilon": 1e-08,
        "lr": 0.001,
        "momentum": 0.9,
        "type": "adam"
      },
      "training": {
        "batch_size": 4,
        "early_stopping_patience": 16,
        "loss": {
          "out_1": { "l1": 1, "l2": 0, "weight": 1 },
          "out_2": { "l1": 1, "l2": 0, "weight": 1 }
        },
        "num_epochs": 15,
        "num_test_samples": 50,
        "num_train_samples": 450,
        "path": "sessions/ShowerNoise",
        "verbosity": 1
      }
    }

I have 500 audio files for training. Of these, 100 are clean audio and 7 are noise-only files whose expected output is silence.
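To put a number on "almost silent", it can help to measure the level of the input and the denoised output directly. The following is a minimal sketch, assuming the files are 16-bit PCM WAV; the function names are hypothetical and not part of this repository.

```python
import math
import struct
import wave


def rms_dbfs(samples, full_scale=32768.0):
    """Return the RMS level of 16-bit PCM samples in dB relative to full scale."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square / (full_scale * full_scale))


def read_pcm16(path):
    """Read a 16-bit PCM WAV file and return its samples (channels interleaved)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        frames = wav.readframes(wav.getnframes())
    return struct.unpack("<%dh" % (len(frames) // 2), frames)
```

Calling `rms_dbfs(read_pcm16(path))` on a noisy input and on its denoised counterpart (both paths are placeholders) shows whether the output is truly silent (`-inf`) or just much quieter than the input.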

drethage commented 6 years ago

It's not possible for me to identify the issue with only the information you provided. But I see you're using a different dataset. It's likely structured differently from the NSDTSEA dataset we used for training, so you'll want to write a class that handles loading and batching your dataset appropriately.
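Whatever shape such a class takes, one thing it has to guarantee is that every noisy input is matched to the right clean target. A minimal sketch of that check, assuming a hypothetical layout in which a noisy recording and its clean version share the same file name in two separate folders (the function name and layout are illustrative, not this repository's API):

```python
def pair_by_filename(noisy_names, clean_names):
    """Match noisy and clean recordings that share a file name.

    Returns (pairs, missing): the sorted names present in both listings,
    and the sorted noisy names that have no clean counterpart.
    """
    clean = {n for n in clean_names if n.lower().endswith(".wav")}
    pairs, missing = [], []
    for name in sorted(noisy_names):
        if not name.lower().endswith(".wav"):
            continue  # ignore non-audio files such as logs or readmes
        if name in clean:
            pairs.append(name)
        else:
            missing.append(name)
    return pairs, missing
```

Passing in the results of `os.listdir()` for the two folders and checking that `missing` is empty is a quick way to rule out mismatched pairs before training.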

wuweijia1994 commented 6 years ago

Yeah, to test it, I also downloaded the clean_trainset from here and then mixed those files with my own noise using pydub.AudioSegment.overlay(). For more details, the link is here.
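When synthesizing noisy speech this way, the level of the noise relative to the speech largely determines what the network has to learn. The sketch below is not the pydub call mentioned above; it is plain-Python additive mixing on float samples, written out so the signal-to-noise ratio is explicit. The function names and the idea of fixing an SNR are assumptions for illustration.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0


def mix_at_snr(clean, noise, snr_db):
    """Add looped noise to clean speech, scaled to the requested SNR in dB.

    The SNR is defined against the RMS of the whole noise clip.
    """
    if not clean or not noise:
        raise ValueError("clean and noise must be non-empty")
    clean_rms, noise_rms = rms(clean), rms(noise)
    if clean_rms == 0 or noise_rms == 0:
        raise ValueError("cannot set an SNR for an all-zero signal")
    # Gain that brings the noise RMS to clean_rms / 10^(snr_db / 20).
    gain = clean_rms / (noise_rms * 10.0 ** (snr_db / 20.0))
    # Loop the noise so it covers the whole clean clip.
    return [c + gain * noise[i % len(noise)] for i, c in enumerate(clean)]
```

Generating the same clean file at a few different SNR values and listening to the results is a simple way to confirm the mixtures still contain audible speech.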

wuweijia1994 commented 6 years ago

https://drive.google.com/file/d/1zqnS38KiKF1NNDL8dMqdSXUzfPgxhFV-/view?usp=sharing This is the noise file. I only combined this audio with the clean_trainset files. In config.json, I only changed dilations from 9 to 7 and the batch size to 4.
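Going from 9 to 7 dilations changes how much audio context the model sees per output sample. A rough sketch of the arithmetic, assuming that `"dilations": N` means dilation factors 2^0 through 2^N in each stack (an assumption about how the config is interpreted, not something stated in this thread), with `num_stacks`, the residual filter length, and `target_field_length` taken from the config above:

```python
def receptive_field(num_dilations, num_stacks=3, filter_length=3):
    """Receptive field in samples for stacked dilated convolutions.

    Assumes dilation factors 2**0 .. 2**num_dilations in every stack.
    A layer with dilation d and filter length k widens the field by d * (k - 1).
    """
    dilations = [2 ** i for i in range(num_dilations + 1)]
    per_stack = sum(d * (filter_length - 1) for d in dilations)
    return num_stacks * per_stack + 1


def input_length(num_dilations, target_field_length=1601):
    """Input samples needed to predict target_field_length output samples."""
    return receptive_field(num_dilations) + target_field_length - 1


for n in (9, 7):
    rf = receptive_field(n)
    print(n, rf, round(1000.0 * rf / 16000, 1), input_length(n))
```

Under these assumptions the receptive field drops from 6139 samples (about 384 ms at 16 kHz) to 1531 samples (about 96 ms). Whether that plays any part in the silent output is not established here.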