When I tested the sound examples you gave, the enhanced speech were poor?

francoisgermain / SpeechDenoisingWithDeepFeatureLosses

Speech Denoising with Deep Feature Losses

MIT License


When I tested the sound examples you gave, the enhanced speech were poor? #1

Open Lynlzz1314 opened 6 years ago

Lynlzz1314 commented 6 years ago

Hi, francoisgermain. When I ran senet_infer.py, I got enhanced speech, but the result was not good.

francoisgermain commented 6 years ago

Hi, I'm sorry to hear you're having issues. Several other people and I were able to run it successfully before, so I have to ask you for a few more details to be able to help you. Could you walk me through the steps you ran on your machine to get this result, and give me some details on your configuration? Thanks!

Lynlzz1314 commented 5 years ago

First, I stored my own noisy speech files (16 kHz, .wav) in the folder noisy_speech. Then I changed 'valfolder = "dataset/valset_noisy"' to 'valfolder = "noisy_speech"' in the script senet_infer.py. Finally, I ran "python senet_infer.py" and got the folder noisy_speech_denoised. But the denoising did not seem to have worked on the enhanced speech in noisy_speech_denoised.

ViliusT commented 5 years ago

@Lynlzz1314 you probably have 16 bit audio files; you want pcm_f32le audio encoding - I don't use sox, but if you have ffmpeg installed, you can try converting your file:

ffmpeg -y -i INPUT.wav -acodec pcm_f32le -ac 1 -ar 16000 -vn OUTPUT.wav

If you do use sox, have a look at download_sedata.sh file.

@francoisgermain - it would make sense to mention in the readme that the currently trained network expects 32-bit float audio files. I think the majority of 16 kHz speech corpora are 16-bit, so there are bound to be a few people who forget to check that.

saurabh-kataria commented 5 years ago

I also experienced the same issue. Converting audio files to 32-bit float is essential for getting good enhancement quality. I used sox to do that: sox input.wav -r 16000 -b 32 -e float output.wav

francoisgermain commented 5 years ago

Very sorry guys. I never checked integer data, but you're right that scipy.io.wavfile does not normalize the audio to between -1.0 and +1.0. I'll add a note for now, since converting to 32-bit float works around the problem, and I'll see if I can include a fix. Thanks for the thorough investigation.
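
For readers who would rather not convert files on disk, the normalization discussed above can also be done in Python after loading. The following is only a minimal sketch and not the repository's actual fix: the helper names to_float32 and load_normalized are hypothetical, and it assumes the network expects float32 samples in roughly [-1.0, 1.0], as the 32-bit float workaround suggests.

```python
import numpy as np


def to_float32(samples: np.ndarray) -> np.ndarray:
    """Scale WAV samples to float32 in roughly [-1.0, 1.0] (hypothetical helper)."""
    if np.issubdtype(samples.dtype, np.floating):
        # Float WAV data is already normalized by convention; only cast it.
        return samples.astype(np.float32)
    if samples.dtype == np.uint8:
        # 8-bit PCM is unsigned, with silence at 128.
        return (samples.astype(np.float32) - 128.0) / 128.0
    # Signed integer PCM (e.g. int16, int32): divide by the magnitude
    # of the most negative representable value (32768 for int16).
    full_scale = float(-np.iinfo(samples.dtype).min)
    return samples.astype(np.float32) / full_scale


def load_normalized(path):
    """Read a WAV file and return (sample_rate, float32 samples)."""
    # Imported here so that to_float32 can be used without SciPy installed.
    from scipy.io import wavfile

    rate, samples = wavfile.read(path)
    return rate, to_float32(samples)
```

Calling load_normalized("noisy_speech/example.wav") (a made-up path) would then give the same value range for 16-bit and 32-bit float input, so the scale of the signal no longer depends on the file's encoding.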

Nerdy314159265 commented 4 years ago

Is there any way to use this on 48 kHz audio directly, without having to resample down to 16 kHz?