Audio-WestlakeU / FullSubNet

PyTorch implementation of "FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."
https://fullsubnet.readthedocs.io/en/latest/
MIT License
538 stars, 153 forks

Volume problem after noise reduction #31

Open zoujh320 opened 2 years ago

zoujh320 commented 2 years ago

@haoxiangsnr Sometimes the volume changes when speech is enhanced with the pre-trained model. For example: noisy: image, enhanced: image

haoxiangsnr commented 2 years ago

Hi,

In the DNS Challenge test dataset, many utterances have low volume. If we listen directly to the output of the neural network, the volume varies from utterance to utterance and is uncontrolled, i.e., it can be higher or lower than that of the original input. During listening tests, we would have to keep adjusting the playback volume to compensate.

For this reason, FullSubNet normalizes the volume of the enhanced speech (see https://github.com/haoxiangsnr/FullSubNet/blob/main/audio_zen/inferencer/base_inferencer.py#L177). As a result, the enhanced utterances have comparable volume.
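The linked line is not reproduced in this thread. As a rough illustration of what peak-based volume normalization can look like, here is a minimal sketch. The function name `peak_normalize_to_int16`, the 0.8 headroom factor, and the int16 output format are assumptions made for this example, not necessarily what `base_inferencer.py` does.

```python
import numpy as np


def peak_normalize_to_int16(enhanced: np.ndarray, peak_ratio: float = 0.8) -> np.ndarray:
    """Scale a float waveform so that its largest absolute sample sits at
    `peak_ratio` of the int16 full scale, then convert it to int16.

    Hypothetical helper for illustration only.
    """
    peak = np.max(np.abs(enhanced))
    if peak == 0:
        # A silent signal cannot be rescaled; return silence unchanged.
        return np.zeros(enhanced.shape, dtype=np.int16)
    full_scale = np.iinfo(np.int16).max  # 32767
    return (peak_ratio * full_scale * enhanced / peak).astype(np.int16)


# Two copies of the same tone at very different levels.
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
tone = np.sin(2 * np.pi * 440.0 * t)
quiet = 0.01 * tone
loud = 0.5 * tone

quiet_out = peak_normalize_to_int16(quiet)
loud_out = peak_normalize_to_int16(loud)

# After normalization both signals share the same peak level.
print(np.max(np.abs(quiet_out)), np.max(np.abs(loud_out)))
```

Because the scale factor depends only on each utterance's own peak, the output level is independent of the input level. That gives comparable volume across files, and it also means the enhanced file will generally not match the volume of its noisy input.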

zoujh320 commented 2 years ago

Hi, this just sidesteps the volume issue. How can I use the FullSubNet model to reduce noise without changing the volume?