YuanGongND / ast

Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
BSD 3-Clause "New" or "Revised" License

In real-world applications, recorded audio contains a lot of environmental noise. Are there any good ways to denoise it? #7

Closed. Hotlat6077 closed this issue 3 years ago.

YuanGongND commented 3 years ago

Hi,

This question is not directly related to the AST model, and my knowledge of it is limited. For audio event classification, some sounds that count as noise in speech recognition (ASR), such as vehicle sounds, are actually the recognition targets. In addition, AudioSet itself consists of user-uploaded audio that already contains all kinds of environmental noise. Therefore, I expect noise to have a comparatively small impact on the audio event classification task. If you mean channel noise, you may want to look at papers on channel noise removal.

For the Speech Commands task, we augmented the training set with random noise (https://github.com/YuanGongND/ast/blob/87b81895ee866e5be451a5c139751735f836df76/src/dataloader.py#L194). This improves accuracy slightly and might also make the model more robust in noisy environments. It might be worth adding more realistic noise instead, for example from the MS-SNSD noise dataset.
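The reply mentions two kinds of noise augmentation. Below is a minimal NumPy sketch of each. It is not the code in the linked dataloader.py. `add_random_noise` adds uniform random noise with a randomly drawn amplitude to a (time, frequency) filterbank feature. `mix_at_snr` mixes a recorded noise clip, for example one from MS-SNSD, into a clean waveform at a chosen signal-to-noise ratio. Both function names, the 0.1 maximum noise scale, and the synthetic test signals are illustrative assumptions.

```python
import numpy as np


def add_random_noise(fbank: np.ndarray, rng: np.random.Generator,
                     max_scale: float = 0.1) -> np.ndarray:
    """Add uniform random noise to a (time, freq) filterbank feature.

    Hypothetical helper: the noise amplitude is itself drawn uniformly from
    [0, max_scale), so each training example gets a different noise level.
    """
    scale = rng.random() * max_scale
    return fbank + rng.random(fbank.shape) * scale


def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float,
               rng: np.random.Generator) -> np.ndarray:
    """Mix a noise waveform into a clean waveform at a target SNR (dB).

    Hypothetical helper: both inputs are 1-D float arrays at the same sample
    rate. The noise is tiled if it is too short, then randomly cropped to
    the length of the clean signal.
    """
    if len(noise) < len(clean):
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)
    start = rng.integers(0, len(noise) - len(clean) + 1)
    noise = noise[start:start + len(clean)]

    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # guard against silent noise
    # Choose scale so that 10 * log10(clean_power / (scale**2 * noise_power)) == snr_db.
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise


# Usage with synthetic signals (a 1 s, 440 Hz tone at 16 kHz plus Gaussian noise).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000
clean = np.sin(2 * np.pi * 440 * t)
noise = rng.standard_normal(8000)
noisy = mix_at_snr(clean, noise, snr_db=10.0, rng=rng)

fbank = np.zeros((100, 128))  # stand-in for a (frames, mel bins) feature
noisy_fbank = add_random_noise(fbank, rng)
```

In practice one would likely draw `snr_db` at random for each training example so the model sees a range of noise levels; the range to use is a tuning choice and is not specified in this thread.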

-Yuan