This dataset contains a large collection of clean speech files and variety of environmental noise files in .wav format sampled at 16 kHz.
The main application of this dataset is to train Deep Neural Network (DNN) models to suppress background noise. But it can be used for other audio and speech applications.
We provide the recipe to mix clean speech and noise at various signal to noise ratio (SNR) conditions to generate large noisy speech dataset.
The SNR conditions and the number of hours of data required can be configured depending on the application requirements.
This dataset will continue to grow in size as we encourage researchers and practitioners to contribute to this dataset by adding more clean speech and noise clips.
This dataset will immensely help researchers and practitioners in accademia and industry to develop better models.
We also provide test set that is different from training set to evaluate the developed models.
We provide html code for building two Human Intelligence Task (HIT) crowdsourcing applications to allow users to rate the noisy audio clips. We implemented an absolute category rating (ACR) application according to ITU-T P.800. In addition we implemented a subjective testing method according to ITU-T P.835 which allows to rate the speech signal, background noise, and the overall quality.
Microsoft Scalable Noisy Speech Dataset MS-SNSD
About: