Implements Demucs-based audio denoising in the Demucs class and chains that class with the transcriber. When creating a transcriber object, the user can choose whether to apply denoising and can pass additional parameters to customize it. Demucs separates the original audio into vocals and other sounds; currently only the separated vocals are exposed to the user, and the other wav files are discarded.
For now, demucs must be installed manually before this can be used: pip install git+https://github.com/facebookresearch/demucs#egg=demucs
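For reviewers who want to see what the separation step amounts to, here is a minimal sketch of two-stem separation through the demucs package's own entry point. It is not the code in this PR: the helper names (build_demucs_args, expected_vocals_path, denoise), the keep_other flag, and the default model name "htdemucs" are assumptions for illustration, and the output layout (out_dir/model/track/vocals.wav) is the demucs default as I understand it.

```python
from pathlib import Path


def build_demucs_args(audio_path, out_dir="separated", model="htdemucs"):
    """Build the argument list for a two-stem (vocals / no_vocals) run.

    Hypothetical helper; the flags are the ones the demucs CLI accepts.
    """
    return ["--two-stems", "vocals", "-n", model, "-o", str(out_dir), str(audio_path)]


def expected_vocals_path(audio_path, out_dir="separated", model="htdemucs"):
    """Where demucs is assumed to write the vocals stem by default."""
    return Path(out_dir) / model / Path(audio_path).stem / "vocals.wav"


def denoise(audio_path, out_dir="separated", model="htdemucs", keep_other=False):
    """Run demucs and return the path to the separated vocals.

    Assumption: mirrors the behaviour described above, where only the
    vocals are kept and the other stem is discarded.
    """
    import demucs.separate  # imported lazily so the helpers above work without demucs

    demucs.separate.main(build_demucs_args(audio_path, out_dir, model))
    vocals = expected_vocals_path(audio_path, out_dir, model)
    if not keep_other:
        # In two-stem mode the remaining audio is written as no_vocals.wav.
        vocals.with_name("no_vocals.wav").unlink(missing_ok=True)
    return vocals
```

Keeping the argument and path construction in separate pure functions means they can be checked without downloading a model; only denoise() needs demucs installed.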
Transcription performance was evaluated on both the noisy audio and the denoised audio; transcription is slightly better after denoising.
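The metric behind this comparison is not stated in this description. As an assumption, a common choice for transcription quality is word error rate (WER); the sketch below shows one way to compute it for a reference transcript and a model transcript, so that noisy and denoised runs can be compared on the same footing. The function name and the lowercase/whitespace normalization are illustrative choices, not taken from this PR.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words.

    Assumption: text is normalized only by lowercasing and whitespace splitting.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Dynamic-programming edit distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

A lower value is better; running the same function over the transcripts of the noisy and the denoised files gives two directly comparable numbers.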
About the dataset:
I used a subset of the VOiCES dataset, called VOiCES_devkit: https://iqtlabs.github.io/voices/
Evaluation of Seamless transcription performance on the noisy audio:
Evaluation of Seamless transcription performance on the dataset after denoising:
Use this example for manual testing: