Closed asusdisciple closed 5 months ago
The only problem with ASR is that you don't have access to the training pipeline and training data. Most likely, if you add noise, process it with DeepFilterNet to create a training dataset, and then train your ASR on that, your final model combined with DeepFilterNet should work well. ASR can cope with artefacts as long as they are present in training...
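A minimal sketch of that idea: noise-augment each clean utterance, pass it through the denoiser, and train on the result so the denoiser's artefacts are in-distribution. The `denoise` callable here is a stand-in you would replace with DeepFilterNet's enhancement step (e.g. the `enhance` function from its Python API); the helper names and the 10 dB default SNR are illustrative assumptions, not anything from this thread.

```python
import numpy as np

def add_noise(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix background noise into clean speech at a target SNR (in dB)."""
    # Tile/trim the noise clip to match the utterance length.
    reps = int(np.ceil(clean.size / noise.size))
    noise = np.tile(noise, reps)[: clean.size]
    # Scale noise so that clean_power / noise_power == 10^(snr_db / 10).
    scale = np.sqrt(np.mean(clean ** 2) / (np.mean(noise ** 2) * 10 ** (snr_db / 10)))
    return clean + scale * noise

def build_training_set(clean_utts, noise, denoise, snr_db=10.0):
    """Noise-augment each utterance, then run the denoiser over it, so the
    ASR model sees the denoiser's artefacts during training."""
    return [denoise(add_noise(u, noise, snr_db)) for u in clean_utts]
```

The key design point is that the same denoiser used at inference time sits inside the training data pipeline, so its spectral artefacts stop being out-of-distribution for the ASR model.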
So this is not per se an issue with the model, but rather something to be aware of.
I tested the latest transcription models (Seamless, Whisper v1, v2, v3, etc.) on the Fleurs dataset and wondered if I could improve performance by denoising the audio files.
I therefore augmented everything with noise and ran tests on both the noisy and the clean Fleurs data. The result was the same in both cases: denoising with DeepFilterNet degraded transcription quality by at least about 20% WER.
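For reference, the WER figures above are the standard word error rate: word-level edit distance between the model transcript and the reference, divided by the reference length. A minimal sketch (not the exact tooling used for these tests):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # One-row dynamic-programming edit distance over words.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev = d[0]
        d[0] = i
        for j, h in enumerate(hyp, 1):
            cur = d[j]
            # insertion, deletion, or substitution/match
            d[j] = min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
            prev = cur
    return d[-1] / len(ref)
```

So a "20% WER" degradation means roughly one extra word in five is substituted, dropped, or inserted relative to the reference transcript.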
This is a shame, because noisy data generally transcribes worse than clean data, so denoising it could be beneficial. But it seems that when the model cuts away the noise, it also removes some useful frequencies from the clean signal, which results in poor performance.