For example, this one: https://github.com/microsoft/DNS-Challenge/tree/master/DNSMOS
TorchAudio already includes this kind of automatic, reference-less quality prediction (MOS, plus estimates of PESQ, STOI, and SI-SDR), presented at ICASSP 2023 [1]. SNR estimation is not implemented there yet and would be a useful tool.
[1] A. Kumar et al., “TorchAudio-Squim: Reference-less Speech Quality and Intelligibility measures in TorchAudio,” pp. 2–6, 2023, [Online]. Available: http://arxiv.org/abs/2304.01448.
@felixbur I have working code to estimate SNR here: https://github.com/bagustris/audio-SNR. The estimation error is about 2 dB (a true SNR of 20 dB is estimated as 22 dB). If you agree, show me how to incorporate this code into Nkululeko.
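The linked repository's method is not reproduced here. As a hedged illustration of how an error figure like "20 dB estimated as 22 dB" can be measured, the sketch below mixes a clean signal with noise at a known SNR and compares that with a blind estimate. The helpers `mix_at_snr` and `estimate_snr_energy` are hypothetical, and the estimator is a simple energy-percentile heuristic (the quietest frames are taken as the noise floor) assumed only for illustration; it is not necessarily what audio-SNR does, and it only works when the recording contains pauses.

```python
import numpy as np


def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so that clean + scaled noise has exactly `snr_db` (whole-signal power)."""
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # SNR_dB = 10*log10(p_clean / (gain**2 * p_noise))  ->  solve for gain
    gain = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + gain * noise


def estimate_snr_energy(signal: np.ndarray, frame_len: int = 400,
                        noise_quantile: float = 0.1) -> float:
    """Hypothetical blind SNR estimate (dB) from short-time frame energies.

    Assumes the quietest frames contain noise only, so a low quantile of
    the frame energies approximates the noise power.
    """
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    energies = np.mean(frames ** 2, axis=1)  # mean power per frame
    noise_power = max(float(np.quantile(energies, noise_quantile)), 1e-12)
    total_power = float(np.mean(energies))  # speech + noise, assumed uncorrelated
    speech_power = max(total_power - noise_power, 1e-12)
    return float(10 * np.log10(speech_power / noise_power))


if __name__ == "__main__":
    sr = 16000
    t = np.arange(2 * sr) / sr
    # Toy "speech": a 220 Hz tone during the first second, silence afterwards.
    clean = np.sin(2 * np.pi * 220 * t) * (t < 1.0)
    noise = np.random.default_rng(0).standard_normal(len(clean))
    noisy = mix_at_snr(clean, noise, snr_db=20.0)
    est = estimate_snr_energy(noisy)
    print(f"true 20.0 dB, estimated {est:.1f} dB, error {est - 20.0:+.1f} dB")
```

On this synthetic tone-plus-silence input the estimate lands close to 20 dB; real speech with non-stationary noise and few pauses will typically produce larger errors.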
Super. I haven't yet thought of how to use it. It is part of a larger idea: enabling Nkululeko to add speaker/speech characteristics, predicted by publicly available models, to the databases in order to check for biases between these characteristics and the target feature. An example: you have a database that is labeled with depression and speaker ID, but nothing else. You would like to use public models to automatically label your data with age, sex, and SNR, and then use the explore module to check for biases between the target (depression) and these three added features. I guess it would be easiest to implement this as a new, dedicated module, like segmentation or augmentation. We could name it, e.g., "autopredict".
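Nkululeko's explore module is not reproduced here. As a plain-pandas sketch of the kind of bias check described, the hypothetical `bias_report` helper below compares automatically added features across target classes: mean values for numeric features such as age or SNR, and per-class proportions for categorical ones such as sex. The helper name and column names are assumptions.

```python
import pandas as pd


def bias_report(df: pd.DataFrame, target: str,
                numeric: tuple = (), categorical: tuple = ()) -> dict:
    """Hypothetical helper: summarise predicted features per target class."""
    report = {}
    for col in numeric:
        # Mean of a numeric feature (e.g. predicted age or SNR) per target class.
        report[col] = df.groupby(target)[col].mean()
    for col in categorical:
        # Row-normalised crosstab: share of each category within each target class.
        report[col] = pd.crosstab(df[target], df[col], normalize="index")
    return report


if __name__ == "__main__":
    # Toy rows for illustration only, not real data.
    df = pd.DataFrame({
        "depression": ["yes", "yes", "no", "no"],
        "age": [45, 51, 30, 28],
        "sex": ["female", "female", "male", "female"],
        "snr": [10.0, 12.0, 20.0, 22.0],
    })
    report = bias_report(df, "depression",
                         numeric=("age", "snr"), categorical=("sex",))
    for name, table in report.items():
        print(name, table, sep="\n")
```

A large gap between classes, as in the toy SNR column, would suggest that a classifier trained on this database could pick up recording quality rather than the target itself.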
I think it's best to start with a template/example for an "add_speech_feat" class that takes dataframes and adds the predicted labels.
I had planned to do that for age/gender, but perhaps you could start with SNR,
to be used for data-quality and bias checking.
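A minimal, hypothetical template for the proposed "add_speech_feat" class, assuming the database is a pandas dataframe indexed by audio file path and that each public model is wrapped as a callable returning one label per file. The class name `AddSpeechFeat`, the `predictors` mapping, and the `add` method are illustrative and not part of Nkululeko.

```python
from typing import Callable, Dict, Hashable

import pandas as pd


class AddSpeechFeat:
    """Hypothetical template: add model-predicted speaker/speech labels to a dataframe."""

    def __init__(self, predictors: Dict[str, Callable[[Hashable], object]]):
        # Maps each new column name (e.g. "snr", "age", "sex") to a function
        # that predicts the value for one index entry (e.g. an audio file path).
        self.predictors = predictors

    def add(self, df: pd.DataFrame) -> pd.DataFrame:
        """Return a copy of `df` with one new column per predictor."""
        out = df.copy()  # do not modify the caller's dataframe
        for name, predict in self.predictors.items():
            if name in out.columns:
                raise ValueError(f"column {name!r} already exists")
            out[name] = [predict(idx) for idx in out.index]
        return out


if __name__ == "__main__":
    db = pd.DataFrame({"depression": [1, 0]}, index=["a.wav", "b.wav"])
    # Stub predictor standing in for a real SNR model.
    fake_snr = {"a.wav": 18.5, "b.wav": 7.0}
    adder = AddSpeechFeat({"snr": fake_snr.get})
    print(adder.add(db))
```

In practice, the SNR predictor would read each file (for example with a library such as soundfile) and pass the samples to an SNR estimator; age and sex predictors would plug into the same mapping, which keeps the class independent of any particular model.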