felixbur / nkululeko

Machine learning speaker characteristics
MIT License

add mos / snr prediction model #42

Closed felixbur closed 10 months ago

felixbur commented 1 year ago

to be used for data quality and bias checking

felixbur commented 1 year ago

for example, this one: https://github.com/microsoft/DNS-Challenge/tree/master/DNSMOS

bagustris commented 12 months ago

Torchaudio already has an automatic MOS prediction tool (it also estimates PESQ, STOI, and SI-SDR), presented at ICASSP 2023 [1]. SNR estimation is not implemented there yet and would be a useful addition.

[1] A. Kumar et al., “TorchAudio-Squim: Reference-less Speech Quality and Intelligibility measures in TorchAudio,” pp. 2–6, 2023, [Online]. Available: http://arxiv.org/abs/2304.01448.
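
For reference, a minimal sketch of how the TorchAudio-Squim models from [1] can be called. The function name `squim_scores` is hypothetical, and the sketch assumes a torchaudio release that ships the `SQUIM_OBJECTIVE` and `SQUIM_SUBJECTIVE` pipelines; calling it downloads the pretrained weights. The MOS model additionally needs a clean, non-matching reference recording.

```python
def squim_scores(waveform, sample_rate, reference=None):
    """Estimate STOI, PESQ, SI-SDR (and optionally MOS) without a matching clean signal.

    waveform:  float tensor of shape (1, time)
    reference: optional clean speech tensor of shape (1, time), already at the
               model sample rate; it does not have to match `waveform`.
    """
    # Imported lazily so that defining this function does not require torch.
    import torch
    import torchaudio.functional as F
    from torchaudio.pipelines import SQUIM_OBJECTIVE, SQUIM_SUBJECTIVE

    target_sr = SQUIM_OBJECTIVE.sample_rate
    if sample_rate != target_sr:
        waveform = F.resample(waveform, sample_rate, target_sr)

    with torch.no_grad():
        # The objective model returns the three estimates in this order.
        stoi, pesq, si_sdr = SQUIM_OBJECTIVE.get_model()(waveform)
        scores = {
            "stoi": stoi.item(),
            "pesq": pesq.item(),
            "si_sdr": si_sdr.item(),
        }
        if reference is not None:
            mos = SQUIM_SUBJECTIVE.get_model()(waveform, reference)
            scores["mos"] = mos.item()
    return scores
```

Note that SI-SDR is related to, but not the same as, a plain SNR estimate.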

bagustris commented 11 months ago

@felixbur

I have working code to estimate SNR here: https://github.com/bagustris/audio-SNR. The error is about 2 dB (a 20 dB SNR is estimated as 22 dB). If you agree, show me how to incorporate that code into Nkululeko.
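
To illustrate the general idea of reference-free SNR estimation, here is a minimal energy-based sketch. It is not the method used in the linked repository; the function name `estimate_snr_db`, the frame length and the `noise_fraction` parameter are all assumptions. It treats the quietest frames as noise only and the loudest frames as speech plus noise.

```python
import numpy as np


def estimate_snr_db(signal, frame_len=400, noise_fraction=0.2):
    """Rough SNR estimate (dB) from the spread of frame energies."""
    signal = np.asarray(signal, dtype=np.float64)
    n_frames = len(signal) // frame_len
    if n_frames < 2:
        raise ValueError("signal too short for a frame-based estimate")

    # Mean power per non-overlapping frame, sorted from quietest to loudest.
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    power = np.sort((frames ** 2).mean(axis=1))

    k = max(1, int(n_frames * noise_fraction))
    noise_power = power[:k].mean()    # quietest frames: assumed noise only
    active_power = power[-k:].mean()  # loudest frames: assumed speech + noise

    # Subtract the noise floor to approximate the speech-only power.
    speech_power = max(active_power - noise_power, 1e-12)
    return 10.0 * np.log10(speech_power / max(noise_power, 1e-12))


# Synthetic check: one second of noise, then one second of a 440 Hz tone
# plus noise. Tone power 0.5 and noise power 0.005 give a true SNR of 20 dB.
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
noise = rng.normal(0.0, np.sqrt(0.005), 2 * sr)
tone = np.concatenate([np.zeros(sr), np.sin(2 * np.pi * 440 * t)])
demo_snr = estimate_snr_db(tone + noise)
```

Such an estimator only works when the recording contains pauses, and it tends to be biased when the noise is not stationary.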

felixbur commented 11 months ago

super. I haven't thought of a way to use that yet. It is part of a larger idea to enable Nkululeko to add speaker/speech characteristics from publicly available models to the databases, in order to check for bias with respect to the target feature. An example: you have a database that is labeled with depression and speaker id, but nothing else. You would like to use public models for age, sex and SNR to automatically label your data, and then use the explore module to check for biases between the target (depression) and the three added features (age, sex and SNR). I guess it would be easiest to implement this with a new, dedicated module, like segmentation or augmentation. We could name it, e.g., "autopredict".
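
A minimal sketch of what such a bias check could look like once the predicted columns exist. This is not Nkululeko's explore module; the function `bias_report`, the column names and the toy values are made up for illustration.

```python
import pandas as pd


def bias_report(df, target, categorical, continuous):
    """Return (counts, means): how two added features distribute over the target."""
    # Contingency table: target classes vs. a predicted categorical feature.
    counts = pd.crosstab(df[target], df[categorical])
    # Mean of a predicted continuous feature within each target class.
    means = df.groupby(target)[continuous].mean()
    return counts, means


# Toy data (invented): the target is perfectly confounded with sex,
# and the depressed samples also have a lower SNR.
toy = pd.DataFrame(
    {
        "depression": ["depressed", "depressed", "healthy", "healthy"],
        "sex": ["female", "female", "male", "male"],
        "snr": [10.0, 12.0, 20.0, 22.0],
    }
)
counts, means = bias_report(toy, "depression", "sex", "snr")
```

A table with empty cells, or class means that differ strongly, would hint that a model could learn sex or recording quality instead of the target.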

felixbur commented 11 months ago

I think it is best to start with a template/example for an "add_speech_feat" class that takes dataframes and adds the predicted labels.

I planned to do that for age/gender, but perhaps you could start with SNR.
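
A minimal sketch of such a template, under stated assumptions: the class name `AddSpeechFeat`, its constructor arguments and the convention that the dataframe index holds file paths are hypothetical, not existing Nkululeko API.

```python
import pandas as pd


class AddSpeechFeat:
    """Hypothetical template: add one predicted column to a dataframe."""

    def __init__(self, name, predict_fn):
        self.name = name              # name of the new column, e.g. "snr"
        self.predict_fn = predict_fn  # callable: file path -> predicted value

    def predict(self, df):
        # Work on a copy so the caller's dataframe stays unchanged.
        result = df.copy()
        # Assumption: the index of the dataframe holds the audio file paths.
        result[self.name] = [self.predict_fn(path) for path in result.index]
        return result


# Usage with a stand-in predictor (a lookup table instead of a real model).
fake_snr = {"a.wav": 20.0, "b.wav": 5.0}
df = pd.DataFrame({"depression": [0, 1]}, index=["a.wav", "b.wav"])
labeled = AddSpeechFeat("snr", fake_snr.get).predict(df)
```

A predictor for age or gender would plug in the same way, by passing a different `name` and `predict_fn`.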