felixbur / nkululeko

Machine learning speaker characteristics
MIT License

add snr estimation #51

Closed bagustris closed 11 months ago

felixbur commented 11 months ago

tried that for selected emodb samples:

[image: emotion-est_snr_samples]

imho it's a bit too simplistic: because it compares high and low energy bands, speech with high arousal (e.g. 'anger') gets predicted a higher SNR than speech with low arousal (e.g. 'sad')

bagustris commented 11 months ago

Yes, it is a very simple approach. I compared two methods in my experiment: the one above, which uses a percentile of the energy to separate low and high energy, and another that uses the mean of the energy. Using the percentile leads to a smaller error, but it still needs improvement. There is another approach using a deep learning model from speechbrain, but its performance is similar and it has additional limitations (it cannot detect SNR > 10 dB).
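
For readers following the thread, here is a minimal sketch of the two threshold choices being compared. It is not the code from this repository: the function names, the frame length and hop, and the 25th/75th percentile cut-offs are all assumptions made for illustration. Frames with low energy are treated as noise, frames with high energy as speech, and the SNR is the ratio of their mean energies in dB.

```python
import numpy as np


def frame_energies(signal, frame_len=400, hop=200):
    """Mean squared amplitude per frame (frame_len and hop are assumed values)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.array([
        np.mean(signal[i * hop:i * hop + frame_len] ** 2)
        for i in range(n_frames)
    ])


def estimate_snr(signal, method="percentile", low_pct=25, high_pct=75, eps=1e-12):
    """Rough energy-based SNR estimate in dB.

    method="percentile": frames at or below the low percentile count as noise,
                         frames at or above the high percentile count as speech
                         (the percentile values are assumptions).
    method="mean":       the mean frame energy is the single split point.
    """
    energy = frame_energies(np.asarray(signal, dtype=float))
    if method == "percentile":
        low_thr, high_thr = np.percentile(energy, [low_pct, high_pct])
    elif method == "mean":
        low_thr = high_thr = energy.mean()
    else:
        raise ValueError(f"unknown method: {method}")
    noise = energy[energy <= low_thr]
    speech = energy[energy >= high_thr]
    return 10 * np.log10((speech.mean() + eps) / (noise.mean() + eps))


# Synthetic check: low-level noise with a tone burst in the middle.
rng = np.random.default_rng(0)
sr = 16000
demo = rng.normal(0.0, 0.01, sr)
t = np.arange(sr // 2) / sr
demo[sr // 4:sr // 4 + sr // 2] += 0.5 * np.sin(2 * np.pi * 200 * t)

snr_percentile = estimate_snr(demo, method="percentile")
snr_mean = estimate_snr(demo, method="mean")
print(f"percentile: {snr_percentile:.1f} dB, mean: {snr_mean:.1f} dB")
```

Both variants only look at how energy is distributed within a single recording, so loud, high-arousal speech widens the gap between the two groups and raises the estimate, which is the bias described earlier in the thread.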