bagustris closed this issue 11 months ago.
Yes, it is a very simple approach. In my experiment I compared two methods for separating low- and high-energy parts of the signal: the one above, which uses a percentile of the energy, and another that uses the mean energy. The percentile method gives a smaller error, but it still needs improvement. Another option is a deep learning model from SpeechBrain, but its performance is similar and it has an additional limitation: it cannot detect an SNR above 10 dB.
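A minimal sketch of the two energy-based heuristics described above, to make the comparison concrete. Everything here is an assumption for illustration, not the project's actual code: the function names are hypothetical, energy is computed as mean-square amplitude per frame, frames are 400 samples with a 160-sample hop (25 ms / 10 ms at 16 kHz), and the percentile variant uses the 10th and 90th percentiles. Both versions take the ratio of "loud" to "quiet" frame energy as a rough SNR and do not subtract the noise floor from the loud frames.

```python
import math


def frame_energies(signal, frame_len=400, hop=160):
    """Mean-square energy of each frame (hypothetical framing parameters)."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def percentile(values, q):
    """Percentile with linear interpolation between sorted values, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def snr_percentile(signal, low_q=10, high_q=90, eps=1e-12):
    """Percentile method: low-percentile frame energy is taken as noise,
    high-percentile frame energy as speech (assumed thresholds)."""
    e = frame_energies(signal)
    noise = percentile(e, low_q)
    speech = percentile(e, high_q)
    return 10 * math.log10((speech + eps) / (noise + eps))


def snr_mean_split(signal, eps=1e-12):
    """Mean method: frames above the mean energy count as speech,
    the rest as noise; each group is averaged."""
    e = frame_energies(signal)
    threshold = sum(e) / len(e)
    high = [x for x in e if x > threshold]
    low = [x for x in e if x <= threshold]
    if not high or not low:
        return 0.0  # flat energy contour: no separation possible
    speech = sum(high) / len(high)
    noise = sum(low) / len(low)
    return 10 * math.log10((speech + eps) / (noise + eps))
```

Because both estimates are driven by how much louder the loud frames are than the quiet ones, anything that raises speech energy relative to the pauses also raises the estimated SNR, regardless of the actual recording noise.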
I tried it on selected EmoDB samples:
IMHO it's a bit too simplistic: because it compares high- and low-energy bands, speech with high arousal (e.g. 'anger') is predicted to have a higher SNR than speech with low arousal (e.g. 'sad').