gao-hui closed 5 years ago
The accuracy on the validation set is 60.87%. You could add more hidden layers and use cross-validation to improve it further.
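For anyone who wants to try the cross-validation suggestion, here is a minimal sketch of a k-fold split in plain Python. The names `k_fold_indices`, `cross_validate` and `train_and_score` are hypothetical helpers made up for illustration, not part of this repository; `train_and_score` stands in for whatever function trains the model on the training indices and returns its accuracy on the held-out indices.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the folds are reproducible.
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


def cross_validate(n_samples, train_and_score, k=5, seed=0):
    """Average the score returned by train_and_score over all k folds.

    train_and_score(train_idx, val_idx) is a hypothetical callback that
    trains on train_idx and returns the accuracy measured on val_idx.
    """
    scores = [train_and_score(train_idx, val_idx)
              for train_idx, val_idx in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)
```

Averaging over folds should give a steadier estimate than a single validation split, which matters on a small corpus where one split can be lucky or unlucky.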
By the way, there is a tool named openSMILE that can extract audio features conveniently. Its config/emo_IS09.conf provides a way to get the fundamental frequency (F0), voicing probability, frame energy, zero-crossing rate, and 12 Mel-frequency cepstral coefficients (MFCC), which are the features used in the paper you refer to. And thanks again!
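As a rough sketch of how that extraction could be scripted, the snippet below wraps the openSMILE command-line tool `SMILExtract` with its `-C` (config), `-I` (input) and `-O` (output) options. The helper names `build_smile_command` and `extract_features` are hypothetical, and the binary name, config path and `.arff` output extension are assumptions that depend on how and which version of openSMILE is installed.

```python
import subprocess
from pathlib import Path


def build_smile_command(wav_path, out_path,
                        config="config/emo_IS09.conf",
                        binary="SMILExtract"):
    """Build the SMILExtract argument list without running it.

    The default config path and binary name are assumptions; adjust
    them to match the local openSMILE installation.
    """
    return [binary, "-C", str(config), "-I", str(wav_path), "-O", str(out_path)]


def extract_features(wav_path, out_path, **kwargs):
    """Run SMILExtract on one WAV file and return the output path."""
    cmd = build_smile_command(wav_path, out_path, **kwargs)
    # check=True raises CalledProcessError if SMILExtract exits non-zero.
    subprocess.run(cmd, check=True)
    return Path(out_path)
```

Calling `extract_features("speech.wav", "speech.arff")` would then write the features for one utterance, assuming `SMILExtract` is on the `PATH` and the script runs from the openSMILE directory.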
Could you please share the accuracy of your model on the Berlin Database of Emotional Speech? Thanks a lot~