wrong extraction of features

MiteshPuthran / Speech-Emotion-Analyzer

The neural network model is capable of detecting five different male/female emotions from speech audio. (Deep Learning, NLP, Python)

MIT License

1.31k stars, 439 forks

wrong extraction of features #60

Open: chandrahaas02 opened this issue 2 years ago

chandrahaas02 commented 2 years ago

This doesn't make sense: the feature extraction averages across the 13 MFCC coefficients, which is wrong. The mean should be taken over all the frames instead, so there should be a `.T` inside the `np.mean` call during feature extraction. Everything downstream changes from there, including the model and its accuracy, because the feature function is fundamentally wrong. I hope you fix it: this repo is the most starred one, so it spreads misinformation to many people.
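Here is a minimal sketch of the two pooling choices. It assumes the MFCC matrix has the layout `librosa.feature.mfcc` returns, `(n_mfcc, n_frames)`. A random array stands in for real audio, and the frame count of 100 is arbitrary and only for illustration.

```python
import numpy as np

# Stand-in for librosa.feature.mfcc output, shape (n_mfcc, n_frames).
# Values are random; only the shape matters here. Sizes are hypothetical.
rng = np.random.default_rng(0)
n_mfcc, n_frames = 13, 100
mfcc = rng.standard_normal((n_mfcc, n_frames))

# Mean over axis=0 collapses the 13 coefficients and keeps one value per frame.
# This is the pooling the issue objects to: the vector length depends on clip length.
per_frame = np.mean(mfcc, axis=0)

# Transposing first (or equivalently using axis=1) averages over time instead,
# keeping one mean per coefficient: a fixed 13-dim vector for any clip length.
per_coeff = np.mean(mfcc.T, axis=0)

print(per_frame.shape)  # (100,)
print(per_coeff.shape)  # (13,)
```

With the `.T`, each feature dimension keeps a consistent meaning ("average of coefficient k over the clip"). Without it, dimension i means "average of all coefficients in frame i", which ties the features to where each frame falls in time.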

jjostschulte commented 1 year ago

I was also wondering about the feature extraction. The README says that 3 seconds of audio are used, which matches the provided screenshot. But that screenshot uses 25 MFCCs, whereas the notebook uses 13 MFCCs with an audio duration of 2.5 seconds. Does anyone know which features the saved model was trained on?
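One way to narrow this down could be to compare the saved model's input size with the feature length each configuration would produce. The sketch below assumes librosa's defaults: `center=True`, `hop_length=512`, and a 22050 Hz sample rate. The notebook may load audio at a different rate, which would change the frame counts, so treat the numbers as illustrative. The helper names are hypothetical.

```python
def n_frames(duration_s: float, sr: int, hop_length: int = 512) -> int:
    """Frame count for librosa-style framing with center=True (assumed default)."""
    n_samples = int(duration_s * sr)
    return 1 + n_samples // hop_length


def feature_length(duration_s: float, sr: int, n_mfcc: int,
                   pool_over_frames: bool, hop_length: int = 512) -> int:
    # pool_over_frames=True: mean over time -> one value per coefficient.
    # pool_over_frames=False: mean over coefficients -> one value per frame.
    if pool_over_frames:
        return n_mfcc
    return n_frames(duration_s, sr, hop_length)


# Compare the two configurations mentioned above at an assumed sr of 22050 Hz.
for duration, n_mfcc in [(2.5, 13), (3.0, 25)]:
    print(duration, n_mfcc,
          feature_length(duration, 22050, n_mfcc, pool_over_frames=False),
          feature_length(duration, 22050, n_mfcc, pool_over_frames=True))
# 2.5 13 108 13
# 3.0 25 130 25
```

Under per-frame pooling, the input size depends only on duration and sample rate, not on `n_mfcc`. Under per-coefficient pooling, it depends only on `n_mfcc`. This is why the input shape alone may separate some configurations but not others.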