MiteshPuthran / Speech-Emotion-Analyzer

The neural network model is capable of detecting five different male/female emotions from audio speeches. (Deep Learning, NLP, Python)
MIT License

Training from scratch doesn't reach the same loss #22

Closed nicolov closed 5 years ago

nicolov commented 5 years ago

Hey, thanks a lot for the release. I've tried training the model from scratch using the datasets, but I can't reach the same validation loss. I noticed that the pre-trained network in the repo has two more convolutional layers compared to the code in the notebook, but adding them back doesn't help either.

Did you use any additional tricks for training?

For reference, above is what I see, below is what you have in the dataset:

[Image: screenshot comparing the two results]

gianlucahmd commented 5 years ago

Same here. I noticed that the network in the notebook is different from the model you've saved. Even using the same network, the performance is way worse than what you achieved with the same code. Maybe you had some clever training trick?

MiteshPuthran commented 5 years ago

Hello @nicolov and @gianlucahmd. I wish I had some magic tricks to increase the accuracy. The only thing that comes to mind right now is to augment the data you have available and retrain the model.
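As a rough illustration of the augmentation suggestion, here is a minimal sketch of two common waveform augmentations (additive noise and a random time shift). Nothing in this thread confirms which augmentations, if any, were used for the pretrained model; the function names, the noise level, and the shift range are all assumptions chosen for illustration, and the sine wave merely stands in for a loaded audio clip.

```python
import numpy as np


def add_noise(signal, noise_level=0.005, rng=None):
    """Return a copy of the signal with Gaussian noise scaled to its peak amplitude."""
    rng = np.random.default_rng() if rng is None else rng
    peak = np.max(np.abs(signal))
    noise = rng.normal(0.0, 1.0, size=signal.shape)
    return signal + noise_level * peak * noise


def time_shift(signal, max_fraction=0.1, rng=None):
    """Circularly shift the signal by a random offset of up to max_fraction of its length."""
    rng = np.random.default_rng() if rng is None else rng
    limit = int(len(signal) * max_fraction)
    offset = int(rng.integers(-limit, limit + 1))
    return np.roll(signal, offset)


def augment(signal, copies=2, seed=0):
    """Produce `copies` augmented variants of one clip (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return [time_shift(add_noise(signal, rng=rng), rng=rng) for _ in range(copies)]


# Toy input: one second of a 440 Hz tone standing in for a real recording.
sample_rate = 22050
t = np.arange(sample_rate) / sample_rate
clip = np.sin(2 * np.pi * 440 * t)

augmented = augment(clip, copies=3)
```

Each augmented clip keeps the original length, so it can go through the same feature extraction as the unmodified recordings and inherit the original label.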

gianlucahmd commented 5 years ago

Did you perform data augmentation yourself? Maybe that's the problem: I just downloaded the data from the links but end up having ~900 samples in the training set whereas you had ~1300.

Thanks for getting back to me!

nicolov commented 5 years ago

> I wish I had some magic tricks to increase the accuracy.

Sure, I was just wondering how you got to the accuracy of the pretrained model that's in the repo, as I can't seem to reproduce it using your training code.

srhoit59 commented 5 years ago

Can someone please tell me how you organized the data in the folder? Have you created any subfolders?

MiteshPuthran commented 5 years ago

@nicolov I used the same code as what you see in the notebook. Try using different sampling rates while extracting the features. Maybe more features would help to increase the accuracy.

@srhoit59 I put all the audio files in one folder.

gianlucahmd commented 5 years ago

Hey @MITESHPUTHRANNEU, I have the same problem as nicolov but it can't be a different number of features, otherwise your model wouldn't work at inference time. It must be something different during training and/or different data.

My first bet is different data: downloading from the links you provided gives me ~900 samples in the training set, whereas your notebook shows ~1300. Can you double-check that all the data you used is available from these links?
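One way to narrow down a sample-count mismatch like this is to count the downloaded clips per source. The sketch below assumes the clips are `.wav` files in one folder and that the part of the filename before the first `-` or `_` identifies a group (speaker or dataset); both are assumptions, and the demo files are made up so the snippet runs on its own.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path


def count_clips(folder, extensions=(".wav",)):
    """Count audio files under `folder`, grouped by filename prefix (hypothetical grouping)."""
    counts = Counter()
    for path in Path(folder).rglob("*"):
        if path.is_file() and path.suffix.lower() in extensions:
            # Prefix = everything before the first "-" or "_" in the file stem.
            prefix = re.split(r"[-_]", path.stem, maxsplit=1)[0]
            counts[prefix] += 1
    return counts


# Demo on a throwaway folder with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a_01.wav", "a_02.wav", "b_01.wav", "notes.txt"):
        (Path(tmp) / name).touch()
    demo_counts = count_clips(tmp)

total = sum(demo_counts.values())
print(dict(demo_counts), "total:", total)
```

Running `count_clips` on the real data folder and comparing the per-group totals against the counts visible in the notebook would show whether a whole group of recordings is missing or whether the difference is spread evenly.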

MiteshPuthran commented 5 years ago

Hi @gianlucahmd, yes, if you change the sampling rate then you can't use my model. I used the data from the described sources. They may have changed the audio files since I did this project in 2017. Unfortunately, I no longer have the data I used during training.
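To illustrate why changing the sampling rate breaks compatibility with a saved model: if one feature value is kept per analysis frame, the feature vector length depends on the number of audio samples in the clip. The sketch below assumes a fixed 2.5 s clip, centered framing, and a hop length of 512 samples (librosa's default for frame-based features); these values are assumptions for illustration and are not confirmed anywhere in this thread.

```python
def frame_count(duration_s, sample_rate, hop_length=512):
    """Number of centered analysis frames for a clip of the given duration."""
    n_samples = int(duration_s * sample_rate)
    return 1 + n_samples // hop_length


def matches_model_input(expected_length, duration_s, sample_rate, hop_length=512):
    """True if features extracted with these settings fit a model expecting `expected_length` values."""
    return frame_count(duration_s, sample_rate, hop_length) == expected_length


# Same clip duration, two sampling rates: the feature length changes.
for rate in (22050, 44100):
    print(rate, "Hz ->", frame_count(2.5, rate), "frames")
```

A network whose input layer was built for one of these lengths will reject features extracted at the other rate, so retraining with a different sampling rate also means rebuilding the model's input shape.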