Closed lzj520 closed 5 years ago
How well does the model recognize Chinese speech? Is the result I am getting normal or not?
This is not normal. The gap between your training and validation error is increasing as you train your model. This means your model is overfitting and will not be able to recognize or predict the emotions in audio files it has never encountered before.
I trained on 16000 Hz audio from a Chinese corpus with duration=2.5 and offset=0.5 and obtained results. If the length of the audio varies, how can I measure the model's accuracy reliably? With duration=2.5 and offset=0.5 I sometimes get: ValueError: Error when checking input: expected conv1d_1_input to have shape (32, 1) but got array with shape (24, 1)
You can try increasing the sampling rate to anything greater than 16000. That way you can extract more features from the audio files. Since you clipped your audio files to 2.5 sec, try increasing that to maybe 3 sec, or reducing it below 2.5 sec, as you mentioned some files are shorter. If an audio file is shorter than the clip length, the features for the missing portion will be 0.
The ValueError you are facing is a dimension mismatch in your input data. Check that the test and training data have the same feature dimensions.
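One common way to handle clips of different lengths is to zero-pad (or trim) every feature vector to a single fixed length before feeding it to the network, so a 24-frame clip and a 32-frame clip produce the same input shape. This is a minimal sketch, not the repo's actual code; `TARGET_LEN = 32` and the helper name `pad_or_trim` are hypothetical, chosen to match the shape in the error message above:

```python
import numpy as np

# Hypothetical fixed frame count; replace with the length your model expects
TARGET_LEN = 32

def pad_or_trim(features, target_len=TARGET_LEN):
    """Zero-pad short feature vectors (or trim long ones) to a fixed length
    so every clip yields the same input shape for the Conv1D layer."""
    features = np.asarray(features, dtype=np.float32)
    if features.shape[0] < target_len:
        # Append zeros for the missing frames (short audio file)
        features = np.pad(features, (0, target_len - features.shape[0]))
    # Trim anything beyond the target length (long audio file)
    return features[:target_len]

# A 24-frame clip (the shape from the error message) becomes 32 frames:
padded = pad_or_trim(np.random.rand(24))
print(padded.shape)  # (32,)
```

Applying this to every clip, regardless of its original duration, keeps the training, test, and live-prediction inputs at the same dimension.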
I tried duration=1 and sr=22050*2, recognizing the emotions anger and happiness. The recognition accuracy was poor; how can I improve it?
duration = 1 is not a good option. If you provide only 1 sec of audio, it is difficult to extract valuable features. I would suggest at least 2-3 seconds, so the model has enough features to distinguish between different emotions.
How should I train when some files in the corpus are shorter than 2 seconds? They cause errors.
Are you saying that your corpus consists of audio files less than 2 seconds long?
Yes. I use a Chinese corpus that contains audio files shorter than 2 seconds, and the training result is not ideal.
With duration=2 I get: ValueError: Error when checking input: expected conv1d_1_input to have shape (173, 1) but got array with shape (155, 1). Is there any way to fix this?
I tried modifying the parameters and got both emotions right 73 percent of the time, but I always get the following results when testing on audio.
Without modifying your code, the training results are as follows.
Yeah, the shape of the input differs from what you are passing to the conv_1d layer. Refer to cells 30 and 45: there you can see that I checked the shape of my training set and then supplied that same shape as the input to the conv_1d layer in the 45th cell.
When you test it on live predictions, the audio file you pass as input doesn't yield many features after extraction; it gives only two, one starting with 1 and the other with 0.
Sorry, I don't understand the language. Can you post it in English?