MiteshPuthran / Speech-Emotion-Analyzer

The neural network model is capable of detecting five different male/female emotions from audio speeches. (Deep Learning, NLP, Python)
MIT License

Using the model for prediction #34

Closed DinaAlBassam closed 4 years ago

DinaAlBassam commented 5 years ago

Hello,

I am trying to use the already trained model directly for predicting the emotions.

I put this code in a Python file and ran it:

def predict():
    lb = LabelEncoder()
    Model_filename = 'saved_models/Emotion_Voice_Detection_Model.h5'
    Model = load_model(Model_filename)
    X, sample_rate = librosa.load('filename.wav', res_type='kaiser_fast',duration=2.5,sr=22050*2,offset=0.5)
    sample_rate = np.array(sample_rate)
    mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=13),axis=0)
    featurelive = mfccs
    livedf2 = featurelive
    livedf2= pd.DataFrame(data=livedf2)
    livedf2 = livedf2.stack().to_frame().T
    twodim= np.expand_dims(livedf2, axis=2)
    livepreds = Model.predict(twodim,batch_size=32,verbose=1)
    livepreds1=livepreds.argmax(axis=1)
    liveabc = livepreds1.astype(int).flatten()
    livepredictions = (lb.inverse_transform((liveabc)))
    livepredictions

But it raises an error at lb.inverse_transform, saying that lb needs to be fitted first. Is there a way to get the emotion's name back without using the dataset and training the model again?

I also have another question: is this model language-independent? Thanks,

dlpazs commented 5 years ago

Try calling lb.fit() first; have a look at the sklearn LabelEncoder docs.
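
A minimal sketch of that suggestion, assuming the model was trained on ten labels of the form gender_emotion (the label strings below are an assumption and must match whatever was used at training time). LabelEncoder stores its classes in sorted order, so fitting it on the label names alone should reproduce the index mapping without loading the dataset or retraining:

```python
from sklearn.preprocessing import LabelEncoder

# Assumed class names; they must match the strings used at training time.
EMOTIONS = ["angry", "calm", "fearful", "happy", "sad"]
labels = [f"{gender}_{emotion}" for gender in ("female", "male") for emotion in EMOTIONS]

lb = LabelEncoder()
lb.fit(labels)  # classes_ are stored in sorted order

# Decode a class index such as the one returned by argmax.
decoded = lb.inverse_transform([3])
print(decoded[0])
```

If the training labels were spelled differently, the sorted order (and therefore the mapping) would differ, so this is worth checking against a few files with known emotions.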

dlpazs commented 5 years ago

I found this worked for a simple test, but I think I may have the labels wrong:

pred_to_class = {
    0: "female_angry",
    1: "female_calm",
    2: "female_fearful",
    3: "female_happy",
    4: "female_sad",
    5: "male_angry",
    6: "male_calm",
    7: "male_fearful",
    8: "male_happy",
    9: "male_sad"
}

import librosa
import numpy as np
from keras.models import model_from_json

def predict():
    # Rebuild the architecture from JSON, then load the trained weights.
    with open('model.json', 'r') as json_file:
        loaded_model = model_from_json(json_file.read())
    loaded_model.load_weights("saved_models/Emotion_Voice_Detection_Model.h5")
    print("Loaded model from disk")

    # Load 2.5 s of audio starting at 0.5 s, resampled to 44.1 kHz.
    X, sample_rate = librosa.load('output10.wav',
        res_type='kaiser_fast',
        duration=2.5,
        sr=22050*2,
        offset=0.5)

    # Average the 13 MFCCs in each frame: one value per time frame.
    mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=13), axis=0)

    # Add batch and channel axes: shape (1, n_frames, 1).
    twodim = mfccs[np.newaxis, :, np.newaxis]

    preds = loaded_model.predict(twodim,
                         batch_size=32,
                         verbose=1)
    # Map the index of the highest-scoring class to its label.
    print(pred_to_class[int(preds.argmax(axis=1)[0])])
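
The last step of predict() can also be written as a small helper that returns the winning label together with its score. This is only a sketch: decode_prediction is a hypothetical name, the scores are made up to stand in for the model output, and the label order is the same unverified assumption as in pred_to_class.

```python
import numpy as np

# Assumed label order (same as pred_to_class; not verified against training).
LABELS = [
    "female_angry", "female_calm", "female_fearful", "female_happy", "female_sad",
    "male_angry", "male_calm", "male_fearful", "male_happy", "male_sad",
]

def decode_prediction(preds):
    """Map one row of class scores, shape (1, 10), to (label, score)."""
    preds = np.asarray(preds)
    idx = int(preds.argmax(axis=1)[0])  # index of the highest score
    return LABELS[idx], float(preds[0, idx])

# Made-up scores standing in for the output of loaded_model.predict(...)
fake_preds = np.zeros((1, 10))
fake_preds[0, 8] = 1.0
label, score = decode_prediction(fake_preds)
print(label, score)
```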

Kai-Karren commented 4 years ago

@dlpazs's version is working fine for me.