About the results of model prediction

AlexGidiotis / Multimodal-Gesture-Recognition-with-LSTMs-and-CTC

An end-to-end system that performs temporal recognition of gesture sequences using speech and skeletal input. The model combines three networks with a CTC output layer that recognises gestures from a continuous stream.

MIT License

28 stars, 5 forks

I think that since you are getting output, you are using the model correctly. The issue here is that the training hasn't converged. CTC is usually tricky to optimise, so the training might not be as smooth as with other loss functions. Unfortunately, I cannot tell whether the MFCC features are correct, but if you followed the instructions they are most likely fine.

With that said, I think you should try the following:

Make sure that your labels are correctly assigned during training.
Once the training loss has stopped decreasing for a few epochs (10-20), try interrupting the training and restarting from the checkpoint.
Experiment with the learning rate and the gradient clipping value.
Increase the early stopping patience.
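
To make these suggestions a bit more concrete, here is a minimal sketch in plain Python. It is not code from this repository: `PlateauTracker`, `ctc_label_fits` and all their parameters are hypothetical names used only for illustration. The tracker flags when the loss has stalled long enough to try a restart from the checkpoint and when a (larger) early stopping patience has run out. The label check relies on a general property of CTC: a label sequence can only be aligned if the number of input frames is at least the label length plus the number of adjacent repeated labels, because a blank has to separate each repeat.

```python
from dataclasses import dataclass


@dataclass
class PlateauTracker:
    """Tracks a loss value per epoch and suggests what to do next (hypothetical helper)."""
    restart_after: int = 10   # stalled epochs before suggesting a restart from the checkpoint
    stop_after: int = 40      # early stopping patience, deliberately larger than restart_after
    min_delta: float = 1e-4   # smallest decrease that counts as an improvement
    best: float = float("inf")
    stale_epochs: int = 0

    def update(self, loss: float) -> str:
        """Return 'improved', 'wait', 'restart' or 'stop' for the latest epoch loss."""
        if loss < self.best - self.min_delta:
            self.best = loss
            self.stale_epochs = 0
            return "improved"
        self.stale_epochs += 1
        if self.stale_epochs >= self.stop_after:
            return "stop"
        if self.stale_epochs % self.restart_after == 0:
            return "restart"
        return "wait"


def ctc_label_fits(label, num_frames):
    """Check whether a label sequence can be aligned to num_frames by CTC.

    Each adjacent pair of identical labels needs one blank frame between them.
    """
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return num_frames >= len(label) + repeats


if __name__ == "__main__":
    tracker = PlateauTracker(restart_after=2, stop_after=4)
    for epoch, loss in enumerate([1.0, 0.9, 0.9, 0.9, 0.9, 0.9]):
        print(epoch, loss, tracker.update(loss))
    print(ctc_label_fits([1, 1, 2], num_frames=3))
```

If you train with Keras, the patience here corresponds to the `patience` argument of the `EarlyStopping` callback, and gradient clipping is set through the optimizer's `clipnorm` or `clipvalue` argument. A sample that fails the label check is worth inspecting first, since it usually points to a label or sequence-length mismatch.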

About the results of model prediction #2