AvishakeAdhikary / Realtime-Sign-Language-Detection-Using-LSTM-Model

Realtime Sign Language Detection: Deep learning model for accurate, real-time recognition of sign language gestures using Python and TensorFlow.

Information About Dataset and How To Use It. #1

Closed IbrarBabar closed 1 month ago

IbrarBabar commented 1 month ago

Hi, I am trying to use this repository for real-time sign language detection. In the demo video you provided, it correctly identifies the label in the live camera feed. However, the provided notebook does not behave the same way: it shows the camera feed with hand movements but does not display any labels.

Additionally, I need your assistance in understanding the dataset format. Does the .npy file contain images in NumPy format, or is it in a different format?

AvishakeAdhikary commented 1 month ago

The code works exactly as shown in the notebook. You just need to change a few things here and there.

The thing you need to understand is, there is no pre-recorded dataset.

You see the line:

actions = np.array(['cat', 'food', 'help'])

This is where I define the actions (classes) I want to capture for training.

The same actions go here:

signs = ['cat','food','help']

This is for setting up the folders where I capture keypoints (or, in MediaPipe terms, landmarks), which are simply the points the model can detect from your camera (face, hands, pose, etc.). All the keypoints for a frame are then concatenated into a single NumPy array (hence the .npy file names), as sketched below. You will see another section labeled Collect Keypoint Values for Training and Testing where the capture actually happens.
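For illustration, here is a minimal sketch of how such a per-frame vector is typically assembled with MediaPipe Holistic (132 pose + 1404 face + 63 per hand = 1662 values); the function name and the results object from holistic.process() are assumptions, not necessarily the repository's exact code:

import numpy as np

def extract_keypoints(results):
    # results is the output of MediaPipe Holistic's process() on one frame
    pose = (np.array([[lm.x, lm.y, lm.z, lm.visibility] for lm in results.pose_landmarks.landmark]).flatten()
            if results.pose_landmarks else np.zeros(33 * 4))     # 132 values
    face = (np.array([[lm.x, lm.y, lm.z] for lm in results.face_landmarks.landmark]).flatten()
            if results.face_landmarks else np.zeros(468 * 3))    # 1404 values
    lh = (np.array([[lm.x, lm.y, lm.z] for lm in results.left_hand_landmarks.landmark]).flatten()
          if results.left_hand_landmarks else np.zeros(21 * 3))  # 63 values
    rh = (np.array([[lm.x, lm.y, lm.z] for lm in results.right_hand_landmarks.landmark]).flatten()
          if results.right_hand_landmarks else np.zeros(21 * 3)) # 63 values
    return np.concatenate([pose, face, lh, rh])                  # 1662 values, saved per frame as .npy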

Then I load the keypoint sequences (inputs) as X and their corresponding labels as y, and perform a 95/5 train-test split, as you can see in this line:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.05)
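For context, here is a minimal sketch of how X and y might be assembled from the saved .npy files before that split; the MP_Data path, 30 sequences per action, and 30 frames per sequence are assumptions about the folder setup, so adjust them to match the notebook:

import os
import numpy as np
from tensorflow.keras.utils import to_categorical

actions = np.array(['cat', 'food', 'help'])
label_map = {label: num for num, label in enumerate(actions)}

sequences, labels = [], []
for action in actions:
    for sequence in range(30):                        # assumed: 30 recorded sequences per action
        window = [np.load(os.path.join('MP_Data', action, str(sequence), f'{frame}.npy'))
                  for frame in range(30)]             # 30 frames of 1662 values each
        sequences.append(window)
        labels.append(label_map[action])

X = np.array(sequences)                               # shape: (num_sequences, 30, 1662)
y = to_categorical(labels).astype(int)                # one-hot labels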

The important thing to note is that I capture 30 frames of sequential keypoint data, with 1662 keypoint values in each frame (hence the input shape: input_shape=(30,1662)). This makes it one of the first models to capture the motion of sign language, not just the static signs themselves.
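For reference, here is a sketch of an LSTM stack that accepts this (30, 1662) input shape; the layer sizes are illustrative assumptions, not necessarily the repository's exact architecture:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential([
    LSTM(64, return_sequences=True, activation='relu', input_shape=(30, 1662)),
    LSTM(128, return_sequences=True, activation='relu'),
    LSTM(64, return_sequences=False, activation='relu'),  # last LSTM collapses the sequence
    Dense(64, activation='relu'),
    Dense(32, activation='relu'),
    Dense(3, activation='softmax'),                       # one output per action
])
model.compile(optimizer='Adam', loss='categorical_crossentropy', metrics=['categorical_accuracy'])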

I then save the model using these lines:

model.save('./model.h5')
model.save_weights('./model_weights.h5')

At the end, you will see two parts:

  1. If you're continuously running the same process to perform inference
  2. If you're trying to load my model and just performing the inference in a single block of code

If you're loading my model, you don't need to run any of the earlier code in the notebook; you're simply loading the model that I have saved.

But if you're running the whole process yourself to perform inference, you will have to train the model on your own dataset and run all of the code except the last block. A sketch of the load-and-infer path follows.
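Here is a minimal sketch of the load-my-model path; the file name comes from the save lines above, extract_keypoints is the function sketched earlier in this thread, and the MediaPipe confidence values are illustrative:

import cv2
import mediapipe as mp
import numpy as np
from tensorflow.keras.models import load_model

actions = np.array(['cat', 'food', 'help'])
model = load_model('./model.h5')                      # file name from the save lines above

sequence = []                                         # rolling buffer of the last 30 frames
cap = cv2.VideoCapture(0)
with mp.solutions.holistic.Holistic(min_detection_confidence=0.5,
                                    min_tracking_confidence=0.5) as holistic:
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        sequence = (sequence + [extract_keypoints(results)])[-30:]
        if len(sequence) == 30:
            probs = model.predict(np.expand_dims(sequence, axis=0))[0]
            print(actions[np.argmax(probs)])          # predicted sign for the current window
cap.release()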

The notebook has been prepared for flexibility rather than a closed scope. This model has plenty of room for improvement; otherwise, it works absolutely fine and exactly as the video shows.

If you find room for improvement, you're always invited to contribute to the project (maybe even provide better documentation for the code I have implemented) by forking the repo, making the changes, and creating a pull request. This code has been out for a while now; I prepared it during my master's thesis at university, and I don't have much time for it anymore since I'm already working for a living as a Machine Learning Engineer.

I hope I could help. Happy coding.

IbrarBabar commented 1 month ago

@AvishakeAdhikary - Thank you so much for the clear explanation; it is really clear to me now. I am actually using your pretrained model, not training my own. I am using your checkpoint for inference, running 'RealTimeSignLanguageDetection.ipynb' from the start and skipping the training portion of the code. However, I was not able to get the same output as yours. Maybe it is because I am running the entire notebook except for the training code.

AvishakeAdhikary commented 1 month ago

@IbrarBabar Try installing TensorFlow correctly, and try utilizing an NVIDIA GPU if possible (CUDA, cuDNN, and the other C++ build tools). You can quickly verify the setup as below.
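A quick way to confirm TensorFlow can see the GPU:

import tensorflow as tf
print(tf.__version__)
print(tf.config.list_physical_devices('GPU'))   # an empty list means TensorFlow is CPU-only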

I trained the model to almost 95% accuracy on just those three words/phrases (I may have trained a few others, I don't remember exactly), but they shouldn't be anything other than the ones I showed in the video.

I hope this helped explain the motivation behind the code, and I hope I also encouraged you to make some improvements to the model or the project as well...

Happy Coding.