aqua1907 / Gesture-Recognition

ASL language recognition using pre-trained MediaPipe models

Low FPS #1

Open arslaanzafar opened 4 years ago

arslaanzafar commented 4 years ago

Hello. I tried your project. It's really good, but the issue is that the FPS is very low. Is this due to Python? Is MediaPipe only for mobile devices? Will hand detection using MediaPipe work with good FPS on a mobile device? Thank you!

aqua1907 commented 4 years ago

Hello, thank you for asking. If you tried the MediaPipe Hands web demo, you may have noticed that its FPS is in the range of 13-25. In my implementation, I use Python to run the TensorFlow Lite graph, which was developed and optimized for mobile devices. Running this model with additional image transformations and extra computations keeps the app at a low frame rate, around 5-8 FPS. I do not know whether it would run better on a more powerful CPU, because the TensorFlow Lite model only uses CPU resources, not the GPU. My CPU is an Intel Core i5-6600K. And again, MediaPipe in general is developed and optimized for mobile devices.
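
For context, below is a minimal sketch of how one could time per-frame inference when driving a hand-landmark TFLite model from Python with OpenCV. The model filename (hand_landmark.tflite), the 256x256 input size, and the [0, 1] normalization are assumptions for illustration and may differ from this repository's actual pipeline.

```python
import time

import cv2
import numpy as np
import tensorflow as tf

# Assumed model file; the actual graph used by this repo may differ.
interpreter = tf.lite.Interpreter(model_path="hand_landmark.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

cap = cv2.VideoCapture(0)
frames, start = 0, time.time()
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # Resize and normalize to the model's expected input (assumed 256x256, [0, 1]).
    img = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (256, 256)).astype(np.float32) / 255.0
    interpreter.set_tensor(input_details[0]["index"], img[np.newaxis, ...])
    interpreter.invoke()
    landmarks = interpreter.get_tensor(output_details[0]["index"])

    frames += 1
    if frames % 30 == 0:
        print(f"~{frames / (time.time() - start):.1f} FPS")
cap.release()
```

On a desktop CPU a loop like this is dominated by the CPU-only TFLite invoke plus the per-frame image transformations, which is consistent with the 5-8 FPS figure mentioned above.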

arslaanzafar commented 4 years ago

Thank you for the reply, I got your point. I was wondering if I could do ASL recognition on mobile using MediaPipe. I tried that, but your dataset does not really match the landmarks that come from the MediaPipe Android build. Is there a way:

  1. I can use your dataset to predict ASL using landmarks from the MediaPipe Android app?
  2. Or can I build my own dataset using landmarks from the MediaPipe Android app? Should I extract landmarks from a video of each alphabet sign, or how did you build it? Can you explain?

Thank you.

aqua1907 commented 4 years ago
  1. I think you can.
  2. I used MediaPipe to extract the coordinates of each predicted landmark and saved them to a CSV file. I ran the model for each sign and got around 300 samples per sign; you can see this data in the data_keypoints folder. Then I calculated the Euclidean distances between keypoints for each sign and concatenated all of this data into the alphabet.csv file; this is the data used for training and testing (see the sketch below).
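
As a rough illustration of step 2, here is a minimal sketch of turning saved landmark coordinates into pairwise Euclidean-distance features and merging everything into one file. The per-sign CSV layout (one headerless file per sign in data_keypoints/, each row holding 21 flattened x, y pairs) is an assumption for the example; only the overall idea (distances between keypoints as features, concatenated into alphabet.csv with the sign as the label) follows the description above.

```python
import csv
import glob
import os

import numpy as np
from scipy.spatial.distance import pdist


def distance_features(sample):
    """Pairwise Euclidean distances between the 21 hand keypoints (210 values)."""
    return pdist(sample.reshape(21, 2))


# Assumed layout: one headerless CSV per sign in data_keypoints/,
# each row = x1, y1, x2, y2, ..., x21, y21 for a single detected hand.
rows = []
for path in glob.glob("data_keypoints/*.csv"):
    sign = os.path.splitext(os.path.basename(path))[0]
    samples = np.loadtxt(path, delimiter=",")
    for sample in np.atleast_2d(samples):
        rows.append(list(distance_features(sample)) + [sign])

# Concatenate every sign into a single training/testing file.
with open("alphabet.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```
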
arslaanzafar commented 4 years ago

Alright.

  1. But there is a difference between the landmarks coming from your Python code and the ones coming from MediaPipe on Android. I tried multiplying x and y by 256, but there is still a large difference between the two sets of landmarks for the same sign.
  2. I have never created such a dataset before. Can you explain the procedure of how you collected those 300 samples per sign? Thank you.