anasmorahhib / 3D-CNN-Gesture-recognition

Gesture recognition using tensorflow from a large video database

Extraction of frames from the video. #9

Open AsadZahed opened 4 years ago

AsadZahed commented 4 years ago

Hello Anas, I am working on a similar project, but mine aims to recognize continuous signs of sign language. I found your project helpful, and I am just a beginner in deep learning. My data consists of simple videos of gestures performed by volunteers, similar to the 20BN-jester dataset. I would like to know how you extract the frames from the videos. Also, in my case, if a video contains a whole sentence made up of different sign gestures (the link below is a similar sentence video, so you can get a good idea of what I mean), how can we extract the frames for all of these signs and identify each one as an individual sign? I will be very thankful for your help.

https://www.psl.org.pk/signs/en/sentences/5c9634bb0625be0004d9217b

ZissisT commented 4 years ago

Hello @AsadZahed,

I think that you can extract video frames using ffmpeg, e.g.

ffmpeg -i video.webm -vf fps=12 -qscale:v 2 %05d.jpg

This will extract images from the video at 12 frames per second, using zero-padded 0000x.jpg filenames (similar to what 20BN-jester uses).
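If you have many videos to process, a small Python wrapper around ffmpeg can batch this. Below is a minimal sketch that writes one numbered folder of JPEG frames per video, mirroring the 20BN-jester layout; the directory names and defaults are illustrative assumptions, not something from this repo:

```python
import os
import subprocess

def extract_frames(video_dir="videos", out_dir="frames", fps=12):
    # One output folder per video, named after the video file (assumption).
    for name in os.listdir(video_dir):
        stem = os.path.splitext(name)[0]
        target = os.path.join(out_dir, stem)
        os.makedirs(target, exist_ok=True)
        # Same ffmpeg invocation as above, with zero-padded frame names.
        subprocess.run([
            "ffmpeg", "-i", os.path.join(video_dir, name),
            "-vf", f"fps={fps}", "-qscale:v", "2",
            os.path.join(target, "%05d.jpg"),
        ], check=True)
```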

trumpiter-max commented 3 years ago

Hi, how did you create your own data_csv from the videos?

anasmorahhib commented 3 years ago

hello @AsadZahed, to get frames from a video, check the main-beginners-syntax.py file, block (# In[14]):

```python
import os

hm_frames = 30  # target number of frames per training sample

# unify the number of frames for each training sample
def get_unify_frames(path):
    # pick frames (sorted so the temporal order is preserved)
    frames = sorted(os.listdir(path))
    frames_count = len(frames)
    if hm_frames > frames_count:
        # duplicate the last frame if the video is shorter than necessary
        frames += [frames[-1]] * (hm_frames - frames_count)
    elif hm_frames < frames_count:
        # if there are more frames, keep only the first hm_frames
        frames = frames[:hm_frames]
    return frames
```
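Once the frame list is unified, a typical next step is to load and resize the frames into an array for the 3D CNN. This is a sketch assuming OpenCV, grayscale input, and a 64x64 frame size; the actual path and sizes in the repo may differ:

```python
import os
import cv2
import numpy as np

# Hypothetical gesture folder used for illustration.
path = "frames/gesture_0001"
frames = get_unify_frames(path)

# Stack the 30 frames into a single (30, 64, 64) clip.
clip = np.array([
    cv2.resize(cv2.imread(os.path.join(path, f), cv2.IMREAD_GRAYSCALE), (64, 64))
    for f in frames
])
print(clip.shape)  # (30, 64, 64)
```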

If a training video contains several gestures, you have to cut it manually or with a Python script, because at the training stage the model cannot yet detect gestures. After training, the model will be able to detect the different gestures within a single video.
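As a rough sketch of such a script, you could cut known segments out of a longer sentence video with ffmpeg; the timestamps and labels here are hypothetical and would have to come from your own annotation:

```python
import subprocess

# Hypothetical (start, duration, label) annotations in seconds.
segments = [(0.0, 2.5, "hello"), (2.5, 3.0, "thank_you")]

for i, (start, duration, label) in enumerate(segments):
    # -ss/-t after -i re-encodes, which keeps the cut frame-accurate.
    subprocess.run([
        "ffmpeg", "-i", "sentence.webm",
        "-ss", str(start), "-t", str(duration),
        f"{i:03d}_{label}.webm",
    ], check=True)
```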