google-deepmind / kinetics-i3d

Convolutional neural network model for video classification trained on the Kinetics dataset.
Apache License 2.0
1.75k stars 462 forks

automatic sign language recognition. #29

Open AtaaEddin opened 6 years ago

AtaaEddin commented 6 years ago

I'm trying to do sign language recognition in real time, so I'm wondering whether this model is the right choice here, and what kind of GPUs are necessary to train such a model.

Thanks.

joaoluiscarreira commented 6 years ago

If you don't have multiple GPUs, it may be better to fine-tune a Kinetics pre-trained model. A P100 or V100 may be enough for quick fine-tuning if your sign language dataset is not too big.

Regarding whether the model is the right choice: being a 3D convnet, it is a more natural fit for offline processing (it processes the time dimension in parallel, similar to the space dimensions).

If you want to do real-time processing, you need to either break the incoming video into temporal chunks and pass these through the model, or convert the model graph so that it processes the video frame by frame while keeping internal state from previous activations (e.g. see https://arxiv.org/abs/1806.03863).
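The first option (temporal chunks) could look roughly like the sketch below. It is only an illustration: `stream_chunks`, `classify_stream`, and `model_fn` are hypothetical names and not part of this repository, and the chunk length of 16 frames and stride of 8 frames are assumed values that would need tuning for a given model and latency budget.

```python
from collections import deque
from typing import Any, Callable, Deque, Iterable, Iterator, List


def stream_chunks(frames: Iterable[Any], chunk_len: int, stride: int) -> Iterator[List[Any]]:
    """Yield windows of `chunk_len` consecutive frames, advancing `stride` frames each time.

    The first window is emitted as soon as `chunk_len` frames have arrived.
    A stride smaller than `chunk_len` gives overlapping windows.
    """
    if chunk_len <= 0 or stride <= 0:
        raise ValueError("chunk_len and stride must be positive")

    # The deque keeps only the most recent `chunk_len` frames.
    buffer: Deque[Any] = deque(maxlen=chunk_len)
    frames_until_next = chunk_len  # countdown to the next emitted window

    for frame in frames:
        buffer.append(frame)
        frames_until_next -= 1
        if frames_until_next == 0:
            yield list(buffer)
            frames_until_next = stride


def classify_stream(
    frames: Iterable[Any],
    model_fn: Callable[[List[Any]], Any],
    chunk_len: int = 16,  # assumed value, not taken from the repository
    stride: int = 8,      # assumed value, not taken from the repository
) -> Iterator[Any]:
    """Apply `model_fn` (a hypothetical stand-in for the video model) to each chunk."""
    for chunk in stream_chunks(frames, chunk_len, stride):
        yield model_fn(chunk)


if __name__ == "__main__":
    # Integers stand in for decoded video frames; the "model" just reports
    # the first and last frame index of each chunk it receives.
    fake_frames = range(40)
    for first, last in classify_stream(fake_frames, lambda c: (c[0], c[-1])):
        print(f"chunk covering frames {first}..{last}")
```

In a real setup `model_fn` would stack the frames into a single input clip and run the network on it. A smaller stride gives more frequent predictions at the cost of more compute per second of video.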

AtaaEddin commented 6 years ago

Thank you for the response. What frame rate (fps) does the model reach when processing video offline? For example, how long would it take to process a 2-second video?