[Open] AtaaEddin opened this issue 6 years ago
If you don't have multiple GPUs, it may be better to finetune a Kinetics pre-trained model. A P100 or V100 may be enough for quick finetuning if your sign language dataset is not too big.
Regarding whether the model is the right choice: being a 3D ConvNet, it is a more natural fit for offline processing (it processes the time dimension in parallel, much like the spatial dimensions).
If you want to do real-time processing, you either need to break the incoming video into temporal chunks and pass these through the model, or convert the model graph so that it processes frame by frame while keeping internal state from previous activations (e.g. see https://arxiv.org/abs/1806.03863).
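To illustrate the first option, here is a minimal sketch of breaking an incoming frame stream into overlapping temporal chunks and running a model on each chunk. The `chunk_len` and `stride` values and the `model` callable are illustrative assumptions, not part of the original repo; 3D ConvNets like I3D are typically fed fixed-length clips, so each chunk would be stacked into one input tensor in practice.

```python
from collections import deque

def chunk_stream(frames, chunk_len=16, stride=8):
    """Yield overlapping temporal chunks from an iterable of frames.

    chunk_len and stride are illustrative; pick them to match the
    clip length the pre-trained model expects.
    """
    buf = deque(maxlen=chunk_len)
    for i, frame in enumerate(frames, 1):
        buf.append(frame)
        # Emit a chunk once the buffer is full, then every `stride` frames.
        if len(buf) == chunk_len and (i - chunk_len) % stride == 0:
            yield list(buf)

def classify_stream(frames, model, chunk_len=16, stride=8):
    """Run a (hypothetical) per-chunk model over the stream.

    `model` stands in for an I3D-style network loaded elsewhere;
    it receives a list of `chunk_len` frames per call.
    """
    return [model(chunk) for chunk in chunk_stream(frames, chunk_len, stride)]
```

With `stride < chunk_len` the chunks overlap, which smooths predictions at chunk boundaries at the cost of redundant computation; `stride == chunk_len` gives non-overlapping clips and the lowest compute.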
Thank you for the response... I want to know what the fps is for offline processing, or how much time it would take to process a 2-second video, for example.
I'm trying to do sign language recognition in real time, so I'm wondering if this model is the right choice here, and also what kind of GPUs are necessary to train such a model.
thanks.