breadbread1984 / R3DCNN

This project implements the hand gesture recognition algorithm introduced in the paper "Online Detection and Classification of Dynamic Hand Gestures with Recurrent 3D Convolutional Neural Networks".

no_action label in test mode #11

Open asdfqwer2015 opened 5 years ago

asdfqwer2015 commented 5 years ago

Hi, in ActionRecognition.py I tested some videos from the nvgesture dataset (untrimmed videos). It outputs a class number in 0~24 for every frame, i.e. it never outputs a blank or no_action label. I understand online detection to mean processing untrimmed video, as opposed to offline detection on trimmed video. How do I do online detection? Did I miss something? Thanks.

breadbread1984 commented 5 years ago

You should maintain a buffer of the time sequence and feed it into the model. The buffer serves as a FIFO of the incoming online frames.
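
For illustration, a minimal sketch of that FIFO idea in Python (the `predict` stub, window length, and preprocessing are assumptions for the sketch, not the repo's actual API):

```python
from collections import deque

import cv2
import numpy as np

BUFFER_LEN = 80  # fixed window length, matching ActionRecognition.py


def predict(clip):
    """Stand-in for the trained R3DCNN forward pass; returns a class id."""
    raise NotImplementedError


buffer = deque(maxlen=BUFFER_LEN)  # oldest frame drops off the front (FIFO)
cap = cv2.VideoCapture(0)          # webcam; replace with a video file path

while True:
    ok, frame = cap.read()
    if not ok:
        break
    buffer.append(frame)
    if len(buffer) == BUFFER_LEN:
        clip = np.stack(buffer)    # shape: (80, H, W, 3)
        print(predict(clip))       # classify the current sliding window

cap.release()
```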

asdfqwer2015 commented 5 years ago

Thanks for your quick reply. I may not have described the issue clearly. As for the buffer mechanism, I had already noticed the buffer in your ActionRecognition.py script, and I only changed the video source (webcam => video file). It still processes the video through a fixed-length (80-frame) FIFO buffer.

The issue is how to handle the frames without any gesture in an unsegmented input video. I found that the model can only output the 25 gesture classes; it cannot output a "no gesture found" class. In online mode, the model should not only classify the gesture class but also handle frames without a gesture (e.g. output a "no action found" class). Thanks.

breadbread1984 commented 5 years ago

You can feed a data sequence of any length as long as you set the input parameter 'sequence_lengths' in the input dictionary correctly.
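
In TF 1.x terms that would look roughly like the sketch below; the placeholder names (`inputs`, `sequence_lengths`, `logits`) and shapes are assumptions for illustration, not verified against this repo:

```python
import numpy as np
import tensorflow as tf  # TF 1.x style (the user reports 1.12.0)

# Hypothetical placeholders standing in for the model's real input tensors.
inputs = tf.placeholder(tf.float32, [None, None, 112, 112, 3], name='inputs')
sequence_lengths = tf.placeholder(tf.int32, [None], name='sequence_lengths')

# ... the R3DCNN graph would be built / restored on top of these ...

clip = np.zeros((1, 50, 112, 112, 3), np.float32)  # one 50-frame clip

with tf.Session() as sess:
    feed = {inputs: clip, sequence_lengths: [50]}  # length matches the clip
    # sess.run(logits, feed_dict=feed)  # `logits` = the model's output op
```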

asdfqwer2015 commented 5 years ago

Thanks for your reply again.

But I'm still confused. Could you please help me with the following two points?

a. For the no_gesture_found class, i.e. the negative class: I'm not sure, but maybe the model needs some training samples without any gesture in order to learn the negative class (i.e. a 26th class, "no gesture found")?

b. Should the class count for CTC be len(classes), or len(classes) + 1 for the CTC blank?
I tested the trained model on training samples and it has high classification accuracy, but it never outputs a class label for samples of the 24th (0-based) class. I suspect that last class slot is also occupied by the CTC blank. So, should the number of model outputs equal len(classes) + 1? BTW, my TensorFlow version is 1.12.0.
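
For context on question (b): in TF 1.x, `tf.nn.ctc_loss` reserves the largest logit index for the blank, so a model with `len(classes)` true labels needs `len(classes) + 1` outputs; if the logits had only 25 outputs, index 24 would double as the blank, which would match the symptom above. A hedged sketch (the placeholder shapes are illustrative):

```python
import tensorflow as tf

NUM_GESTURES = 25               # true labels 0..24
NUM_OUTPUTS = NUM_GESTURES + 1  # +1: tf.nn.ctc_loss reserves the largest
                                # index (here 25) for the CTC blank

# logits: [max_time, batch_size, NUM_OUTPUTS] (time-major by default)
logits = tf.placeholder(tf.float32, [None, None, NUM_OUTPUTS])
labels = tf.sparse_placeholder(tf.int32)    # gesture ids in 0..24
seq_len = tf.placeholder(tf.int32, [None])  # per-example sequence lengths

loss = tf.nn.ctc_loss(labels, logits, seq_len)
```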

breadbread1984 commented 5 years ago

For the NV hand gesture dataset, every video clip definitely contains exactly one continuously occurring gesture, so the label sequence contains only one class label (>= 0). There is no label for "no gesture".

The output length of the CTC decoding equals the number of continuously occurring gestures found in a video clip; one label in the output sequence represents one gesture.
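
To illustrate that rule: CTC decoding merges repeated labels and removes blanks, so the decoded sequence ends up with one entry per detected gesture. A minimal TF 1.x sketch (the shapes are assumptions):

```python
import tensorflow as tf

logits = tf.placeholder(tf.float32, [None, None, 26])  # 25 gestures + blank
seq_len = tf.placeholder(tf.int32, [None])

# Beam search collapses repeats and drops blanks, so the decoded sparse
# tensor holds one label per continuous gesture in the clip.
decoded, log_probs = tf.nn.ctc_beam_search_decoder(logits, seq_len)
gestures = tf.sparse_tensor_to_dense(decoded[0])  # e.g. [[17]] for one gesture
```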

asdfqwer2015 commented 5 years ago

Understood, thanks. But I've run into a new issue: overfitting. Could you please take a look? I'll create a new issue. :)

buaa-luzhi commented 4 years ago

@asdfqwer2015 Hello, can you explain why the 'no_gesture' class is not printed when testing online? Thanks.

buaa-luzhi commented 4 years ago

@asdfqwer2015 @breadbread1984 Hello, I get seemingly random numbers between 0 and 24 when using the ActionRecognition.py script for classification. Do you know why? Looking forward to your reply!