asdfqwer2015 opened this issue 5 years ago
You should maintain a buffer of the time sequence and feed it into the model; the buffer serves as a FIFO of the online frames.
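Such a FIFO buffer could be sketched like this (a minimal illustration, assuming the 80-frame buffer length mentioned below and hypothetical 8x8 single-channel frames; the real ActionRecognition.py may differ):

```python
from collections import deque

import numpy as np

# Hypothetical buffer length; ActionRecognition.py is said to use 80 frames.
BUFFER_LEN = 80

# A deque with maxlen acts as a FIFO: appending a new frame
# automatically evicts the oldest one once the buffer is full.
frame_buffer = deque(maxlen=BUFFER_LEN)

def push_frame(frame):
    """Append the newest frame; the oldest is dropped when full."""
    frame_buffer.append(frame)

def buffer_as_batch():
    """Stack buffered frames into a (1, T, H, W, C) batch for the model."""
    return np.expand_dims(np.stack(frame_buffer, axis=0), axis=0)

# Simulate an online stream of 100 dummy 8x8 grayscale frames.
for i in range(100):
    push_frame(np.full((8, 8, 1), i, dtype=np.float32))

batch = buffer_as_batch()
print(batch.shape)  # (1, 80, 8, 8, 1): the 80 most recent frames
```

Each new frame simply pushes out the oldest one, so the model always sees a sliding window over the stream.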
Thanks for your quick reply.
I might not have described the issue clearly.
Regarding the buffer mechanism: I had already noticed it in your ActionRecognition.py script, and I modified the script to change the video source (webcam => video file). It still processes the video through a fixed-length (80-frame) FIFO buffer.
The issue is how to handle frames without any gesture in an unsegmented input video. I found that the model can only output 25 gesture classes; it cannot output a "no gesture found" class. For online mode, the model should not only classify the gesture class but also handle frames without a gesture (e.g. output a "no action found" class).
Thanks.
You can feed a data sequence of any length, as long as you assign the input parameter 'sequence_lengths' in the input dictionary correctly.
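One common way to prepare such a batch is to zero-pad clips of different lengths to a common size and pass each clip's true length alongside. This is a generic sketch (the array names and feature size are hypothetical, not taken from the repo):

```python
import numpy as np

# Three hypothetical clips of different lengths, each with 4 features per frame.
clips = [np.ones((t, 4), dtype=np.float32) for t in (30, 55, 80)]

max_len = max(c.shape[0] for c in clips)
batch = np.zeros((len(clips), max_len, 4), dtype=np.float32)
# 'sequence_lengths' records each clip's real length so the model
# can ignore the zero-padded tail.
sequence_lengths = np.array([c.shape[0] for c in clips], dtype=np.int32)

for i, c in enumerate(clips):
    batch[i, :c.shape[0]] = c  # copy the clip; the tail stays zero

print(batch.shape)        # (3, 80, 4)
print(sequence_lengths)   # [30 55 80]
```

The padded `batch` and the `sequence_lengths` vector would then both go into the model's feed dictionary.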
Thanks for your reply again. But I'm still confused. Could you please help with these?
a. For the no_gesture_found class, i.e. the negative class: I'm not sure, but maybe the model needs some training samples without any gesture in order to learn the negative class (i.e. a 26th "no gesture found" class)?
b. Should the class count for CTC be len(classes), or len(classes) + 1 (for the CTC blank)?
I tested the trained model on training samples and got high classification accuracy, but it did not output a class label when processing samples of the 24th (0-based) class. I suspect this may be because the last class is also occupied by the CTC blank. So, should the model's output dimension equal len(classes) + 1?
BTW, my TensorFlow version is 1.12.0.
For the NV hand gesture dataset, every video clip definitely contains one continuously occurring gesture, so the label sequence contains only one class label (>= 0); there is no label for "no gesture".
The CTC output length equals how many continuously occurring gestures are found in a video clip; one label in the output sequence represents one gesture.
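To illustrate why the blank index matters here: standard CTC best-path decoding collapses repeated frame labels and then drops blanks, so a clip with one continuous gesture decodes to exactly one label. A minimal sketch, assuming 25 gesture classes (0..24) with the blank reserved at index 25 (so 26 output units in total):

```python
NUM_GESTURES = 25     # dataset classes 0..24
BLANK = NUM_GESTURES  # CTC reserves one extra index for 'blank',
                      # so the output dimension is NUM_GESTURES + 1 = 26

def greedy_ctc_decode(frame_labels, blank=BLANK):
    """Standard CTC best-path decode: collapse repeats, then drop blanks."""
    decoded = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            decoded.append(lab)
        prev = lab
    return decoded

# Per-frame argmax labels for a clip containing one continuous gesture (class 7):
frames = [BLANK, BLANK, 7, 7, 7, BLANK, BLANK]
print(greedy_ctc_decode(frames))  # [7] -> exactly one gesture in the clip
```

If the output layer had only 25 units, the blank would collide with class 24, which matches the symptom described above where the last class never gets emitted.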
OK, I understand now. Thanks. But I've run into a new issue: overfitting. Could you please take a look? I'll create a new issue. :)
@asdfqwer2015 Hello, can you explain why the 'no_gesture' class is not printed when it is tested online? Thanks.
@asdfqwer2015 @breadbread1984 Hello, the ActionRecognition.py script outputs seemingly random numbers between 0 and 24 when I use it for classification. Do you know why? Looking forward to your reply!
Hi, in ActionRecognition.py I tested some videos from the NVGesture dataset (untrimmed videos), and it outputs a class number in 0~24 for every frame, i.e. it never outputs any blank or no_action label. If processing untrimmed video counts as online detection, as opposed to trimmed video being offline detection, how do I do online detection? Did I miss something? Thanks.