Closed dagongji10 closed 5 years ago
This is adopted from the original SSAD code. They want to align features. They use 9 because part of their features are extracted by C3D, where videos are split into non-overlapping 16-frame clips, and 9 is roughly the middle of a 16-frame clip.
Though I didn't use C3D features, I simply adopted this line and forgot to change it. Someone who tried len_df = frame_count told me it has little influence on performance.
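As a rough sketch of the window-generation logic being discussed (the function name and signature are my assumptions, not the repo's exact code):

```python
def gen_window_starts(frame_count: int, window_size: int, window_step: int):
    """Sliding-window start frames. The `- 9` offset targets the middle of a
    16-frame C3D clip, as discussed above. A video shorter than window_size
    still yields one window at frame 0, which the feature extractor then has
    to handle, e.g. by padding."""
    len_df = frame_count - 9
    last_start = max(len_df - window_size, 0)
    return list(range(0, last_start + 1, window_step))
```

For example, with frame_count=200, window_size=64, and window_step=32, this yields starts [0, 32, 64, 96]; a 30-frame video yields only [0].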
Yes, I just repeat the last frame for short videos, and since only a few videos are shorter than window_size, the performance will not be affected.
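The repeat-the-last-frame padding could be sketched like this (a minimal illustration, assuming frames stacked in a NumPy array; the function name and array layout are my assumptions):

```python
import numpy as np

def pad_frames(frames: np.ndarray, window_size: int) -> np.ndarray:
    """Repeat the last frame until the video has at least window_size frames.
    `frames` has shape (num_frames, H, W, C)."""
    n = frames.shape[0]
    if n >= window_size:
        return frames
    last = frames[-1:]                            # keep dims: shape (1, H, W, C)
    pad = np.repeat(last, window_size - n, axis=0)
    return np.concatenate([frames, pad], axis=0)
```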
I adopted the same value for the parameter self.overlap_ratio_threshold as the original SSAD code, which is 0.9. I think a value that is too small (the coverage ratio of action instances in the selected windows is too low) or too large (not enough training data remains) will worsen the results. I didn't fine-tune this parameter, but you can try to get a better result. The speed depends on the size of the data, so you can count the number of windows to estimate speed.
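A minimal sketch of that filtering idea, assuming the threshold is applied to how much of an action instance a window covers (both helper names and the exact overlap definition are my assumptions, not the repo's code):

```python
def coverage_ratio(window, instance):
    """Fraction of the action instance covered by the window,
    with both given as (start_frame, end_frame)."""
    ws, we = window
    s, e = instance
    inter = max(0.0, min(we, e) - max(ws, s))
    return inter / (e - s) if e > s else 0.0

def keep_window(window, instances, threshold=0.9):
    """Keep a window only if it covers at least `threshold` of some instance."""
    return any(coverage_ratio(window, inst) >= threshold for inst in instances)
```

Lowering the threshold keeps more (but noisier) training windows; raising it keeps fewer, cleaner ones.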
@HYPJUDY I changed the code to len_df = frame_count - 5, because in my dataset the actions are short (some only a little longer than window_size) and extract_feature needs optical_flow_frames=5.
I trained decouple-ssad on the NTU-RGBD dataset with window_size=64. Thanks for your help, and the results look good.
But I still have a problem: feature extraction using the TSN pretrained model is really slow, because it needs optical flow and dense_flow is slow. If I could set the parameter optical_flow_frames (in TSN it is 5) to another value, maybe I wouldn't need to compute every frame's optical flow. Have you tried C3D or other models for feature extraction before? Is there another way to extract action features without TSN?
Glad to help, and glad to hear about your successful try on another dataset : ) I didn't try other feature extraction methods because I read in papers that 3D extraction and two-stream extraction have similar performance. But I didn't compare their speed; maybe you can give it a try. By the way, you can speed up optical flow extraction with more GPUs and more processes per GPU. Or you can sample videos with a bigger step (a bigger interval between two frames), though the performance is expected to be somewhat worse. Luckily, you only need to do feature extraction once, and the following experiments will be very fast.
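To get a feel for the savings from a bigger sampling step, here is a toy count of flow computations (the function is mine for illustration, not part of TSN or dense_flow; each consecutive pair of sampled frames needs one flow field):

```python
def flow_pairs(frame_count: int, step: int = 1) -> int:
    """Number of optical-flow fields to compute for a video when
    sampling every `step`-th frame."""
    sampled = len(range(0, frame_count, step))
    return max(sampled - 1, 0)
```

For a 100-frame video, step=1 needs 99 flow fields while step=2 needs 49, roughly halving the cost at the price of coarser temporal resolution.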
In gen_data_info.py: (1) Line 63, what does len_df = frame_count - 9 mean? How do you determine the parameter 9? (2) Lines 73-74: when frame_num < window_size, it will still put window_start=0 into window_info, but when extracting features it can't find the last few frames (window_size - frame_num). Maybe I should change this for my dataset?
In config.py: (1) Line 55, self.overlap_ratio_threshold = 0.9 filters out windows that don't overlap well. If I make this parameter smaller, what will happen? Will it reduce accuracy or speed?