ColumbiaDVMM / CDC

CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos

Some questions about training with multi-label frames. #20

Closed cccorn closed 5 years ago

cccorn commented 6 years ago

First, thank you very much for sharing the code. I still have some questions about the paper and the code:

1. I found that some frames in the THUMOS14 validation set have multiple labels (e.g. CliffDiving and Diving, or CricketBowling and CricketShot). Issue #2 raises the same question, and there you said that you simply treat the frames that belong to Diving but not CliffDiving as Diving. But how did you treat CricketBowling and CricketShot?
2. In formula (3) of your paper, you say that z_n stands for the ground truth class label for the n-th segment. Why is the label segment-wise rather than frame-wise? Should it be z_n(t)?
3. In section 3.4 of your paper (training data construction), you say that you only keep windows that have at least one frame belonging to actions. Do the action classes include Ambiguous?
4. In the evaluation code, THUMOS14/eval/PreFrameLabeling/compute_framelevel_mAP.m, line 19-20:

% remove ambiguous
prob=prob(label_test(:,22)==0,:);
label_test=label_test(label_test(:,22)==0,:);

But I found that the variable label_test, loaded from the file multi-label-test.mat, is all zeros in dimension 22, e.g. max(label_test(:,22))=0, so lines 19-20 do nothing. I think the ground truth labels you provided are wrong: there are actually some ambiguous frames in the test videos that need to be removed.
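To make the point in question 4 concrete, here is a minimal sketch in plain Python of the same masking logic as the two MATLAB lines. The toy label rows, the helper names (`make_row`, `remove_ambiguous`, `count_ambiguous`) and the row width of 22 are assumptions for illustration only; they are not taken from the repository or from multi-label-test.mat. It only shows that when the ambiguous column is zero for every frame, the filter keeps every row.

```python
# Hypothetical toy data: each row is one frame with 22 label slots.
# Index 21 (0-based) plays the role of MATLAB column 22, the "ambiguous" flag.
AMBIGUOUS_COL = 21


def make_row(action_col, ambiguous):
    """Build one toy label row with a single action set and an ambiguous flag."""
    row = [0] * 22
    row[action_col] = 1
    row[AMBIGUOUS_COL] = ambiguous
    return row


def remove_ambiguous(prob, labels, col=AMBIGUOUS_COL):
    """Keep only the frames whose ambiguous flag is 0 (same idea as the MATLAB lines)."""
    keep = [row[col] == 0 for row in labels]
    kept_prob = [p for p, k in zip(prob, keep) if k]
    kept_labels = [l for l, k in zip(labels, keep) if k]
    return kept_prob, kept_labels


def count_ambiguous(labels, col=AMBIGUOUS_COL):
    """Number of frames flagged as ambiguous; 0 means the filter is a no-op."""
    return sum(1 for row in labels if row[col] != 0)


# Case 1: the ambiguous column is all zeros, as observed for label_test.
labels_all_zero = [make_row(3, 0), make_row(5, 0), make_row(7, 0)]
# Case 2: one frame is flagged as ambiguous, which is what one would expect.
labels_with_flag = [make_row(3, 0), make_row(5, 1), make_row(7, 0)]
# Toy per-frame score rows; the width is arbitrary here.
prob = [[0.1] * 21, [0.2] * 21, [0.3] * 21]

same_prob, same_labels = remove_ambiguous(prob, labels_all_zero)
filtered_prob, filtered_labels = remove_ambiguous(prob, labels_with_flag)

print(count_ambiguous(labels_all_zero), len(same_prob))       # nothing removed
print(count_ambiguous(labels_with_flag), len(filtered_prob))  # one frame removed
```

Running a count like `count_ambiguous` on the real label matrix before filtering would show directly whether any frame is marked ambiguous.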