ColumbiaDVMM / CDC

CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos

How did 有 #2

bityangke closed this issue 7 years ago.

bityangke commented 7 years ago

First, thank you very much for sharing your work. I have a question about the training data: how did you handle multi-label frames? For example, almost all CliffDiving frames also belong to Diving. When assigning labels to these frames, did you give each frame a single multi-hot vector such as [0,0,0,0,0,1,0,0,1,0,......], or did you make two copies of each frame and assign them the one-hot vectors [0,0,0,0,0,1,0,0,0,......] and [0,0,0,0,0,0,0,0,1,0,......] respectively?

zhengshou commented 7 years ago

Hi,

The ground truth used for evaluation at test time is multi-label over 21 classes.

During training, however, we simply use one-hot labels: a frame that belongs to both CliffDiving and Diving is labeled CliffDiving, and only frames that belong to Diving but not to CliffDiving are treated as Diving frames. At prediction time, every frame predicted as CliffDiving is also labeled Diving, which forms the multi-label prediction.
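The labeling scheme described in the answer can be sketched roughly as follows. This is a minimal, hypothetical illustration, not code from the CDC repository: the class indices (BACKGROUND = 0, CLIFF_DIVING = 5, DIVING = 8), the function names, and the rule that an unlabeled frame becomes background are all assumptions, with indices 5 and 8 chosen only to match the vectors in the question.

```python
# Hypothetical label map for illustration only; the real CDC indices may differ.
NUM_CLASSES = 21
BACKGROUND = 0
CLIFF_DIVING = 5
DIVING = 8


def training_label(frame_classes):
    """Collapse a frame's multi-label ground truth into a single one-hot vector.

    Assumption: CliffDiving takes precedence over Diving, so a frame tagged
    with both becomes CliffDiving; Diving survives only when CliffDiving is absent.
    """
    classes = set(frame_classes)
    if CLIFF_DIVING in classes:
        classes.discard(DIVING)
    if not classes:
        classes = {BACKGROUND}  # assumed: frames with no action label are background
    if len(classes) != 1:
        raise ValueError(f"unexpected overlapping labels: {sorted(classes)}")
    label = [0] * NUM_CLASSES
    label[classes.pop()] = 1
    return label


def to_multi_label(predicted_classes):
    """Expand per-frame single-class predictions into multi-hot vectors.

    Every frame predicted as CliffDiving is also marked as Diving.
    """
    result = []
    for c in predicted_classes:
        row = [0] * NUM_CLASSES
        row[c] = 1
        if c == CLIFF_DIVING:
            row[DIVING] = 1
        result.append(row)
    return result
```

For example, `training_label([CLIFF_DIVING, DIVING])` yields a one-hot vector at index 5, and `to_multi_label([CLIFF_DIVING])` yields a row with ones at both 5 and 8. Keeping training single-label lets a standard softmax classifier be used per frame, while the post-processing step restores the class hierarchy needed for multi-label evaluation.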