PeihaoChen / RSPNet

Official Pytorch implementation for AAAI2021 paper (RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning)

framework image #5

Closed youwantsy closed 2 years ago

youwantsy commented 2 years ago

Hello, thank you for your great work. It's such a smart idea!

Could you explain the framework figure? I understand the RSP task, and that the A-VID task is learned in the same iteration, which I take to mean the anchor is shared. Looking at the algorithm, you only sample K negative clips from the videos V \ v+. However, in Fig. 2 of the paper, the features (green) of two clips from the same video, a 1x clip and a 2x clip, both go through the g_a head for contrastive learning. I think the figure is meant to show that clips are sampled at random speeds, is that right? So in the real experiment there are only the c_i, c_j, {c_n} (K) clips, not 2K?

thank you

PeihaoChen commented 2 years ago

Hi, I am sorry for the confusion.

For the RSP task, we only need to sample three clips from the same video to calculate the triplet loss (Eq. 2 in the paper). For the A-VID task, as shown in Lines 10-11 of the algorithm, we only need the c_i, c_j, {c_n} (K) clips to calculate the contrastive loss (Eq. 3 in the paper). The c_i, c_j, {c_n} (K) clips for A-VID can be sampled at random speeds.
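For reference, here is a minimal sketch of the sampling described above. This is not the repository's actual code; `sample_clip`, `SPEEDS`, `CLIP_LEN`, and the argument names are illustrative assumptions, and only frame indices are returned.

```python
import random

SPEEDS = [1, 2]   # playback speeds (1x, 2x); illustrative assumption
CLIP_LEN = 16     # frames per clip; illustrative assumption

def sample_clip(video_len, speed):
    """Return frame indices for one clip played back at the given speed."""
    span = CLIP_LEN * speed
    start = random.randint(0, video_len - span)
    return list(range(start, start + span, speed))

def sample_rsp_triplet(video_len):
    """RSP task: three clips from the SAME video for the triplet loss (Eq. 2).
    Anchor and positive share a playback speed; the negative uses a different one."""
    anchor_speed = random.choice(SPEEDS)
    negative_speed = random.choice([s for s in SPEEDS if s != anchor_speed])
    anchor = sample_clip(video_len, anchor_speed)
    positive = sample_clip(video_len, anchor_speed)
    negative = sample_clip(video_len, negative_speed)
    return anchor, positive, negative

def sample_avid_clips(video_len, other_video_lens, K):
    """A-VID task: c_i, c_j from the same video plus K negatives {c_n} from other
    videos for the contrastive loss (Eq. 3). Each clip's speed is drawn at random,
    so only K + 2 clips are needed, not 2K."""
    c_i = sample_clip(video_len, random.choice(SPEEDS))
    c_j = sample_clip(video_len, random.choice(SPEEDS))
    negatives = [sample_clip(random.choice(other_video_lens), random.choice(SPEEDS))
                 for _ in range(K)]
    return c_i, c_j, negatives
```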

youwantsy commented 2 years ago

Ah ha! Thank you for your explanation! I totally understand now.