Open isseay opened 1 year ago
Great job! I have a question about the loss: will all the annotations (reference frames and key frame) be used by YOLOV? If so, I want to use only the information from the reference frames for keyframe detection. Is that feasible? Thanks!
Thanks for your attention. In the training phase, all proposals are treated the same: each frame serves as a reference for the others. However, it is feasible to use only information from the references by changing the self-attention to cross-attention and computing the loss only on the keyframe.
I've thought about this, but I have some doubts about the implementation. Could you give me some tips? Thank you very much!
By generating the query from the keyframe features, and the key & value from the reference-frame features, you obtain cross-attention. Then, by computing the loss only for the keyframe features (masking out the others), the idea "only use information from references for keyframe detection" can be verified.
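To make the suggestion above concrete, here is a minimal NumPy sketch of that cross-attention step. All names (`cross_attention`, the projection matrices `Wq`/`Wk`/`Wv`, and the proposal counts) are illustrative assumptions, not YOLOV's actual code: queries come from keyframe proposals while keys and values come from reference-frame proposals, so each keyframe output aggregates reference information only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(key_feats, ref_feats, Wq, Wk, Wv):
    """Q from keyframe proposals; K and V from reference-frame proposals.

    The keyframe output is therefore a weighted sum of reference values
    only -- the keyframe's own features enter solely through the query.
    """
    Q = key_feats @ Wq                               # (n_key, d)
    K = ref_feats @ Wk                               # (n_ref, d)
    V = ref_feats @ Wv                               # (n_ref, d)
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))   # (n_key, n_ref)
    return attn @ V                                  # (n_key, d)

rng = np.random.default_rng(0)
d = 8
key = rng.standard_normal((4, d))    # 4 keyframe proposals (hypothetical)
ref = rng.standard_normal((20, d))   # 20 reference proposals (hypothetical)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = cross_attention(key, ref, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

In training, the remaining change would be to compute the detection loss only on the keyframe outputs (e.g. zeroing or masking the loss terms of reference proposals), matching the "loss only on keyframe" part of the suggestion.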
Thanks a lot! I will give it a try. Best wishes!
I have a question. As you said, "changing the self-attention to cross-attention": won't the information of the current frame be lost, since the V in cross-attention only contains information from the reference frames? If I instead keep self-attention and only compute the loss for the keyframe, is that feasible? If not, where do you think the problem lies? Really hoping for your reply and analysis, thanks.