YuHengsss / YOLOV

This repo is a PyTorch implementation of the YOLOV series.
Apache License 2.0

Are per-frame annotations used in YOLOV? #11

Open isseay opened 1 year ago

isseay commented 1 year ago

Great job! I have a question about the loss: are all the annotations (reference frames and key frame) used in YOLOV? If so, I want to use only the information from the reference frames for key-frame detection. Is that feasible? Thanks!

YuHengsss commented 1 year ago

> Great job! I have a question about the loss: are all the annotations (reference frames and key frame) used in YOLOV? If so, I want to use only the information from the reference frames for key-frame detection. Is that feasible? Thanks!

Thanks for your attention. In the training phase, all proposals are treated the same; each frame serves as a reference for the others. But it is feasible to use only information from the references by changing the self-attention to cross-attention and computing the loss only for the keyframe.

isseay commented 1 year ago

> Great job! I have a question about the loss: are all the annotations (reference frames and key frame) used in YOLOV? If so, I want to use only the information from the reference frames for key-frame detection. Is that feasible? Thanks!

> Thanks for your attention. In the training phase, all proposals are treated the same; each frame serves as a reference for the others. But it is feasible to use only information from the references by changing the self-attention to cross-attention and computing the loss only for the keyframe.

I've thought about this, but I have some doubts about the implementation. Could you give me some tips? Thank you very much!

YuHengsss commented 1 year ago

By generating the query from the keyframe features, and the key & value from the reference features, you obtain cross-attention. Then, by computing the loss only for the keyframe features (masking out the others), the idea of "only using information from the references for key-frame detection" can be verified.
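
For concreteness, a minimal PyTorch sketch of that idea might look like the following. The class and tensor names here are illustrative assumptions, not the repo's actual implementation:

```python
import torch
import torch.nn as nn

class KeyframeCrossAttention(nn.Module):
    """Sketch: queries from keyframe proposals, keys/values from reference proposals."""
    def __init__(self, dim):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, key_feats, ref_feats):
        # key_feats: (N_key, dim) proposal features of the keyframe
        # ref_feats: (N_ref, dim) proposal features gathered from the reference frames
        q = self.q_proj(key_feats)   # queries come only from the keyframe
        k = self.k_proj(ref_feats)   # keys come from the references
        v = self.v_proj(ref_feats)   # values come from the references
        attn = torch.softmax(q @ k.t() * self.scale, dim=-1)
        return attn @ v              # aggregated features for the keyframe proposals
```

The loss would then be computed only on the outputs produced for the keyframe proposals, so the reference frames contribute features but no supervision signal.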

isseay commented 1 year ago

> By generating the query from the keyframe features, and the key & value from the reference features, you obtain cross-attention. Then, by computing the loss only for the keyframe features (masking out the others), the idea of "only using information from the references for key-frame detection" can be verified.

Thanks a lot! I will give it a try. Best wishes!

isseay commented 1 year ago

> By generating the query from the keyframe features, and the key & value from the reference features, you obtain cross-attention. Then, by computing the loss only for the keyframe features (masking out the others), the idea of "only using information from the references for key-frame detection" can be verified.

I have a question. As you said, "changing the self-attention to cross-attention": will the information of the current frame be lost, since the V in cross-attention only contains information from the reference frames? If we still use self-attention and only compute the loss for the keyframe, is that feasible? If not, where do you think the problem is? I really hope for your reply and analysis, thanks!
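
For reference, the "keep self-attention but supervise only the keyframe" variant asked about here boils down to masking the per-proposal loss. A hypothetical sketch, assuming `criterion` is any per-element loss created with `reduction='none'` (this is not YOLOV's actual loss code):

```python
import torch

def keyframe_only_loss(predictions, targets, is_key_mask, criterion):
    # predictions: (N, C) outputs for proposals of all frames after self-attention
    # targets:     (N, C) matched targets for every proposal
    # is_key_mask: (N,) bool tensor, True where a proposal belongs to the keyframe
    per_proposal = criterion(predictions, targets).sum(dim=-1)  # one scalar per proposal
    masked = per_proposal * is_key_mask.float()                 # drop reference-frame terms
    return masked.sum() / is_key_mask.float().sum().clamp(min=1)
```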