Junction4Nako / mvp_pytorch

pytorch implementation of mvp: a multi-stage vision-language pre-training framework
MIT License

help: what's the meaning of "r_loss, f_loss, pseudo_labels, wra_loss" in "run_retrieval.py"? #3

Open Ammexm opened 2 years ago

Ammexm commented 2 years ago

Hello~ I'm a little confused about the meaning of "r_loss, f_loss, pseudo_labels, wra_loss" in line 626 of the "run_retrieval.py" file in this project. Could you please explain what each of them stands for? Thanks very much~

Junction4Nako commented 2 years ago

args.use_phrase is an experimental argument I used to test whether WPG can improve fine-tuning for image-text retrieval. The result was negative, so I suggest you do not use it. r_loss is the VSC loss, similar to the one in ALBEF and CLIP, and is applied to the uni-modal global embeddings. f_loss is the ITM loss, which is applied to the multi-modal outputs. pseudo_labels are the labels of the sampled positive and negative pairs used in ITM, where the hard-negative pairs are sampled from the similarity distribution in VSC. wra_loss is the WPG loss, the same as in pre-training.
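To make the relationship between r_loss, pseudo_labels and f_loss more concrete, here is a rough sketch of the general pattern described above: a symmetric contrastive loss over an image-text similarity matrix, hard negatives sampled from the resulting similarity distribution, and a two-class matching loss over the sampled pairs. It is not the code from run_retrieval.py. The function names (`vsc_loss`, `sample_itm_pairs`, `itm_loss`), the temperature of 0.07, the small epsilon added to the sampling weights, and the use of plain Python lists instead of tensors are all assumptions made so the example runs without dependencies. wra_loss is not covered here.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def vsc_loss(sim, temperature=0.07):
    """Symmetric contrastive loss (the role r_loss plays in the thread).

    sim[i][j] is the similarity between the global embedding of image i
    and text j; matched pairs sit on the diagonal.
    The temperature value is an assumption, not taken from the repository.
    """
    n = len(sim)
    i2t = [softmax([s / temperature for s in row]) for row in sim]
    t2i = [softmax([sim[i][j] / temperature for i in range(n)]) for j in range(n)]
    loss_i2t = -sum(math.log(i2t[k][k]) for k in range(n)) / n
    loss_t2i = -sum(math.log(t2i[k][k]) for k in range(n)) / n
    return (loss_i2t + loss_t2i) / 2, i2t, t2i


def sample_itm_pairs(i2t, t2i, rng, eps=1e-8):
    """Build (image_idx, text_idx) pairs plus their pseudo labels.

    Positives are the diagonal pairs (label 1). For each image one hard
    negative text is drawn, and for each text one hard negative image,
    with probability proportional to the contrastive similarity (label 0).
    eps keeps the weights valid if the softmax underflows (an assumption).
    """
    n = len(i2t)
    if n < 2:
        raise ValueError("need at least two pairs to sample negatives")
    pairs = [(k, k) for k in range(n)]
    pseudo_labels = [1] * n
    for k in range(n):
        # The matched index gets weight 0 so it is never drawn as a negative.
        w_text = [0.0 if j == k else i2t[k][j] + eps for j in range(n)]
        neg_text = rng.choices(range(n), weights=w_text, k=1)[0]
        pairs.append((k, neg_text))
        pseudo_labels.append(0)
        w_img = [0.0 if i == k else t2i[k][i] + eps for i in range(n)]
        neg_img = rng.choices(range(n), weights=w_img, k=1)[0]
        pairs.append((neg_img, k))
        pseudo_labels.append(0)
    return pairs, pseudo_labels


def itm_loss(match_logits, pseudo_labels):
    """Two-class cross-entropy (the role f_loss plays in the thread).

    match_logits[p] = [no_match_score, match_score] for pair p; in a real
    model these would come from the multi-modal encoder output.
    """
    losses = [-math.log(softmax(logits)[label])
              for logits, label in zip(match_logits, pseudo_labels)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy 3x3 similarity matrix with the matched pairs on the diagonal.
    sim = [[0.9, 0.1, 0.2], [0.0, 0.8, 0.3], [0.1, 0.2, 0.7]]
    r_loss, i2t, t2i = vsc_loss(sim)
    pairs, pseudo_labels = sample_itm_pairs(i2t, t2i, random.Random(0))
    # Placeholder logits standing in for the multi-modal encoder output.
    logits = [[0.0, 1.0] if y == 1 else [1.0, 0.0] for y in pseudo_labels]
    f_loss = itm_loss(logits, pseudo_labels)
    print(r_loss, f_loss, pairs, pseudo_labels)
```

In a PyTorch training loop the same steps would typically be tensor operations, with the two losses summed before the backward pass. The exact weighting used in this project is not stated in the thread.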