Open EagleW opened 2 years ago
Hi, @dandelin
I have some questions about ITM pre-training. For the pretraining ITM, how did you use itm loss and wpa loss? It seems that you use them separately: https://github.com/dandelin/ViLT/blob/762fd3975c180db6fc88f577cf39549983fa373a/vilt/modules/vilt_utils.py#L127-L139
Why not simply add up those two losses and backpropagate them together? https://github.com/dandelin/ViLT/blob/762fd3975c180db6fc88f577cf39549983fa373a/vilt/modules/objectives.py#L252-L272
I also have the same question as #48
Thank you!
Hi, @dandelin
I have some questions about ITM pre-training. For the pretraining ITM, how did you use itm loss and wpa loss? It seems that you use them separately: https://github.com/dandelin/ViLT/blob/762fd3975c180db6fc88f577cf39549983fa373a/vilt/modules/vilt_utils.py#L127-L139
Why not simply add up those two losses and backpropagate them together? https://github.com/dandelin/ViLT/blob/762fd3975c180db6fc88f577cf39549983fa373a/vilt/modules/objectives.py#L252-L272
I also have the same question as #48
Thank you!