Open jipson7 opened 3 years ago
Hi.
Why did you decide to use a BCE loss for the ITM pretraining task and a ranking loss for the downstream ITM task? Is there any intuition behind this? Why not use a ranking loss for both?
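For context, here is a minimal sketch of the two kinds of loss being compared. It is not the repository's code. The scores stand in for hypothetical image-text matching logits, and the hinge form and the `margin=0.2` value are illustrative assumptions. BCE classifies each pair on its own as matched or not matched. The ranking loss only asks that a matched pair outscore a mismatched one by some margin.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def itm_bce_loss(scores, labels):
    """Binary cross-entropy: each (image, text) pair is classified
    independently as matched (1) or not matched (0)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    p = sigmoid(scores)
    eps = 1e-12  # guards against log(0)
    return float(-np.mean(labels * np.log(p + eps)
                          + (1 - labels) * np.log(1 - p + eps)))

def ranking_hinge_loss(pos_scores, neg_scores, margin=0.2):
    """Triplet-style hinge loss (margin is an assumed value): only the
    relative order matters, so each positive pair must outscore its
    paired negative by at least `margin`."""
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    return float(np.mean(np.maximum(0.0, margin - pos + neg)))

# Shifting every score by the same constant changes the BCE loss
# but leaves the ranking loss unchanged.
pos, neg = [2.0, 1.0], [0.5, 0.9]
print(ranking_hinge_loss(pos, neg), ranking_hinge_loss(np.add(pos, 5), np.add(neg, 5)))
print(itm_bce_loss(pos + neg, [1, 1, 0, 0]),
      itm_bce_loss(list(np.add(pos + neg, 5)), [1, 1, 0, 0]))
```

The invariance in the last lines shows the practical difference. BCE pushes scores toward calibrated absolute match probabilities. A ranking loss only shapes the relative ordering of candidates, and that ordering is what retrieval metrics such as Recall@K measure.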
Thanks for the great work.
-Caleb