Punchwes closed 3 years ago
Hi, sorry for the late reply. I think we used QQP_dev for early stopping. If I remember correctly, we also tried using PAWS_dev to pick the best checkpoint. That overfits a bit to PAWS_dev, but it didn't make a big difference. The QQP-trained model was pretty bad on PAWS_dev regardless of which dev set was used to pick the best checkpoint. QQP+PAWS_train converged to the best performance on both QQP_dev and PAWS_dev.
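For anyone reading later, here is a rough sketch of what picking the best checkpoint on a dev set with early stopping can look like. It is only an illustration under assumptions, not the actual training code: `select_checkpoint`, the `patience` parameter, and the scores in the usage lines are all hypothetical.

```python
from typing import Optional, Sequence, Tuple


def select_checkpoint(
    dev_scores: Sequence[float], patience: int = 3
) -> Tuple[Optional[int], int]:
    """Return (best_epoch, stop_epoch) for early stopping on one dev metric.

    dev_scores[i] is the dev-set score (higher is better) after epoch i,
    e.g. accuracy or AUC on QQP_dev. Training stops once `patience`
    consecutive epochs fail to improve on the best score seen so far.
    """
    best_epoch: Optional[int] = None
    best_score = float("-inf")
    stale = 0  # epochs since the last improvement
    stop_epoch = len(dev_scores) - 1  # default: ran through every epoch

    for epoch, score in enumerate(dev_scores):
        if score > best_score:
            best_epoch, best_score, stale = epoch, score, 0
        else:
            stale += 1
            if stale >= patience:
                stop_epoch = epoch
                break

    return best_epoch, stop_epoch


# Made-up scores for illustration only, not results from this thread.
qqp_dev_scores = [0.80, 0.85, 0.88, 0.87, 0.88, 0.86]
best_epoch, stop_epoch = select_checkpoint(qqp_dev_scores, patience=3)
print(best_epoch, stop_epoch)
```

Swapping in PAWS_dev scores for `dev_scores` corresponds to the alternative mentioned above, with the caveat that the reported PAWS_dev number is then chosen on the same data it is measured on.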
Hi, given that PAWS_QQP does not have a separate dev or test set, I wonder how you selected the model in your original training strategy for the two scenarios QQP -> PAWS and QQP+PAWS_train -> PAWS. Did you still use QQP_dev for things like early stopping, or did you evaluate models directly on PAWS during training and pick the maximum accuracy/AUC?
Many thanks.