I noticed the comments in the code, which state that the reported results are from the last epoch and that no dev set is used to find the best epoch. I also found other repositories that don't have a dev set split. I wonder if it's better to have a dev set to tune hyperparameters and find the best checkpoint, or if a dev set is unnecessary for distillation.