Closed TerranceFLab closed 3 years ago
Hey, thank you for your kind words.
We mean the factor hyperparameter of those losses was tuned, not actual learned parameters. We simply took 20% of the train dataset T as a validation set V, and the remaining 80% as a temporary new train dataset X. For each loss, we retrain from scratch on X, then evaluate on V. The hyperparameter value that performed best on V is kept, and then the model is retrained from scratch on T one final time.
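In other words, the procedure is a standard hold-out search over the loss factor. A minimal sketch (the function names `train_fn`/`eval_fn` are placeholders for whatever training and evaluation routines you use, not part of our codebase):

```python
import random

def tune_loss_factor(train_set, candidate_factors, train_fn, eval_fn,
                     val_ratio=0.2, seed=0):
    """Hold out val_ratio of T as V, pick the loss factor that scores best
    on V when trained on the remaining X, then retrain on all of T."""
    data = list(train_set)
    random.Random(seed).shuffle(data)
    n_val = int(len(data) * val_ratio)
    V, X = data[:n_val], data[n_val:]  # 20% validation, 80% temporary train

    # Retrain from scratch on X for each candidate factor, evaluate on V.
    best_factor = max(candidate_factors,
                      key=lambda f: eval_fn(train_fn(X, f), V))

    # Final model: retrain from scratch on the full training set T.
    return best_factor, train_fn(data, best_factor)
```

The key point is that the final model never sees V during tuning, and the kept hyperparameter is used for one last retraining on the full set T.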
Is that clearer? :)
Hi, thank you for your great work. I noticed that the paper says "all alternative losses were tuned on the validation set to get the best performance". Could you give more details about that? What proportion of the original training set is used as the validation set? Do you retrain the model from scratch on the original training set after tuning the parameters on the validation set?
Thanks in advance.