Open pierowu opened 1 year ago
Thanks for the careful study.
When selecting the hyper-parameters, we sweep and validate on the validation dataset:
Once the best hyper-parameters are selected, we report the best results on the test set. The line of code you point to indicates that the best weights/epoch are selected by looking at the best numbers on the test set.
For real applications, where we are not allowed to use the test-set labels to determine the number of training epochs, we recommend fixing the number of epochs in the hyper-parameter search stage.
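A minimal sketch of the distinction, not the toolkit's actual code: the function names and the per-epoch accuracy lists below are hypothetical and made up for illustration. It contrasts picking the epoch by validation accuracy (and reporting the test accuracy at that epoch) with picking the epoch by peak test accuracy.

```python
from typing import List, Tuple


def select_epoch_by_validation(val_acc: List[float],
                               test_acc: List[float]) -> Tuple[int, float]:
    """Pick the epoch with the highest validation accuracy and
    return (epoch index, test accuracy at that epoch)."""
    if not val_acc or len(val_acc) != len(test_acc):
        raise ValueError("need one validation and one test score per epoch")
    best_epoch = max(range(len(val_acc)), key=val_acc.__getitem__)
    return best_epoch, test_acc[best_epoch]


def select_epoch_by_test(test_acc: List[float]) -> Tuple[int, float]:
    """Pick the epoch with the highest test accuracy.
    This uses test labels for model selection, so the number is optimistic."""
    if not test_acc:
        raise ValueError("need at least one test score")
    best_epoch = max(range(len(test_acc)), key=test_acc.__getitem__)
    return best_epoch, test_acc[best_epoch]


# Made-up per-epoch accuracies, for illustration only.
val_acc = [0.60, 0.72, 0.75, 0.74]
test_acc = [0.58, 0.70, 0.71, 0.76]

val_choice = select_epoch_by_validation(val_acc, test_acc)
test_choice = select_epoch_by_test(test_acc)
print(val_choice)   # (2, 0.71)
print(test_choice)  # (3, 0.76)
```

In this toy example the test-selected number (0.76) is higher than the validation-selected one (0.71), which is the gap a reader should keep in mind when comparing against numbers reported with either protocol.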
Thank you for your reply. Would this approach cause overfitting to the test set? For example, we could design a high-variance model that performs well on the test set in several epochs but deteriorates in others. When I want to compare with other methods on the Elevater benchmark, how can I make the comparison fair?
Thank you for your solid work. Does the repo implement a function that picks the model weights that perform best on the validation dataset and evaluates them on the test dataset? From the code below, it seems that the repo directly chooses the best result on the test dataset as the final result? https://github.com/Computer-Vision-in-the-Wild/Elevater_Toolkit_IC/blob/00d0af78559d5f3d800ae4668210e6bd1f2f84b9/vision_benchmark/evaluation/full_model_finetune.py#L267-L277