SegmentationBLWX / sssegmentation

SSSegmentation: An Open Source Supervised Semantic Segmentation Toolbox Based on PyTorch.
https://sssegmentation.readthedocs.io/en/latest/
Apache License 2.0

[Question] About model training and selection #60

Open STHxiao opened 1 week ago

STHxiao commented 1 week ago

Thank you for your outstanding work; it has been immensely helpful to me. I have some questions regarding model training and selection, and I hope you can provide some clarity.

  1. What is the criterion for selecting the best model? In the upernet_convnexttiny_ade20k.log, I noticed that 130 epochs were trained, with the highest mIoU achieved at epoch 123, yet the released weights correspond to epoch 128.

  2. What is the correct process for model training and selection? Some courses describe splitting the dataset into training, validation, and test sets, using the validation set for hyperparameter tuning and model selection, and then evaluating once on the test set to report the final accuracy. If there is no validation set and tuning is performed directly on the test set, what is the purpose of test.py? It seems to overlap with the evaluation already done during training and may lead to overfitting to the test set, making it hard to judge the model's generalization ability. This has puzzled me for some time, and the descriptions in some papers are rather vague, so perhaps there is an accepted industry-standard process. If you could recommend relevant papers, tutorials, or reference code, I would greatly appreciate it. I have sketched my current understanding of the protocol right after this list.
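
To make question 2 concrete, here is a minimal sketch of the split-then-tune-then-test protocol I have in mind; `SegDataset`, `build_model`, `train`, and `evaluate_miou` are placeholder names for illustration only, not sssegmentation APIs:

```python
# Illustrative sketch of: split -> tune on val -> report once on test.
# SegDataset, build_model, train, evaluate_miou are placeholders.
import torch
from torch.utils.data import DataLoader, random_split

full_train = SegDataset(split='train')   # labelled training images
test_set   = SegDataset(split='test')    # held-out test images

# carve a validation set out of the training data
n_val = int(0.1 * len(full_train))
train_set, val_set = random_split(full_train, [len(full_train) - n_val, n_val])

best_miou, best_state = 0.0, None
for lr in (1e-4, 6e-5):                  # hyperparameter search
    model = build_model()
    train(model, DataLoader(train_set, batch_size=8, shuffle=True), lr=lr)
    miou = evaluate_miou(model, DataLoader(val_set, batch_size=1))
    if miou > best_miou:                 # model selection happens on val only
        best_miou, best_state = miou, model.state_dict()

# the test set is touched exactly once, to report the final number
final_model = build_model()
final_model.load_state_dict(best_state)
print('test mIoU:', evaluate_miou(final_model, DataLoader(test_set, batch_size=1)))
```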

I look forward to your response. Thank you once again, and I wish you all the best.

CharlesPikachu commented 6 days ago

Hi, strictly speaking, we should take the checkpoint from the last epoch as the final training result of the model.

I will update the reported results in sssegmentation.
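
For illustration, a minimal training-loop sketch of the two checkpointing conventions being compared here (last epoch vs. best validation mIoU); this is not the actual sssegmentation trainer, and `train_one_epoch`, `evaluate_miou`, and the loaders are placeholders:

```python
# Sketch of "last epoch" vs. "best epoch" checkpointing (placeholders throughout).
import torch

best_miou = 0.0
for epoch in range(1, num_epochs + 1):
    train_one_epoch(model, train_loader, optimizer)
    miou = evaluate_miou(model, val_loader)

    # convention A: always keep the most recent weights ("last epoch")
    torch.save(model.state_dict(), 'checkpoint_last.pth')

    # convention B: keep the weights with the highest evaluation mIoU ("best epoch")
    if miou > best_miou:
        best_miou = miou
        torch.save(model.state_dict(), 'checkpoint_best.pth')

# Reporting the last-epoch checkpoint avoids implicitly selecting the model on the
# evaluation set; reporting the best-epoch checkpoint usually gives a slightly
# higher number but uses that set for model selection.
```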

STHxiao commented 6 days ago

Thanks! What about question 2? I am still quite confused about it. Looking forward to your guidance and suggestions. Thank you once again!