too good 98% correct clustering results

aparfenov2 commented 2 years ago

Hi guys, thank you for this good framework. I tried clustering the output of the VERI-Wild baseline and got 98% matches on the test set (test_3000.txt) from the VERI-Wild dataset. This looks too perfect. Could it be that you used that test set for cross-validation, so the model overfitted on that data? That could explain the high score. What data did you use for cross-validation here?

L1aoXingyu commented 2 years ago

@JinkaiZheng Could you please answer this question?

JinkaiZheng commented 2 years ago

@kantengri Hi~ Thank you for your attention! First, we only use 10-fold cross-validation on the VehicleID dataset, because VehicleID does NOT provide official split files for query and gallery. In our implementation, query and gallery are randomly sampled at every inference. To avoid randomness in the results, a model trained on VehicleID was tested 10 times and the results were averaged; this is what we call "10-fold cross-validation".

Second, we do NOT use the test set during training. On VERI-Wild, we use the official query/gallery split files and do NOT adopt "10-fold cross-validation", which means we test only once to get the results.

Third, we got 96.4% (Rank-1) in MODEL_ZOO on the small VERI-Wild test set (test_3000.txt). I think the result is high because the training set stays the same while the test set gets smaller. To take an extreme example, if your test set has only one ID, then whenever your first search result is correct, your Rank-1 is 100%. In general, the larger the test set, the more identities compete for the top match and the more uncontrollable factors there are, so performance tends to be lower, and vice versa.

Fourth, I do not understand what "clustering" means in your question. Could you please give me more details?
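To make the first and third points concrete, here is a minimal sketch (not fast-reid's evaluation code) of a repeated random query/gallery protocol with Rank-1 averaged over 10 splits, run on hypothetical synthetic features. The names `make_dataset`, `rank1_single_split` and `repeated_rank1`, the "one gallery image per ID" split, and the Gaussian feature model are all assumptions chosen for illustration.

```python
import random
from statistics import mean


def make_dataset(num_ids, imgs_per_id, dim, noise, rng):
    """Hypothetical synthetic features: one random prototype per ID,
    each image is that prototype plus Gaussian noise."""
    data = []
    for pid in range(num_ids):
        proto = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for _ in range(imgs_per_id):
            data.append((pid, [x + rng.gauss(0.0, noise) for x in proto]))
    return data


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rank1_single_split(data, rng):
    """One random split (assumed protocol): one image per ID goes to the
    gallery, the rest become queries. Returns the Rank-1 accuracy."""
    by_id = {}
    for pid, feat in data:
        by_id.setdefault(pid, []).append(feat)
    gallery, queries = [], []
    for pid, feats in by_id.items():
        g = rng.randrange(len(feats))  # randomly chosen gallery image for this ID
        for i, feat in enumerate(feats):
            (gallery if i == g else queries).append((pid, feat))
    hits = 0
    for qid, qfeat in queries:
        # Rank-1: is the single nearest gallery entry the same identity?
        best_id, _ = min(gallery, key=lambda item: sq_dist(qfeat, item[1]))
        hits += int(best_id == qid)
    return hits / len(queries)


def repeated_rank1(data, trials=10, seed=0):
    """Average Rank-1 over `trials` independent random splits."""
    rng = random.Random(seed)
    scores = [rank1_single_split(data, rng) for _ in range(trials)]
    return mean(scores), scores


if __name__ == "__main__":
    rng = random.Random(42)
    for num_ids in (1, 10, 50):
        data = make_dataset(num_ids, imgs_per_id=4, dim=16, noise=1.0, rng=rng)
        avg, _ = repeated_rank1(data, trials=10)
        print(f"{num_ids:>3} IDs -> mean Rank-1 over 10 splits: {avg:.3f}")
```

With a single ID the only gallery entry always has the query's identity, so Rank-1 is exactly 1.0 on every split. As `num_ids` grows, more distractor identities compete for the top rank and the average typically falls; the exact numbers depend on the seed and the `noise` setting. This mirrors the reasoning above that a smaller test set tends to give a higher Rank-1.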

github-actions[bot] commented 2 years ago

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] commented 2 years ago

This issue was closed because it has been inactive for 14 days since being marked as stale.

JDAI-CV / fast-reid

too good 98% correct clustering results #607