sanqingqu closed this issue 3 years ago
They re-divided the dataset.
Thanks for your reply. Based on the released code for model adaptation (w/ domain ID), the source-domain dataset is indeed re-divided with random sampling. However, I did not find that the target-domain dataset is re-divided. Moreover, during domain adaptation the source dataset is not used for training, only for selecting the best checkpoint. Therefore, the final results (w/ domain ID) should be the same; it is not reasonable for them to differ.
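To be concrete, the pattern I am referring to looks roughly like the sketch below (hypothetical function and variable names, not your actual code): adaptation trains on target batches only, and the source split is merely evaluated to decide which checkpoint to keep.

```python
import copy
import torch

@torch.no_grad()
def accuracy(model, loader, device="cuda"):
    # Plain evaluation pass; used here only for checkpoint selection.
    model.eval()
    correct = total = 0
    for images, labels in loader:
        preds = model(images.to(device)).argmax(dim=1).cpu()
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    return correct / total

def adapt(model, target_loader, source_loader, optimizer, adaptation_loss, epochs=15):
    best_acc, best_state = 0.0, None
    for _ in range(epochs):
        model.train()
        for images, _ in target_loader:              # no source images enter training
            optimizer.zero_grad()
            adaptation_loss(model, images.cuda()).backward()
            optimizer.step()
        src_acc = accuracy(model, source_loader)     # source data used only here
        if src_acc > best_acc:
            best_acc, best_state = src_acc, copy.deepcopy(model.state_dict())
    return best_state
```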
Hi, thanks for your interest.
Taking VisDA as an example, splitting the source into 90/10 means we only pretrain the model on 90% of the source data. SFDA is sensitive to the source pretraining (a good initialization), so it is quite reasonable to get different results with a different number of source training samples (especially for Office-Home, which has fewer images than VisDA). One more thing: as SHOT already shows in its ablation study, the label smoothing used in source training (both SHOT and we use it) gives a performance gain, implying that the quality of the source model (discriminative features and good generalization) is important for SFDA. If you regard SFDA as two-stage deep clustering, you will quickly get it: the whole source pretraining in SFDA just acts like the self-supervised learning stage in deep clustering, aiming to get a good initialization, and with a different pretrained model you will get different final results.
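To make the source pretraining setup concrete, here is a minimal PyTorch sketch of the 90/10 source split plus label-smoothed cross-entropy (illustrative names and hyperparameters only, not the exact repo code):

```python
import torch
import torch.nn as nn
from torch.utils.data import random_split

def split_source(dataset, train_ratio=0.9, seed=2021):
    # Hold out 10% of the source data; only the 90% split is used for pretraining.
    # The seed value here is illustrative.
    n_train = int(len(dataset) * train_ratio)
    n_val = len(dataset) - n_train
    return random_split(dataset, [n_train, n_val],
                        generator=torch.Generator().manual_seed(seed))

# Label-smoothed cross-entropy for source training
# (built into nn.CrossEntropyLoss since PyTorch 1.10).
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

def pretrain_step(model, optimizer, images, labels):
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```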
As the paper title says, we are source-free and do not have access to the source data. Note that DA datasets do not provide a validation/test split for the target domain, and even under normal DA people usually do not use source accuracy to pick the best target performance, since the two are not really correlated. Our method is quite stable during training, and the results in the paper are the average over three different random seeds at the last epoch (the seeds only affect the target training part; the source data splitting is the same).
The results in Tables 1 and 2 use all of the source data for training!! That is not the case for the remaining tables.
Thanks for your quick reply! The overall explanation of the SFDA task is convincing, and in my own experiments I have indeed found that the initialization model has a very strong impact on the final results and that the label smoothing strategy is a key step in generating the pre-trained source model. However, this still does not explain why your paper reports different results across Tables 1 to 4. For example, on the VisDA-C dataset, why do Table 1 and Table 3 report different results under the same condition (w/ domain ID)?
Hi, did you notice that the results in Tables 1 and 2 use all of the source data for training, while that is not the case for the remaining tables (we only split the source domain there)? :)
Thank you for your reply. So what you are saying is that for the traditional SFDA task your pre-trained source model is trained on the entire source data, while for the generalized SFDA task your pre-trained source model is trained on a subset of the source data. In fact, this training recipe may cause an unfair comparison with SHOT, since SHOT's source model is trained on a subset of the source data (the same recipe as Tables 3 and 4). 😂
Yeah that is the case.
I went back to the real final version of SHOT (they have plenty of different arXiv/code versions and final results...). Thanks for pointing this out; indeed, SHOT only trains on 90% of the source data (they split 90/10 on Office and VisDA). I may update the arXiv version with a comment, or with results reproduced under the same condition, though I think the performance gap still exists, especially on VisDA.
Thanks a lot~
Thanks for your great contribution to the SFDA task. I am really impressed by the method in your G-SFDA paper. However, I found some differences in the results between Tables 1-4. Specifically, for the VisDA-C dataset, the avg result in Table 1 is 85.4, but under the same condition the avg result in Table 3 is 85.0. The same kind of difference appears for the Office-Home dataset: the avg result in Table 2 is 71.3, but under the same condition the avg result in Table 4 is 70.8. Could you explain the reasons for these differences?