Cannot reproduce the number reported in the paper.

KaiyangZhou / ssdg-benchmark

Benchmarks for semi-supervised domain generalization.

MIT License

Cannot reproduce the number reported in the paper. #7

Closed qianlanwyd closed 2 years ago

qianlanwyd commented 2 years ago

Hi, I ran FixMatch about five times on PACS with 210 labels. With A as the target domain, I only got 77.07% accuracy, which is about 1 percentage point lower than the result in the paper.

KaiyangZhou commented 2 years ago

Could you give more details on how you ran the code?

qianlanwyd commented 2 years ago

Thank you for your reply. I just ran your script as is. Also, may I ask why you do not evaluate after every epoch to choose the best model for testing?

KaiyangZhou commented 2 years ago

Updates:

Hi, just to let you know that I ran FixMatch on PACS's A domain (the 5 splits) and got 78.23 ± 1.90, so the code is fine.

Small differences in the numbers are pretty normal on PACS.

The results on OfficeHome (OH) should be more stable, as the stds are much smaller.

Still, I'd suggest you run the full experiment.

I'm closing this issue for now. If you find anything wrong with the code, reply in this thread.
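The comparison made in this thread (77.07 from the reporter against 78.23 ± 1.90 over the 5 splits) can be checked with a few lines of Python. This is only a minimal sketch and not part of the ssdg-benchmark code: `summarize` and `within_k_std` are hypothetical helper names, and the use of the sample standard deviation is an assumption, since the thread does not say how the ± value was computed.

```python
import statistics


def summarize(accuracies):
    """Return (mean, sample std) for a list of per-split accuracies.

    Assumption: sample std (n - 1 denominator); the thread does not
    state which estimator produced the reported +- value.
    """
    mean = statistics.mean(accuracies)
    std = statistics.stdev(accuracies) if len(accuracies) > 1 else 0.0
    return mean, std


def within_k_std(value, mean, std, k=1.0):
    """True if `value` lies within k standard deviations of `mean`."""
    return abs(value - mean) <= k * std


# Numbers quoted in this thread (PACS, target domain A, FixMatch).
reported_mean, reported_std = 78.23, 1.90
user_result = 77.07

gap = reported_mean - user_result
print(f"gap = {gap:.2f} points")
print("within one std:", within_k_std(user_result, reported_mean, reported_std))
```

With per-split accuracies from your own runs, `summarize([...])` gives the mean and std to compare against the figures above.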