Closed wilbry closed 5 years ago
Yes, we followed ICT and included labeled training data in the unlabeled set. The code for the data split of ICT is available here.
We also tried excluding the labeled training data from the unlabeled set. The accuracy on CIFAR-10 with 4,000 labeled examples is 94.66 ± 0.17, similar to the original performance of 94.73 ± 0.11.
Thanks for the explanation!
Looking at `main` in https://github.com/google-research/uda/blob/master/image/preprocess.py, it seems that the supervised and unsupervised sets are being drawn from the same data independently, so there is likely to be overlap of images between the supervised and unsupervised sets. Do you think this affects your results in any way? Or am I misreading the code?
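For what it's worth, the setup described above (labeled examples also appearing in the unlabeled set) can be sketched in a few lines. This is only an illustration, not the repo's actual code: the pool size, seed, and split sizes below are hypothetical stand-ins for the CIFAR-10 split.

```python
import random

random.seed(0)
pool = list(range(50_000))  # hypothetical CIFAR-10 training indices

# Draw 4,000 "labeled" examples from the pool.
labeled = set(random.sample(pool, 4_000))

# Following the ICT-style split described above, the unlabeled set
# is the full pool, so it includes every labeled example.
unlabeled = set(pool)

overlap = labeled & unlabeled
print(len(overlap))  # every labeled example is also in the unlabeled set
```

Excluding the overlap instead would just be `unlabeled = set(pool) - labeled`, which matches the ablation reported above (94.66 ± 0.17 vs. 94.73 ± 0.11).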