tqosu closed this issue 2 years ago
Hi Dipika,
I have now finished the experiments for 50Salads. Unfortunately, I still could not reproduce your results.
I have uploaded my results (checkpoints and code) to this shared folder:
https://drive.google.com/drive/u/0/folders/1CPqdGUvy2vifAND8cAdC8fzNT8lKyWkc
I used the latest script together with the updated unsupervised checkpoint. Here is a summary of my results:
50salads (my results):
0.05    49.0    44.1    32.5    42.5    57.8
0.1     58.1    54.1    41.0    51.5    65.6
0.4     70.2    67.6    57.1    62.4    74.8

Results in the paper:
5%      52.9    49.0    36.6    45.6    61.3
10%     67.3    64.9    49.2    56.9    68.6
These results were generated by the evaluation script, which evaluates the 5 splits and reports the average performance.
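To make the size of the gap easier to read, here is a minimal sketch that subtracts the paper's numbers from the reproduced ones for the two settings that appear in both lists. It assumes that 0.05 and 0.1 correspond to the 5% and 10% rows; the five metric columns are not named in this thread, so they are referred to by position only.

```python
# Numbers copied from the two result lists above.
# Column names are not given in the thread, so values are kept in order.
reproduced = {
    "5%": [49.0, 44.1, 32.5, 42.5, 57.8],
    "10%": [58.1, 54.1, 41.0, 51.5, 65.6],
}
paper = {
    "5%": [52.9, 49.0, 36.6, 45.6, 61.3],
    "10%": [67.3, 64.9, 49.2, 56.9, 68.6],
}


def gaps(reproduced, paper):
    """Return reproduced minus paper, per setting and per column."""
    return {
        setting: [round(r - p, 1) for r, p in zip(reproduced[setting], paper[setting])]
        for setting in paper
    }


result = gaps(reproduced, paper)
for setting, diffs in result.items():
    print(setting, diffs)
```

A negative value means the reproduced number is below the one in the paper.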
Hi Tieqiao,
I have uploaded new semi-supervised training set selections for 5% and 10% of the data for both 50salads and GTEA. If you use the new selections with the updated code, you will find that the results closely match, or are mostly higher than, what is reported in the paper. I have also uploaded the checkpoints for all ICC iterations and all splits of 50salads and GTEA, obtained by running with the same semi-supervised splits and code as provided in the drive. You can use them to validate the results as well.
50salads and GTEA are very small datasets. Even a small variation in the semi-supervised training set selection makes the variance of the results very high. We do report in the supplementary that the semi-supervised set selection causes variations in results. If you check other standard semi-supervised papers and try to replicate them, you will find the same issue. Additionally, 5% or 10% of the training data can be selected from the entire training set in many ways. Since it is practically impossible to try all possible combinations of selections, in our paper we report the mean and variance over 5 different random selections of the semi-supervised training set.
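The point about the number of possible selections can be illustrated with a rough sketch. This is not the selection code from the repository: the function names are hypothetical, and the training-set size of 40 videos per split is an assumption used only for illustration. The sketch counts how many distinct subsets exist for a given fraction, draws one seeded random selection, and summarizes a list of scores by mean and sample variance (the paper may define variance differently).

```python
import math
import random
import statistics


def count_selections(n_train, fraction):
    """Return (subset size, number of distinct subsets of that size)."""
    k = max(1, round(n_train * fraction))
    return k, math.comb(n_train, k)


def sample_selection(video_ids, fraction, seed):
    """Draw one reproducible random subset of the training videos."""
    k = max(1, round(len(video_ids) * fraction))
    rng = random.Random(seed)  # local RNG so the global state is untouched
    return sorted(rng.sample(video_ids, k))


def summarize(scores):
    """Mean and sample variance over runs with different selections."""
    return statistics.mean(scores), statistics.variance(scores)


# Assumed training-set size per split (hypothetical, for illustration).
N_TRAIN = 40
for fraction in (0.05, 0.10):
    k, n_ways = count_selections(N_TRAIN, fraction)
    print(f"{fraction:.0%}: {k} videos, {n_ways} possible selections")

# Five different seeds give five different candidate selections.
video_ids = list(range(N_TRAIN))
selections = [sample_selection(video_ids, 0.05, seed) for seed in range(5)]
```

Each selection would then be used for one full training run, and `summarize` would be applied to the five resulting scores.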
Thanks, Dipika
Hi Dipika,
I am now able to reproduce the results for 5% and 10%.
But for 40%, after several tries, the best accuracy I got is 75.3 (78.4 is reported in the paper).
Do you have any suggestions on how to fix this?