Hi, thank you again for sharing the code!
I wonder which model you used for the evaluation in the paper: Model 1, Model 2, or some kind of ensemble (for example, prediction3 in the function test_ruc in lib/protocols.py)?
And how did you select the best model? By taking the best accuracy of prediction3 on the training or validation set?
Thank you so much!
As far as I remember, we used the ensembled results for evaluation.
Since a validation set is not available in the unsupervised learning setting, previous papers report both the last and the best accuracy for comparison.
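For anyone reading along, here is a minimal sketch of what that evaluation could look like. It assumes prediction3 averages the softmax outputs of the two co-trained networks before taking the argmax; the names ensemble_predict, track_last_and_best, net1, net2, and the loader interface are hypothetical illustrations, not the repository's actual API.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ensemble_predict(net1, net2, loader, device="cuda"):
    """Average the class probabilities of the two co-trained models and take
    the argmax -- the kind of ensembling prediction3 is assumed to compute."""
    net1.eval()
    net2.eval()
    all_preds = []
    for images, _ in loader:
        images = images.to(device)
        p1 = F.softmax(net1(images), dim=1)  # class probabilities from Model 1
        p2 = F.softmax(net2(images), dim=1)  # class probabilities from Model 2
        all_preds.append(((p1 + p2) / 2).argmax(dim=1).cpu())
    return torch.cat(all_preds)

def track_last_and_best(acc_per_epoch):
    """With no validation set, report both the final-epoch accuracy and the
    best accuracy seen during training, as described in the answer above."""
    return acc_per_epoch[-1], max(acc_per_epoch)
```

Note that in clustering, the accuracy fed into track_last_and_best would itself come from Hungarian matching between cluster assignments and ground-truth labels, which is the standard protocol in this line of work.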