ermongroup / ssdkl

Code that accompanies the paper Semi-supervised Deep Kernel Learning: Regression with Unlabeled Data by Minimizing Predictive Variance

Transductive setting in ModelTrainer #4

Open atakanfilgoz opened 4 years ago

atakanfilgoz commented 4 years ago

Hello, I have a question about the transductive setting in the ModelTrainer class. The relevant code is in ssdkl/ssdkl/models/train_models.py, lines 199-201.

When the transductive variable is set to True, the lines mentioned above are executed. Assume we have a dataset of 100,000 samples, of which only 5,000 are labeled, and we use 10% of the labeled samples as a validation set. We would then have 4,500 samples for training, 500 for validation, and 95,000 for the test set, which is also the unlabeled set in transductive learning. However, when the code mentioned above is executed, we lose our unlabeled set and end up with 5,000 unlabeled samples instead. Or did I miss anything? Could you please check these lines of code?
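To make the numbers concrete, here is a minimal sketch of the split sizes described above. The function name `split_sizes` and its parameters are hypothetical and only illustrate the arithmetic; they are not taken from train_models.py.

```python
def split_sizes(n_total, n_labeled, val_fraction):
    """Return the expected subset sizes for the scenario in this question.

    Hypothetical helper for illustration only; not part of the ssdkl code.
    """
    # Validation samples are carved out of the labeled pool.
    n_val = int(round(n_labeled * val_fraction))
    n_train = n_labeled - n_val
    # Everything that is not labeled serves as both test and unlabeled data.
    n_rest = n_total - n_labeled
    return {"train": n_train, "val": n_val, "test_and_unlabeled": n_rest}


sizes = split_sizes(n_total=100_000, n_labeled=5_000, val_fraction=0.10)
print(sizes)
```

These are the sizes I would expect in the transductive setting, as opposed to the 5,000 unlabeled samples I observe.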

Thank you so much!

sangmichaelxie commented 3 years ago

You're right that, in principle, we could use all the unlabeled data we have. However, for our purposes we consider the transductive setting in which the unlabeled data is the test data. This is our definition of the transductive setting, which comes from [19] Cohen et al. 2007, as cited in the paper. The test set stays the same for comparison with other settings.
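As a rough illustration of this definition (unlabeled data = test data), the sketch below builds index sets for a dataset. It is not the repository's code: `transductive_split`, its parameters, and the use of a shuffled index list are all assumptions made for the example.

```python
import random


def transductive_split(n_total, n_labeled, val_fraction=0.1, seed=0):
    """Split indices so that the unlabeled pool is exactly the test set.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # reproducible shuffle

    labeled, test = indices[:n_labeled], indices[n_labeled:]
    n_val = int(round(n_labeled * val_fraction))
    val, train = labeled[:n_val], labeled[n_val:]

    # Transductive setting as defined above: unlabeled data = test data.
    unlabeled = test
    return train, val, test, unlabeled


train, val, test, unlabeled = transductive_split(100_000, 5_000)
print(len(train), len(val), len(test), len(unlabeled))
```

Because the test indices are fixed by the seed, the same test set can be reused when comparing against other settings.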