alphabetsoup / smla1

Statistical Machine Learning Assignment 1

Effect of including test set in training data #11

Open alphabetsoup opened 9 years ago

alphabetsoup commented 9 years ago

Is there a way we can include the test set in the training data?

Steve, you mentioned that including the test sample in the training data gave a surprisingly good AUC compared with excluding it (0.78 vs 0.76).

My thinking is that because 50% of the test data is valid (i.e. has Y=1), it could provide really good training data. How can we extract this? For instance, if we could include the test data in an estimation process with an a priori "classification" of P(Y=1)=0.5, and then iteratively update this classification based on the outcome of training and testing with this data, perhaps we could converge on a good solution, à la PageRank?
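One way to make this concrete is soft self-training (closely related to EM for semi-supervised learning). That is a swapped-in technique, not something settled in this thread. The sketch below assumes numeric feature vectors and uses a hand-rolled weighted logistic regression as a stand-in for whatever classifier we actually use; `fit_logistic`, `match_prior`, `self_train` and the `test_weight` down-weighting are hypothetical names and choices for illustration only. Each test point starts with the soft label P(Y=1)=0.5. Each round we retrain on train + test, re-score the test set, and shift the scores by one constant in logit space so their mean stays at the known 50% prior.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, weights, epochs=300, lr=0.5):
    """Weighted logistic regression by batch gradient descent (hypothetical stand-in).

    y may hold soft labels in [0, 1]; weights scale each example's loss.
    """
    d = len(X[0])
    coef, bias = [0.0] * d, 0.0
    total = sum(weights)
    for _ in range(epochs):
        g_coef, g_bias = [0.0] * d, 0.0
        for xi, yi, wi in zip(X, y, weights):
            p = sigmoid(sum(c * v for c, v in zip(coef, xi)) + bias)
            err = wi * (p - yi)  # gradient of weighted cross-entropy w.r.t. the logit
            for j in range(d):
                g_coef[j] += err * xi[j]
            g_bias += err
        coef = [c - lr * g / total for c, g in zip(coef, g_coef)]
        bias -= lr * g_bias / total
    return coef, bias


def predict(model, X):
    coef, bias = model
    return [sigmoid(sum(c * v for c, v in zip(coef, xi)) + bias) for xi in X]


def match_prior(probs, prior, iters=100):
    """Shift all scores by one constant in logit space so their mean equals prior.

    The shift is monotone, so the ranking (and hence AUC) is unchanged.
    """
    eps = 1e-12
    clipped = [min(max(p, eps), 1 - eps) for p in probs]
    logits = [math.log(p / (1 - p)) for p in clipped]
    lo, hi = -30.0, 30.0
    for _ in range(iters):  # bisection: the mean score increases with the shift
        mid = (lo + hi) / 2
        mean = sum(sigmoid(l + mid) for l in logits) / len(logits)
        if mean < prior:
            lo = mid
        else:
            hi = mid
    shift = (lo + hi) / 2
    return [sigmoid(l + shift) for l in logits]


def self_train(X_train, y_train, X_test, prior=0.5, n_iter=5, test_weight=0.5):
    """Soft self-training sketch: test points start at P(Y=1) = prior and their
    soft labels are re-estimated from the model each round.
    test_weight (hypothetical) down-weights the pseudo-labelled test points."""
    soft = [prior] * len(X_test)
    model = None
    for _ in range(n_iter):
        X = list(X_train) + list(X_test)
        y = list(y_train) + soft
        w = [1.0] * len(X_train) + [test_weight] * len(X_test)
        model = fit_logistic(X, y, w)
        soft = match_prior(predict(model, X_test), prior)
    return model, soft


if __name__ == "__main__":
    # Toy 1-D data, made up purely for illustration.
    X_train = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
    y_train = [0, 0, 0, 1, 1, 1]
    X_test = [[-1.8], [-0.5], [0.5], [1.8]]
    model, soft = self_train(X_train, y_train, X_test)
    print([round(s, 3) for s in soft])
```

Because the soft labels come from the model's own predictions, this can reinforce its mistakes as easily as correct them, so any AUC gain is worth checking on a held-out slice of the training data rather than on the test set itself.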