Is there a way we can include the test set in the training data?
Steve, you mentioned that including the test sample in the training data gave a surprisingly good AUC (0.78, vs 0.76 when it was excluded).
My thinking is that because 50% of the test data is valid, it should provide really good data for training. How can we extract this? For instance, if we can somehow include the test data in an estimation process with an a priori "classification" of P(Y=1)=0.5, and then iteratively update this classification based on the outcome of training and testing using this data, perhaps we can converge on a good solution, à la PageRank?
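To make the idea concrete, here is a rough sketch of the loop I have in mind, not a tested recipe. It assumes a plain logistic regression fitted by gradient descent, with the test rows entering as soft labels that start at P(Y=1)=0.5 and get overwritten by the model's own predictions each round. All the names (`fit_logistic`, `self_train`, `test_weight`) and the toy data are made up for illustration, and nothing here reproduces the 0.78 vs 0.76 result; whether the iteration converges to something better than training alone is exactly the open question.

```python
import math


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(X, beta):
    # beta holds one coefficient per feature, followed by the intercept.
    return [sigmoid(sum(b * v for b, v in zip(beta, xi)) + beta[-1]) for xi in X]


def fit_logistic(X, y, w, n_iter=300, lr=0.5):
    """Weighted logistic regression by gradient descent.

    y may contain soft labels in [0, 1]; w are per-row weights.
    """
    n_feat = len(X[0])
    beta = [0.0] * (n_feat + 1)
    total_w = sum(w)
    for _ in range(n_iter):
        grad = [0.0] * (n_feat + 1)
        for xi, yi, wi in zip(X, y, w):
            p = sigmoid(sum(b * v for b, v in zip(beta, xi)) + beta[-1])
            err = wi * (p - yi)  # gradient of weighted cross-entropy
            for j, v in enumerate(xi):
                grad[j] += err * v
            grad[-1] += err
        beta = [b - lr * g / total_w for b, g in zip(beta, grad)]
    return beta


def self_train(X_train, y_train, X_test, n_rounds=5, test_weight=0.5):
    """Iteratively refit on train + test, updating the test soft labels."""
    p_test = [0.5] * len(X_test)  # prior: P(Y=1) = 0.5 for every test row
    X_all = list(X_train) + list(X_test)
    # Assumption: unlabeled test rows count less than labeled training rows.
    w = [1.0] * len(X_train) + [test_weight] * len(X_test)
    beta = None
    for _ in range(n_rounds):
        y_all = list(y_train) + p_test
        beta = fit_logistic(X_all, y_all, w)
        p_test = predict_proba(X_test, beta)  # update the "classification"
    return beta, p_test


# Toy usage with one made-up feature.
X_train = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
y_train = [0, 0, 0, 1, 1, 1]
X_test = [[-1.2], [1.3]]
beta, p_test = self_train(X_train, y_train, X_test)
print(p_test)
```

One caveat I can see already: since the test labels are just the model's own predictions fed back in, the loop may simply reinforce whatever the training-only model already believes, so `test_weight` and the number of rounds probably matter a lot.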