Closed ClimbsRocks closed 8 years ago
if we have multiple cuts of a dataset (short, tiny, full, etc.), each cut is going to have its own y data. we run feature selection on each cut, and each time different features may or may not make the cut. and then each time we overwrite the y data from the previous cut.
i think the fix is as simple as adding the name of the training dataset to the name of the y dataset. of course, this is all just a theory and may not actually be the source of the bug.
that appears to be the issue. the problem is with X_test, X_test_nn, and id_test: all three are named after the testing dataset alone (which is shared across all the training datasets), rather than being specific to the features selected for this training dataset.
that issue exists in writeToFile in https://github.com/ClimbsRocks/data-formatter
should be fixed!
it keeps throwing this error: Error: ValueError: Number of features of the model must match the input. Model n_features is 82 and input n_features is 86
my guess is we're pulling from datasets that were formatted at two different points in time, so a different set of features made the cut in each.
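one way to test that guess would be to compare the feature names saved with the training data against the ones saved with the test data before predicting. a minimal sketch, assuming both lists of feature names are available (the function name and the example feature names below are made up for illustration):

```python
def feature_mismatch(model_features, input_features):
    """Compare two lists of feature names.

    Returns a tuple (missing, unexpected):
      missing    - features the model was trained on but the input lacks
      unexpected - features present in the input but unknown to the model
    """
    model_set = set(model_features)
    input_set = set(input_features)
    missing = sorted(model_set - input_set)
    unexpected = sorted(input_set - model_set)
    return missing, unexpected


# Toy example with invented feature names.
missing, unexpected = feature_mismatch(
    ["age", "income"],
    ["age", "income", "zip_code"],
)
```

if `unexpected` is non-empty, the input was formatted with features the model never saw, which would be consistent with the n_features mismatch in the error.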