fix issue with --alreadyFormatted

ClimbsRocks / machineJS

[UNMAINTAINED] Automated machine learning- just give it a data file! Check out the production-ready version of this project at ClimbsRocks/auto_ml

https://github.com/ClimbsRocks/auto_ml


fix issue with --alreadyFormatted #103

Closed ClimbsRocks closed 8 years ago

ClimbsRocks commented 8 years ago

it keeps throwing this error: Error: ValueError: Number of features of the model must match the input. Model n_features is 82 and input n_features is 86

my guess is we're pulling from datasets that were formatted at two different points in time, so feature selection kept a different set of features each time (82 for the model, 86 for the input)
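One way to test a guess like this is to compare the column lists of the two formatted datasets before predicting. The sketch below is only an illustration: `compare_feature_columns` and `assert_matching_features` are hypothetical helpers, not part of machineJS or data-formatter, and it assumes the feature names of each formatted dataset are available as plain lists.

```python
def compare_feature_columns(train_columns, test_columns):
    """Summarize how two lists of feature names differ."""
    train_set = set(train_columns)
    test_set = set(test_columns)
    return {
        "n_train": len(train_columns),
        "n_test": len(test_columns),
        "only_in_train": sorted(train_set - test_set),
        "only_in_test": sorted(test_set - train_set),
    }


def assert_matching_features(train_columns, test_columns):
    """Raise a descriptive ValueError if the feature sets do not line up."""
    report = compare_feature_columns(train_columns, test_columns)
    mismatch = (
        report["only_in_train"]
        or report["only_in_test"]
        or report["n_train"] != report["n_test"]
    )
    if mismatch:
        raise ValueError(
            "feature mismatch: model has {n_train} features, input has {n_test}; "
            "only in train: {only_in_train}; only in test: {only_in_test}".format(**report)
        )
```

Calling `assert_matching_features` right before prediction would name the offending columns instead of only reporting the two counts.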

ClimbsRocks commented 8 years ago

if we have multiple cuts of a dataset (short, tiny, full, etc.), each of them is going to have its own y data. each time, we run feature selection, and each time a different set of features makes the cut. and then each time we overwrite the y data

i think this is as simple as adding the name of the training dataset to the y dataset. of course, this is all just a theory and may not actually be the source of the bug.

ClimbsRocks commented 8 years ago

that appears to be the issue. the problem is with X_test, X_test_nn, and id_test: all three are named after the testing data set only (which is shared across all the training data sets), rather than being specific to the features selected for this training data set. so each formatting run overwrites the test files written for the previous training data set.

that issue exists in writeToFile in https://github.com/ClimbsRocks/data-formatter
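A minimal sketch of the naming idea, assuming the fix is to include the training data set's name in each test file's name. This is not the actual `writeToFile` code: `formatted_file_name` is a hypothetical helper, and the `.npz` extension and example paths are assumptions made only for illustration.

```python
import os


def formatted_file_name(kind, test_path, train_path, extension=".npz"):
    """Build a file name that is unique per (test set, training set) pair.

    kind is a label such as "X_test", "X_test_nn", or "id_test".
    The extension is an assumption; use whatever the formatter really writes.
    """
    test_name = os.path.splitext(os.path.basename(test_path))[0]
    train_name = os.path.splitext(os.path.basename(train_path))[0]
    return "{}_{}_{}{}".format(kind, test_name, train_name, extension)


# The same test file formatted against two cuts of the training data
# now maps to two different output files instead of one shared file.
short_name = formatted_file_name("X_test", "data/test.csv", "data/train_short.csv")
full_name = formatted_file_name("X_test", "data/test.csv", "data/train_full.csv")
```

With names like these, test data formatted for the short cut can no longer be overwritten by, or mistaken for, test data formatted for the full cut.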

ClimbsRocks commented 8 years ago

should be fixed!