InseadDataAnalytics / INSEADAnalytics

Model works on sample "Testing" but not on the Data to be predicted #146

Open AndreaSiri opened 6 years ago

AndreaSiri commented 6 years ago

Dear all,

When I run predictions on my sample (I split the database into Training and Test sets), I get no error. But when I run them on the actual database I have to predict, I get the following error:

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor MSZoning has new levels FIXED_NA

Is it because, when I "clean" the data in both databases, I end up with a different number of surrogates in the training and testing sets?

Thank you, Andrea

Antoine-Engerand commented 6 years ago

Could it be that the factor MSZoning has a level in the prediction set that your training + test data did not contain? You can apply 'str()' to both the training and prediction sets to verify that.
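
Beyond reading the 'str()' output by eye, the comparison can be done directly on the factor levels. Here is a minimal sketch, assuming two data frames named `train` and `to_predict` (hypothetical names and toy values, not the actual course data) that both hold MSZoning as a factor:

```r
# Hypothetical toy data standing in for the real training and prediction sets.
train <- data.frame(
  MSZoning  = factor(c("RL", "RM", "RL", "FV")),
  SalePrice = c(200, 150, 210, 180)
)
to_predict <- data.frame(
  MSZoning = factor(c("RL", "FIXED_NA", "RM"))
)

# Levels that appear in the prediction set but were never seen in training.
new_levels <- setdiff(levels(to_predict$MSZoning), levels(train$MSZoning))
print(new_levels)

# Row numbers in the prediction set that carry one of those unseen levels.
bad_rows <- which(to_predict$MSZoning %in% new_levels)
print(bad_rows)
```

If `new_levels` comes back empty, the factor is probably not the cause. Otherwise `bad_rows` shows which observations need attention before calling `predict()`.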

carlocelis commented 6 years ago

I got this error too. It turned out that one variable value (e.g. Dog) was never part of my training and testing datasets, but was part of my to-be-predicted csv file. So if you can, open the csv file that contains the observations to be predicted, and find the rows that contain the value FIXED_NA. If it is only one, I suggest assigning a value