dataquestio / project-walkthroughs

Data science, machine learning, and web development project code for https://www.youtube.com/c/Dataquestio .

Getting error ValueError: Input X contains NaN. SimpleImputer does not accept missing values encoded as NaN natively. For supervised learning, you might want to consider sklearn.ensemble.HistGradientBoostingClassifier and Regressor which accept missing values encoded as NaNs natively. Alternatively, it is possible to preprocess the data, for instance by using an imputer transformer in a pipeline or drop samples with missing values. See https://scikit-learn.org/stable/modules/impute.html You can find a list of all estimators that handle NaN values at the following page: https://scikit-learn.org/stable/modules/impute.html#estimators-that-handle-nan-values #4

Open GlamarSK opened 1 year ago

GlamarSK commented 1 year ago

At the line model.fit(train[predictors], train["Target"]) I get:

ValueError: Input X contains NaN. SimpleImputer does not accept missing values encoded as NaN natively. For supervised learning, you might want to consider sklearn.ensemble.HistGradientBoostingClassifier and Regressor which accept missing values encoded as NaNs natively. Alternatively, it is possible to preprocess the data, for instance by using an imputer transformer in a pipeline or drop samples with missing values. See https://scikit-learn.org/stable/modules/impute.html

You can find a list of all estimators that handle NaN values at the following page: https://scikit-learn.org/stable/modules/impute.html#estimators-that-handle-nan-values

I then tried the following, but it did not resolve the error:

# Create our imputer to replace missing values with the mean, e.g.
imp = SimpleImputer(missing_values=0, strategy='mean')
imp = imp.fit(train)

# Impute our data, then train
X_train_imp = imp.transform(train)
model.fit(X_train_imp[predictors], X_train_imp["Target"])

Please share the solution

meetttttt commented 1 year ago

I had the same issue, so I removed the rows with null values and it worked. PS: I know dropping rows is not best practice, but it worked. Another option is to replace the NaN values with the column mean or median.
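The two options mentioned above (dropping rows, or filling with a column statistic) can be sketched with plain pandas. This is only an illustration: the small `train` frame and the `Close` and `Volume` column names are made up here and stand in for whatever the walkthrough's data actually contains.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the walkthrough's `train` frame.
train = pd.DataFrame({
    "Close": [1.0, np.nan, 3.0, 4.0],
    "Volume": [10.0, 20.0, np.nan, 40.0],
    "Target": [0, 1, 0, 1],
})
predictors = ["Close", "Volume"]  # assumed predictor names

# Option 1: drop every row that has a NaN in a predictor or in the target.
train_dropped = train.dropna(subset=predictors + ["Target"])

# Option 2: keep all rows and fill each predictor's NaN with that column's median.
train_filled = train.copy()
train_filled[predictors] = train_filled[predictors].fillna(
    train_filled[predictors].median()
)
```

Dropping rows is simpler but discards data; filling keeps every row at the cost of introducing estimated values.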

saidavanam commented 6 months ago

I had the same issue. I used imp = SimpleImputer(missing_values=np.nan, strategy='mean') and it worked.
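A minimal sketch of why this change matters, assuming a recent scikit-learn release: when `missing_values` is set to something other than NaN (such as `0`), `SimpleImputer` rejects input that still contains NaN, whereas `missing_values=np.nan` imputes it. The toy `train` frame and column names below are hypothetical. Note also that `transform` returns a NumPy array, so indexing the result by column name (as in the original attempt) fails unless it is wrapped back into a DataFrame.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data standing in for the walkthrough's `train` frame.
train = pd.DataFrame({
    "Close": [1.0, np.nan, 3.0, 4.0],
    "Volume": [10.0, 20.0, np.nan, 40.0],
    "Target": [0, 1, 0, 1],
})
predictors = ["Close", "Volume"]  # assumed predictor names

# missing_values=0 tells the imputer to treat zeros as missing,
# so NaN in the input is still an error.
try:
    SimpleImputer(missing_values=0, strategy="mean").fit(train[predictors])
    raised = False
except ValueError:
    raised = True

# missing_values=np.nan (the default) imputes NaN with the column mean.
imp = SimpleImputer(missing_values=np.nan, strategy="mean")
X_train_imp = pd.DataFrame(
    imp.fit_transform(train[predictors]),  # returns a NumPy array
    columns=predictors,
    index=train.index,
)
```

Fitting the imputer on the predictor columns only, rather than on the whole frame, also keeps the `Target` column out of the imputation step.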

Zhang-YaLiang commented 2 months ago

You should not set missing_values=0 or any other non-NaN value. Just using imp = SimpleImputer(strategy='mean') should be fine, since missing_values defaults to np.nan.
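One way to apply this suggestion is to put the default imputer and the model into a single scikit-learn pipeline, so the same imputation is applied at fit and predict time. This is a sketch under assumptions, not the walkthrough's own code: the toy data and column names are made up, and `RandomForestClassifier` is used only as a stand-in for whatever `model` is in the original script.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for the walkthrough's `train` frame.
train = pd.DataFrame({
    "Close": [1.0, np.nan, 3.0, 4.0, 5.0, 6.0],
    "Volume": [10.0, 20.0, np.nan, 40.0, 50.0, 60.0],
    "Target": [0, 1, 0, 1, 0, 1],
})
predictors = ["Close", "Volume"]  # assumed predictor names

# The imputer uses its default missing_values=np.nan; the classifier is an
# assumed stand-in for the model used in the original script.
model = make_pipeline(
    SimpleImputer(strategy="mean"),
    RandomForestClassifier(n_estimators=10, random_state=1),
)

# The pipeline accepts the DataFrame with NaN directly.
model.fit(train[predictors], train["Target"])
preds = model.predict(train[predictors])
```

With this layout the imputer's column means are learned from the training predictors only and reused on any later data passed to `model.predict`.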