AnotherSamWilson / miceforest

Multiple Imputation with LightGBM in Python
MIT License

impute_new_data doesn't work #83

Closed mayii2001 closed 3 months ago

mayii2001 commented 1 year ago

I tried calling impute_new_data and complete_data() directly after running self.kds.mice on the old data:

k = self.kds.impute_new_data(new_data=newdata, iterations=self.num_iterations, random_state=self.random_state, copy_data=True)
temp = k.complete_data()

But temp still had null values.

I also tried a Pipeline:

X_train = olddata[xcol]
y_train = olddata.drop(xcol, axis=1)
pipe_kernel = mf.ImputationKernel(X_train, datasets=1)
pipe = Pipeline([
    ('impute', pipe_kernel),
    ('scaler', StandardScaler()),
])
X_train_t = pipe.fit_transform(
    X_train,
    y_train,
    impute__iterations=self.num_iterations,
    impute__train_nonmissing=True
)

# Transform the test data as well
X_test = pipe.transform(newdata)

It reported IndexError: list index out of range

  File "C:\Users\l\anaconda3\envs\ML\lib\site-packages\sklearn\base.py", line 1151, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "C:\Users\l\anaconda3\envs\ML\lib\site-packages\sklearn\pipeline.py", line 464, in fit_transform
    Xt = self._fit(X, y, **fit_params_steps)
  File "C:\Users\l\anaconda3\envs\ML\lib\site-packages\sklearn\pipeline.py", line 370, in _fit
    X, fitted_transformer = fit_transform_one_cached(
  File "C:\Users\l\anaconda3\envs\ML\lib\site-packages\joblib\memory.py", line 353, in __call__
    return self.func(*args, **kwargs)
  File "C:\Users\l\anaconda3\envs\ML\lib\site-packages\sklearn\pipeline.py", line 952, in _fit_transform_one
    res = transformer.fit(X, y, **fit_params).transform(X)
  File "C:\Users\l\anaconda3\envs\ML\lib\site-packages\miceforest\ImputationKernel.py", line 1219, in transform
    new_dat = self.impute_new_data(X, datasets=[0])
  File "C:\Users\l\anaconda3\envs\ML\lib\site-packages\miceforest\ImputationKernel.py", line 1589, in impute_new_data
    name=f"ind {str(iter_pairs[0][1])}-{str(iter_pairs[-1][1])}",
IndexError: list index out of range

I'm sure the number of columns is the same in both newdata and olddata. I can't figure out what the problem is here. I would be grateful if anyone could answer my question.

mayii2001 commented 1 year ago

Finally, I gave up on using self.kds and used a new instance of mf.ImputationKernel, which worked. But I don't know why the second bug appeared.

chennanhzc commented 7 months ago

Got the same issue...

AnotherSamWilson commented 3 months ago

This might have been caused by not resetting the index on the data that was being imputed. There are assertions in major version 6 to keep these bugs from happening. If it keeps happening, please reopen this issue.
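For anyone landing here, a minimal sketch of the index reset mentioned above, assuming the new data was sliced or filtered out of a larger DataFrame and so kept its old row labels. The frame and column names (`old_and_new`, `x1`, `x2`) are made up for illustration, and the commented-out `kernel` line stands for an already fitted ImputationKernel that is not constructed here.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a single frame that is later split into old and new rows.
old_and_new = pd.DataFrame({
    "x1": [1.0, np.nan, 3.0, 4.0, np.nan, 6.0],
    "x2": [np.nan, 2.0, 3.0, np.nan, 5.0, 6.0],
})

# Slicing keeps the original row labels, so newdata is indexed 3, 4, 5.
newdata = old_and_new.iloc[3:]
index_before = list(newdata.index)

# reset_index(drop=True) replaces the labels with a fresh 0..n-1 RangeIndex
# and discards the old labels instead of adding them as a column.
newdata_reset = newdata.reset_index(drop=True)
index_after = list(newdata_reset.index)

print(index_before, "->", index_after)

# Assumption: `kernel` is an ImputationKernel that was already fitted on the
# old data. The reset frame would then be passed in place of the sliced one:
# imputed = kernel.impute_new_data(new_data=newdata_reset)
```

Whether this resolves the error depends on the index actually being the cause, which the comment above only suggests as a possibility.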