eltonlaw / impyute

Data imputations library to preprocess datasets with missing data
http://impyute.readthedocs.io/
MIT License

fast_knn, moving_window and locf are returning data without imputation for univariate time series #54

kumarh22 closed this issue 5 years ago

kumarh22 commented 5 years ago

The data looks like this:

tsNH4_na.head()

index                ds                   y
2010-11-30 16:10:00  2010-11-30 16:10:00  13.714667
2010-11-30 16:20:00  2010-11-30 16:20:00  NaN
2010-11-30 16:30:00  2010-11-30 16:30:00  14.630500
2010-11-30 16:40:00  2010-11-30 16:40:00  16.385333
2010-11-30 16:50:00  2010-11-30 16:50:00  15.992667

Including the ds column raises BadInputError: Data is not float, so I tried with the single variable y only:

np.isnan(impy.imputation.ts.moving_window(np.array(tsNH4_na[["y"]]), func=np.mean, errors='raise', nindex=0, wsize=10)).sum()

833
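For comparison, here is a minimal NumPy sketch of moving-window mean imputation applied directly to a 1-D series, which is the behaviour I expected from the call above. `moving_window_mean_1d` is a hypothetical helper, not part of impyute, and its reading of `wsize` (window length) and `nindex` (offset of the missing value inside the window) is an assumption modelled on the parameters I passed, not a description of impyute's internals.

```python
import numpy as np


def moving_window_mean_1d(y, wsize=10, nindex=0):
    """Hypothetical helper (not impyute's moving_window).

    Fill each NaN in a 1-D series with the mean of the observed values in a
    window of length `wsize`, where the missing value sits at offset `nindex`
    inside the window (nindex=0: the window starts at the missing value).
    NaNs whose window contains no observed values are left as NaN.
    """
    y = np.asarray(y, dtype=float).ravel()  # accept (n,) or (n, 1) input
    out = y.copy()
    n = len(y)
    for i in np.flatnonzero(np.isnan(y)):
        left = max(0, i - nindex)
        right = min(n, i - nindex + wsize)
        window = y[left:right]  # taken from the original values, not earlier fills
        observed = window[~np.isnan(window)]
        if observed.size > 0:
            out[i] = observed.mean()
    return out


# The five rows from the head() output above, shaped (n, 1) like np.array(tsNH4_na[["y"]])
y = np.array([13.714667, np.nan, 14.630500, 16.385333, 15.992667]).reshape(-1, 1)
filled = moving_window_mean_1d(y, wsize=10, nindex=0)
print(filled, np.isnan(filled).sum())
```

With nindex=0 the window only looks forward from each gap; a centred window (nindex = wsize // 2) would use values on both sides instead.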

imput = impy.fast_knn(tsNH4_na[['y']], k=2)
np.isnan(imput).sum()

833

The unimputed data also has 833 missing points, so nothing was imputed.
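The title also mentions locf. As a point of comparison, this is a minimal NumPy sketch of last observation carried forward on a 1-D series; `locf_1d` is a hypothetical name and this is not impyute's implementation.

```python
import numpy as np


def locf_1d(y):
    """Hypothetical helper (not impyute's locf): last observation carried forward.

    Leading NaNs have no earlier observation, so they stay NaN.
    """
    y = np.asarray(y, dtype=float).ravel()  # accept (n,) or (n, 1) input
    observed = ~np.isnan(y)
    # Index of each observed value, 0 elsewhere; the running maximum then gives,
    # for every position, the index of the most recent observed value.
    idx = np.where(observed, np.arange(len(y)), 0)
    idx = np.maximum.accumulate(idx)
    return y[idx]


y = np.array([np.nan, 13.714667, np.nan, np.nan, 16.385333])
print(locf_1d(y))
```

For a pandas Series, `tsNH4_na["y"].ffill()` carries values forward in the same way.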

eltonlaw commented 5 years ago

That is expected behaviour: moving_window works on 2D data. Its use case is when you have multiple iterations of the y column and want to interpolate one instance of it from all the other (complete) ones. 1D data isn't a use case I had originally considered; I'll open new issues for that, referencing this one.