eltonlaw / impyute

Data imputations library to preprocess datasets with missing data
http://impyute.readthedocs.io/
MIT License
352 stars 49 forks

There is a bug in EM code #100

Closed ahmedhshahin closed 2 years ago

ahmedhshahin commented 2 years ago

Hello,

Thanks for the repo. I believe there is a bug in the current implementation of EM, in this line in particular.

It calculates the signed relative difference between the previous and current predictions. However, it should take the absolute value of that difference, i.e. delta = np.abs(col[x_i]-previous)/previous

The current implementation breaks whenever the current value is less than the previous value: the result is negative, which is always below the 10% threshold.

For example, if:

    col[x_i] = 1
    previous = 1000
    delta = -0.999  # delta < 0.1 will return True

it will assume convergence despite the huge difference.
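
To make the failure concrete, here is a minimal sketch comparing the two checks on the numbers above. The function names `signed_delta` and `abs_delta` are hypothetical and are not part of impyute. The signed formula is inferred from the example values in this issue rather than copied from the library source, and the 0.1 threshold is the 10% mentioned above.

```python
import numpy as np

THRESHOLD = 0.1  # the 10% convergence threshold mentioned in this issue


def signed_delta(current, previous):
    # Signed relative difference (inferred from the example values above).
    return (current - previous) / previous


def abs_delta(current, previous):
    # Proposed fix: absolute difference relative to the previous value.
    return np.abs(current - previous) / previous


current, previous = 1, 1000

# Signed version: negative, so it passes the "< 0.1" check.
print(signed_delta(current, previous) < THRESHOLD)

# Absolute version: large, so it does not pass the check.
print(abs_delta(current, previous) < THRESHOLD)
```

With the signed version, any decrease counts as convergence no matter how large it is. One caveat on the proposed formula as written: if `previous` can be negative, the quotient is still negative, so dividing by `np.abs(previous)` may also be needed.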

Best regards