darwin-a opened this issue 4 years ago
Hi! Sorry for the late reply, I've been quite busy with a few deadlines :) Thanks for opening the issue, and I am glad that you like Distython. Wow, I am super glad that scikit-learn actually added a new imputation method that is finally not a mean/mode imputation!
Sounds like a good feature to add. I'm currently short on time because of my undergrad dissertation, so I won't be able to make a PR to the repo, but maybe you would like to create one? If you struggle with the PR, let me know and I would be glad to help :)
I think that for now the problem can be worked around (without modifying the package) as follows:
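```python
import numpy as np
from distython import HEOM
from sklearn.impute import KNNImputer

# X is your data array; HEOM takes it first to compute the feature ranges
heom_metric = HEOM(X, cat_ix=[0, 1], nan_equivalents=[np.nan])
# KNNImputer takes one placeholder and forwards it as missing_values=...; the lambda drops it instead of passing it to heom()
imputer = KNNImputer(missing_values=np.nan, metric=lambda x, y, **kwds: heom_metric.heom(x, y))
```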
I haven't checked whether the code above works; it's just to give you a rough idea. I have used this package with lambdas like the one above and it worked for me :) Wrapping heom() in a lambda lets us choose which arguments are passed to it and which are simply ignored.
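Why the wrapper's signature matters: per the scikit-learn documentation quoted in this issue, a callable metric is called as metric(x, y, **kwds) with missing_values inside kwds, so the wrapper has to accept that keyword under its exact name or through **kwds. The dependency-free sketch below shows this; pairwise_like is a hypothetical stand-in for scikit-learn's callable-metric path, and toy_metric is a hypothetical placeholder for heom_metric.heom.

```python
import math


def toy_metric(x, y):
    """Hypothetical placeholder for heom_metric.heom: plain Euclidean distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pairwise_like(X, Y, metric, **kwds):
    """Hypothetical stand-in for scikit-learn's callable-metric path:
    calls metric(x, y, **kwds) for every pair of rows."""
    return [[metric(x, y, **kwds) for y in Y] for x in X]


X = [[0.0, 0.0], [3.0, 4.0]]

# Passing the two-argument metric directly fails: it receives an
# unexpected missing_values keyword argument.
try:
    pairwise_like(X, X, toy_metric, missing_values=float("nan"))
    direct_call_failed = False
except TypeError:
    direct_call_failed = True

# The lambda accepts any extra keywords and discards them before calling the metric.
wrapped = lambda x, y, **kwds: toy_metric(x, y)
distances = pairwise_like(X, X, wrapped, missing_values=float("nan"))

print(direct_call_failed)  # True
print(distances)  # [[0.0, 5.0], [5.0, 0.0]]
```

Naming the parameter anything other than missing_values (for example missing_vals) would fail in the same way as the direct call, because the value arrives as a keyword argument.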
Best, Kacper
Hi Kacper!
I'm working with a heterogeneous dataset and I was also surprised at the lack of heterogeneous distance metrics! The dataset comes with a paper that uses HEOM as its distance metric. I was about to implement it myself when I luckily stumbled upon your work!
While your metrics work with estimators such as NearestNeighbors, they don't work with scikit-learn's new imputation feature, KNNImputer.
I still consider myself new to machine learning, and this is my first time opening an issue on GitHub, but I implemented a small fix that works with both the existing estimators (I only tested it with NearestNeighbors, but if it works there it should work with the others) and scikit-learn's new feature!
Issue:
If you are using a user-defined metric, KNNImputer needs a callable that takes at least three inputs: instance one, instance two, and a missing_values keyword argument.
Taken directly from the KNNImputer documentation:
"callable : a user-defined function which conforms to the definition of _pairwise_callable(X, Y, metric, **kwds). The function accepts two arrays, X and Y, and a missing_values keyword in kwds and returns a scalar distance value."
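For illustration only, here is a hypothetical sketch of a metric that meets this contract by accepting missing_values itself, so no wrapper is needed. It is a simplified HEOM written with NumPy, not Distython's implementation and not the fix referred to in this issue; heom_distance, cat_ix and ranges are illustrative names, and the per-feature ranges are assumed to be precomputed from the training data. Following the usual HEOM definition, each feature contributes 1 if either value is missing, the overlap distance (0 if equal, 1 otherwise) if it is categorical, and |x - y| / range if it is numeric; the result is the square root of the sum of squares.

```python
from functools import partial

import numpy as np


def heom_distance(x, y, *, missing_values=np.nan, cat_ix=(), ranges=None):
    """Hypothetical simplified HEOM between two 1-D rows.

    Per feature: 1 if either value is missing, 0/1 overlap for categorical
    features, |x - y| / range for numeric ones. Returns sqrt(sum(d ** 2)).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ranges = np.ones_like(x) if ranges is None else np.asarray(ranges, dtype=float)

    # NaN always counts as missing; a numeric placeholder (e.g. 999) also does.
    missing = np.isnan(x) | np.isnan(y)
    if not np.isnan(missing_values):
        missing |= (x == missing_values) | (y == missing_values)

    d = np.abs(x - y) / ranges  # numeric features: range-normalised difference
    cat = np.zeros(x.shape, dtype=bool)
    cat[np.asarray(cat_ix, dtype=int)] = True
    d[cat] = (x[cat] != y[cat]).astype(float)  # categorical features: overlap
    d[missing] = 1.0  # missing values: maximal per-feature distance
    return float(np.sqrt(np.sum(d ** 2)))


# Bind the extra parameters; the result still accepts missing_values as a keyword.
metric = partial(heom_distance, cat_ix=[0], ranges=[1.0, 10.0, 4.0])

a = [1.0, 2.0, np.nan]
b = [0.0, 7.0, 3.0]
print(metric(a, b, missing_values=np.nan))  # 1.5
```

Here the categorical mismatch contributes 1, the numeric feature |2 - 7| / 10 = 0.5, and the missing value 1, giving sqrt(1 + 0.25 + 1) = 1.5. Since metric accepts two rows plus a missing_values keyword, it should be usable as KNNImputer(metric=metric), although this sketch has not been run against scikit-learn itself.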
Below is what I did to make it work.
Let me know what you think!
Thanks again for making such a great tool!