abstractqqq / polars_ds_extension

Polars extension for general data science use cases
MIT License
266 stars, 18 forks

KNN to return distances ? #49

Closed remiadon closed 5 months ago

remiadon commented 5 months ago

First of all this package is awesome 👏

Feature

I'd like to know whether the knn_ptwise function could return the distances to the neighbors instead of their indices. Sklearn's class provides a return_distance keyword, so I'd suggest following the same convention.

NB: if there is already an elegant way to do this via the expression framework, please forgive my question; at the very least I expect other users will find the example useful.
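To make the request concrete, here is a minimal pure-Python sketch of a brute-force nearest-neighbor search that returns distances alongside indices, sorted by distance. It is not the package's implementation: the function name `knn_with_distances`, the `idx`/`dist` field names, the Euclidean metric, and the choice to exclude each point from its own neighbor list are all assumptions made for illustration.

```python
import math

def knn_with_distances(points, k):
    """Brute-force k-NN: for each row, return neighbor indices and distances,
    both sorted by ascending distance. Hypothetical helper, not a package API."""
    results = []
    for i, p in enumerate(points):
        # Euclidean distance from row i to every other row (self is excluded,
        # which is an assumption of this sketch).
        candidates = [
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        ]
        candidates.sort()  # ascending by distance, ties broken by index
        nearest = candidates[:k]
        results.append({
            "idx": [j for _, j in nearest],
            "dist": [d for d, _ in nearest],
        })
    return results

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
neighbors = knn_with_distances(points, k=2)
print(neighbors[0])
```

Each result pairs the two lists in one record, so the i-th distance always belongs to the i-th index.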

Context

I need this in order to compute entropy on continuous (potentially multivariate) data, as done here
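For readers wondering why distances matter here: one common family of entropy estimators for continuous data works directly from each point's distance to its k-th nearest neighbor. The sketch below shows the Kozachenko-Leonenko form with Euclidean distances. Whether the reference linked above uses exactly this estimator is an assumption, and `digamma_int` and `kl_entropy` are hypothetical helper names.

```python
import math
import random

def digamma_int(n):
    # Digamma at a positive integer n: psi(n) = -gamma + sum_{m=1}^{n-1} 1/m
    euler_gamma = 0.5772156649015329
    return -euler_gamma + sum(1.0 / m for m in range(1, n))

def kl_entropy(points, k=3):
    """Kozachenko-Leonenko entropy estimate (in nats) from k-th neighbor distances.
    Assumes no duplicate points, since a zero distance would break the log."""
    n = len(points)
    d = len(points[0])
    # Log-volume of the d-dimensional Euclidean unit ball.
    log_unit_ball = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    log_dist_sum = 0.0
    for i, p in enumerate(points):
        # Sorted distances from point i to all other points (brute force).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        log_dist_sum += math.log(dists[k - 1])  # distance to the k-th neighbor
    return digamma_int(n) - digamma_int(k) + log_unit_ball + d * log_dist_sum / n

random.seed(0)
sample = [(random.random(),) for _ in range(400)]
estimate = kl_entropy(sample, k=3)
print(estimate)  # the true entropy of Uniform(0, 1) is 0 nats
```

The estimator only needs the k-th neighbor distance per row, which is why a distance-returning KNN is the missing building block.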

abstractqqq commented 5 months ago

Hey, thank you for the issue. I have some follow-up questions.
1. Do you need the index and the distance at the same time? Does it matter whether the distances come in a sorted list or not?
2. Would it be helpful if I provided an out-of-the-box expression for copula entropy?

I will be travelling this week and won't be able to work on this.

Thanks again for pointing out more opportunities for improvement and helping make the package more useful.

remiadon commented 5 months ago

@abstractqqq thanks for the quick turnaround. Ideally I would expect the distances to be sorted, and I guess they could be wrapped in a struct, along with indices.

I was intending to provide an implementation of the copula entropy function mentioned. Of course, if you provide a function out of the box I will be more than happy! I just thought other users would appreciate a clear view of what goes into such complex "queries".

abstractqqq commented 5 months ago

https://github.com/abstractqqq/polars_ds_extension/pull/53

It is merged. Please refer to the latest example.ipynb for reference. Returning more data definitely takes a little more time, but I think it is good for now.