Closed VarIr closed 3 years ago
Added annoy in dev branch #21
Added puffinn in dev branch #30
Hi @VarIr
Many thanks for such a great package!
I was wondering whether you could include a wrapper for pyNNDescent (https://pynndescent.readthedocs.io/en/latest/api.html).
Ivan
Glad to hear you find the package useful.
Do you have specific reasons to use NNdescent? Last time I checked, it seemed to give inferior results compared to graph-based methods like HNSW and ONNG.
Hi @VarIr
Thanks for asking for my opinion.
I think that NNdescent offers something that other solutions do not: support for several metrics, including custom metrics. This aspect is very important for me. My data sets have the form n x d (where n >>> d, and d varies in [7, 12]). As you can see, I would like a solution that lets me test Euclidean, Manhattan, fractional, or custom metrics to see which one helps me deal with a large amount of data in a relatively high-dimensional space. I have also started learning more about hubness and its impact, which is where your package will play an interesting role in the analysis I conduct.
Ivan
Thanks for your input. Including support for custom dissimilarity measures seems worthwhile. Currently, most (all?) of the implemented methods at least support Euclidean and Manhattan distances.
I'll add nndescent to the list here. Unfortunately, I cannot give an ETA, as I can only work on this project in my spare time.
Thanks for considering pynndescent. In addition to supporting custom metrics, it also supports many more metrics than the annoy package. For instance, the Minkowski metric with 0<p<1 seems to provide better results than Euclidean distance when working with high-dimensional data.
Yes, that's why skhubness allows fractional norms while sklearn doesn't. :)
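For readers unfamiliar with fractional norms, here is a minimal NumPy sketch of the Minkowski dissimilarity with 0 < p < 1. The helper name `minkowski_distance` and the toy vectors are assumptions made for illustration; this is not how skhubness or pynndescent compute the measure internally.

```python
import numpy as np


def minkowski_distance(x, y, p):
    """Return (sum_i |x_i - y_i|^p)^(1/p); 0 < p < 1 gives a fractional norm."""
    if p <= 0:
        raise ValueError("p must be positive")
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    return float(np.sum(diff ** p) ** (1.0 / p))


# Toy vectors (illustrative only).
x = np.zeros(2)
y = np.ones(2)

d_half = minkowski_distance(x, y, 0.5)  # fractional norm
d_one = minkowski_distance(x, y, 1.0)   # Manhattan
d_two = minkowski_distance(x, y, 2.0)   # Euclidean
```

Note that for p < 1 the triangle inequality no longer holds, so the result is a dissimilarity measure rather than a true metric. Index structures that rely on metric properties may therefore not accept it.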
Cool! I just need to be able to install this great package on my Windows workstation (see incident #76 at https://github.com/VarIr/scikit-hubness/issues/67).
I tried to use a fractional norm with the following code:
from skhubness.data import load_dexter
from skhubness import Hubness

# Load the example data set so that X is defined
X, y = load_dexter()

hub = Hubness(k=10,
              return_value='all',
              metric='minkowski',
              algorithm='hnsw',
              algorithm_params={'p': 0.1},
              hubness='local_scaling',
              random_state=1969,
              n_jobs=-1)
hub.fit(X)
which gave the error below:
Traceback (most recent call last):
File "
For hubness analysis, the package only supports three metrics
The next version v0.30 will see compatibility with sklearn's KNeighborsTransformer. Since PyNNDescent ships with its own wrapper to act as a KNeighborsTransformer, there is no need to roll an additional implementation of that. We can, thus, consider PyNNDescent supported.
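As a rough illustration of the KNeighborsTransformer interface mentioned above, the sketch below uses sklearn's own exact `KNeighborsTransformer` with a callable fractional norm as a custom dissimilarity. The helper `fractional_norm`, the random data, and all parameter values are assumptions for illustration. An approximate transformer such as the one shipped with PyNNDescent would take the place of the sklearn class; its exact arguments are not shown here.

```python
import numpy as np
from sklearn.neighbors import KNeighborsTransformer


def fractional_norm(a, b, p=0.5):
    """Hypothetical custom dissimilarity: Minkowski form with p < 1."""
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)


# Random toy data: 50 points in 10 dimensions (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
k = 5

# Brute-force search accepts arbitrary callables, at the cost of speed.
transformer = KNeighborsTransformer(
    n_neighbors=k,
    mode="distance",
    algorithm="brute",
    metric=fractional_norm,
)

# Sparse neighbor graph; each row stores the distances to the nearest points.
graph = transformer.fit_transform(X)
```

The resulting sparse graph can be passed to downstream estimators that accept precomputed neighbor graphs, which is what makes a transformer-style wrapper interchangeable across neighbor search backends.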
Closing, as the original list of wrappers has been dealt with. Users with requests for any additional approximate neighbor wrappers, please open a new issue for each algorithm separately. This helps me keep an overview of open tasks. Thank you.
It would be nice to also have wrappers for
PyNNdescent: https://github.com/lmcinnes/pynndescent (for custom metric support) (wrappers already exist in the pynndescent package)
UPDATE: For new requests on additional ANN methods, individual Issues should be opened.