VarIr / scikit-hubness

A Python package for hubness analysis and high-dimensional data mining
BSD 3-Clause "New" or "Revised" License

precomputed option for Hubness with hnsw not working #71

Closed: ivan-marroquin closed this issue 3 years ago

ivan-marroquin commented 3 years ago

Hi,

I would like to report the following issue with the 'precomputed' metric option when using the 'hnsw' algorithm.

First run:

import numpy as np
from skhubness import Hubness
from skhubness.neighbors import VALID_METRICS
from sklearn.metrics import pairwise_distances

X = np.random.randn(100, 7)
dist = pairwise_distances(X, metric='minkowski', p=0.1)
hub = Hubness(k=15, return_value='all', metric='precomputed', algorithm='hnsw', hubness=None, random_state=1969, n_jobs=3)
hub.fit(dist)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Temp\Python\Python3.6.5\lib\site-packages\scikit_hubness-0.21.3-py3.6.egg\skhubness\analysis\estimation.py", line 325, in fit
    verbose=self.verbose,
  File "C:\Temp\Python\Python3.6.5\lib\site-packages\scikit_hubness-0.21.3-py3.6.egg\skhubness\neighbors\unsupervised.py", line 163, in __init__
    metric_params=metric_params, n_jobs=n_jobs, **kwargs)
  File "C:\Temp\Python\Python3.6.5\lib\site-packages\scikit_hubness-0.21.3-py3.6.egg\skhubness\neighbors\base.py", line 155, in __init__
    n_jobs=n_jobs)
  File "C:\Temp\Python\Python3.6.5\lib\site-packages\sklearn\neighbors\base.py", line 121, in __init__
    self._check_algorithm_metric()
  File "C:\Temp\Python\Python3.6.5\lib\site-packages\scikit_hubness-0.21.3-py3.6.egg\skhubness\neighbors\base.py", line 214, in _check_algorithm_metric
    raise ValueError(f"Metric '{self.metric}' not valid. Use "
ValueError: Metric 'precomputed' not valid. Use sorted(skhubness.neighbors.VALID_METRICS['hnsw']) to get valid options. Metric can also be a callable function.

Second run:

print(VALID_METRICS['hnsw'])
['euclidean', 'l2', 'minkowski', 'squared_euclidean', 'sqeuclidean', 'cosine', 'cosinesimil']

hub = Hubness(k=15, return_value='all', metric='minkowski', p=0.1, algorithm='hnsw', hubness=None, random_state=1969, n_jobs=3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __init__() got an unexpected keyword argument 'p'

Third run:

hub = Hubness(k=15, return_value='all', metric='minkowski', algorithm='hnsw', algorithm_params={'p': 0.1}, hubness=None, random_state=1969, n_jobs=3)
hub.fit(dist)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Temp\Python\Python3.6.5\lib\site-packages\scikit_hubness-0.21.3-py3.6.egg\skhubness\analysis\estimation.py", line 283, in fit
    raise ValueError(f"Unknown metric '{metric}'. "
ValueError: Unknown metric 'minkowski'. Must be one of ['euclidean', 'cosine', 'precomputed'].

Thanks for your help,

Ivan

VarIr commented 3 years ago

It seems the error messages are sometimes misleading. In any case, metric="precomputed" should be combined with algorithm="brute".
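
For illustration only, here is a minimal NumPy sketch of what a brute-force search over a precomputed distance matrix amounts to. It is not the scikit-hubness implementation. The helper names `k_occurrence_from_distances` and `skewness` are hypothetical, and the skewness of the k-occurrence counts is used here as a common hubness measure. The point is that a brute-force search can read neighbors directly off the rows of the matrix, whereas an approximate index such as hnsw is presumably built from the raw feature vectors and so cannot use a distance matrix.

```python
import numpy as np


def k_occurrence_from_distances(dist, k):
    """Count how often each point is among the k nearest neighbors of the others.

    `dist` is a square (n, n) precomputed distance matrix.
    Hypothetical helper for illustration, not part of scikit-hubness.
    """
    dist = np.array(dist, dtype=float)  # work on a copy
    n = dist.shape[0]
    if dist.shape != (n, n):
        raise ValueError("a precomputed distance matrix must be square")
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    # Brute force: sort every row and keep the k closest column indices.
    neighbors = np.argsort(dist, axis=1)[:, :k]
    return np.bincount(neighbors.ravel(), minlength=n)


def skewness(values):
    """Third standardized moment of `values`."""
    values = np.asarray(values, dtype=float)
    centered = values - values.mean()
    return np.mean(centered ** 3) / np.mean(centered ** 2) ** 1.5


rng = np.random.default_rng(1969)
X = rng.standard_normal((100, 7))

# Minkowski dissimilarity with p = 0.1, as in the report, computed with NumPy only.
p = 0.1
diff = np.abs(X[:, None, :] - X[None, :, :])
dist = np.sum(diff ** p, axis=-1) ** (1.0 / p)

k_occ = k_occurrence_from_distances(dist, k=15)
print("total k-occurrence:", k_occ.sum())
print("k-skewness:", skewness(k_occ))

# The combination suggested in this thread for scikit-hubness 0.21.x would be
# Hubness(..., metric='precomputed', algorithm='brute'), then hub.fit(dist).
```

Each of the 100 points contributes exactly 15 neighbors, so the k-occurrence counts always sum to 1500; only their distribution across points changes with the data and the metric.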

ivan-marroquin commented 3 years ago

Unfortunately, the documentation does not mention that metric='precomputed' should be used with algorithm='brute' (https://scikit-hubness.readthedocs.io/en/latest/documentation/_autosummary/skhubness.analysis.Hubness.html).

VarIr commented 3 years ago

Starting from v0.30, Hubness will by default expect KNeighborsTransformer-compatible input, that is, a sparse k-neighbors graph as produced by recent versions of sklearn or the new wrappers in skhubness (e.g. the NMSlibTransformer).

This should effectively solve this issue.
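
As a rough sketch of the kind of input described above, the following uses scikit-learn's `KNeighborsTransformer` to build a sparse k-neighbors graph. The Euclidean metric, the data, and the variable names are assumptions chosen for illustration. How the graph is then passed to `Hubness` in v0.30 is not shown in this thread, so that step is left out.

```python
import numpy as np
from sklearn.neighbors import KNeighborsTransformer

rng = np.random.default_rng(1969)
X = rng.standard_normal((100, 7))

k = 15
# mode="distance" stores the distance to each neighbor in a sparse matrix.
# In this mode scikit-learn counts each sample as its own neighbor, so every
# row holds k + 1 stored entries.
transformer = KNeighborsTransformer(n_neighbors=k, mode="distance", metric="euclidean")
graph = transformer.fit_transform(X)

print(type(graph).__name__, graph.shape, graph.nnz)
```

The result has one row per sample and stores only the nearest-neighbor distances rather than the full 100 x 100 matrix.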