VarIr / scikit-hubness

A Python package for hubness analysis and high-dimensional data mining
BSD 3-Clause "New" or "Revised" License

Enhancement request: be able to do hubness analysis with different metrics #68

Open ivan-marroquin opened 3 years ago

ivan-marroquin commented 3 years ago

Hi,

From an earlier issue, I learned that the package should be able to conduct hubness analysis with several metrics (including fractional norms).

So, I tried to use a fractional norm with the following code:

    from skhubness import Hubness
    from skhubness.data import load_dexter
    # Load the example data set so that X is defined before fitting
    X, y = load_dexter()
    hub = Hubness(k=10, return_value='all', metric='minkowski',
                  algorithm='hnsw', algorithm_params={'p': 0.1},
                  hubness='local_scaling', random_state=1969, n_jobs=-1)
    hub.fit(X)

which gave the error below:

    Traceback (most recent call last):
      File "", line 1, in
      File "C:\Users\IMarroquin\Downloads\Important_Python_Libraries_VisualBuildTools\scikit-hubness-master\skhubness\analysis\estimation.py", line 283, in fit
        raise ValueError(f"Unknown metric '{metric}'. "
    ValueError: Unknown metric 'minkowski'. Must be one of ['euclidean', 'cosine', 'precomputed'].

According to the nmslib documentation, nmslib supports several metrics (including fractional norms).

I think it would be beneficial to be able to run hubness analysis with a metric of one's choice.

Thanks,

Ivan

ivan-marroquin commented 3 years ago

Here is the link to the issue I mentioned above https://github.com/VarIr/scikit-hubness/issues/67

VarIr commented 3 years ago

IIRC, nmslib's HNSW does not support any metric besides Eucl and cos, but please feel free to point me to documentation that states otherwise.

However, this code seems to fail on a check in skhubness that might not be necessary at this point. It would also fail for algorithm="brute" which it shouldn't... Would need to look into this in detail.

For a work-around, you could calculate fractional distances ahead of time, and use metric="precomputed".
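A minimal sketch of that work-around, under stated assumptions: it computes a full pairwise distance matrix for a fractional norm (Minkowski form with 0 < p < 1) in plain Python, so the arithmetic is easy to check. The helper names `fractional_distance` and `pairwise_fractional` and the toy data are hypothetical, not part of skhubness. The hand-off to `Hubness` is left as a comment, because the exact input that `metric="precomputed"` expects should be checked against the skhubness documentation.

```python
def fractional_distance(a, b, p=0.5):
    """Minkowski-form distance between two vectors; 0 < p < 1 gives a fractional norm."""
    if p <= 0:
        raise ValueError("p must be positive")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def pairwise_fractional(X, p=0.5):
    """Return the symmetric n x n distance matrix as a list of lists."""
    n = len(X)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # The distance is symmetric, so fill both triangles at once.
            D[i][j] = D[j][i] = fractional_distance(X[i], X[j], p)
    return D


# Toy data (hypothetical); real data would be e.g. the X from load_dexter().
X = [[0.0, 0.0], [1.0, 1.0], [3.0, 0.0]]
D = pairwise_fractional(X, p=0.5)

# Assumed hand-off (not executed here; verify the expected input shape first):
#   hub = Hubness(k=10, metric="precomputed")
#   hub.fit(numpy.asarray(D))
```

The double loop is quadratic in the number of samples and is only meant to show the computation. For real data sets, a vectorized NumPy or SciPy version would be the practical choice.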

ivan-marroquin commented 3 years ago

Hi @VarIr ,

Thanks for the prompt answer. Regarding the nmslib documentation on distances, see: https://github.com/nmslib/nmslib/blob/master/manual/spaces.md

I will try the proposed workaround.

Ivan

VarIr commented 3 years ago

Indeed, while optimized indices are only available for Eucl and cos, many more spaces are supported in general.

For personal reference, the detailed list of supported spaces is available in the manual, Table 1, p. 5.

ivan-marroquin commented 3 years ago

Thanks for sharing the document