VarIr / scikit-hubness

A Python package for hubness analysis and high-dimensional data mining
BSD 3-Clause "New" or "Revised" License

question about the interpretation of hubness measurements #108

Closed ivan-marroquin closed 2 years ago

ivan-marroquin commented 2 years ago

Hi,

Many thanks for such an interesting package!

I have a question on how to interpret the measures for hubness. Are these measures bounded (like say [0, 1])? Do you have references that describe how to interpret them?

For example, I ran an analysis on my data set and computed several hubness measures for different numbers of neighbors (the script and data set are in the attached zip file). I got these results:

k = 5: 2.4749 (skewness), 0.8889 (atkinson), 0.8889 (gini), 0.8889 (robinhood), 1.0 (hub occurrence), 0.8889 (anti-hub occurrence)

k = 15: 0.7071 (skewness), 0.6667 (atkinson), 0.6667 (gini), 0.6667 (robinhood), 1.0 (hub occurrence), 0.6667 (anti-hub occurrence)

k = 30: -0.7071 (skewness), 0.3333 (atkinson), 0.3333 (gini), 0.3333 (robinhood), 1.0 (hub occurrence), 0.3333 (anti-hub occurrence)

Judging by skewness, Atkinson, Gini, and Robin Hood, hubness decreases as the number of neighbors k used in the analysis increases. Is this interpretation correct? And when can an observed value be considered high (or low)?

What about hub and anti-hub occurrences? How can hub occurrence stay high (and constant) while anti-hub occurrence decreases as k increases? And why is anti-hub occurrence not very low when hub occurrence is 1?

I thank you in advance for your comments and clarifications.

Kind regards,

Ivan

script_and_test_data.zip

VarIr commented 2 years ago

Hi Ivan,

Thanks for your continued interest.
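As a rough illustration of what the measures in question describe, here is a minimal sketch in plain Python. It is not the package's implementation. The helper names are hypothetical, and the definitions of hub occurrence (share of neighbor slots taken by points whose k-occurrence exceeds `hub_size * k`) and anti-hub occurrence (share of points that never appear in any neighbor list) are assumptions based on a reading of the documentation. On boundedness: skewness is unbounded, while Gini and Robin Hood lie in [0, 1 - 1/n] for non-negative counts, and both occurrence measures are proportions in [0, 1].

```python
import math
import random


def k_occurrence(points, k):
    """Count how often each point appears in the k-nearest-neighbor lists of the others."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # Rank all other points by Euclidean distance to point i.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            counts[j] += 1
    return counts


def skewness(counts):
    """Third standardized moment: unbounded, 0 for a symmetric distribution."""
    n = len(counts)
    mean = sum(counts) / n
    m2 = sum((c - mean) ** 2 for c in counts) / n
    m3 = sum((c - mean) ** 3 for c in counts) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def gini(counts):
    """Gini index: 0 if all counts are equal, (n - 1) / n if one point holds everything."""
    n, total = len(counts), sum(counts)
    ranked = sorted(counts)
    return sum((2 * (i + 1) - n - 1) * c for i, c in enumerate(ranked)) / (n * total)


def robin_hood(counts):
    """Share of all occurrences that would have to be moved to make the counts equal."""
    mean = sum(counts) / len(counts)
    return sum(abs(c - mean) for c in counts) / (2 * sum(counts))


def hub_and_antihub_occurrence(counts, k, hub_size=2.0):
    """Assumed definitions: hubs have count > hub_size * k, anti-hubs have count 0."""
    n = len(counts)
    hub_slots = sum(c for c in counts if c > hub_size * k)
    hub_occ = hub_slots / (n * k)                    # neighbor slots taken by hubs
    antihub_occ = sum(1 for c in counts if c == 0) / n  # points never chosen
    return hub_occ, antihub_occ


# Sanity check on random data: every point fills exactly k neighbor slots.
rng = random.Random(0)
pts = [(rng.random(), rng.random()) for _ in range(30)]
assert sum(k_occurrence(pts, 5)) == 30 * 5

# Toy k-occurrence vector (hypothetical): 45 points, k = 5, and the same
# 5 points are the nearest neighbors of every query.
toy = [0] * 40 + [45] * 5
print(round(skewness(toy), 4))    # 2.4749
print(round(gini(toy), 4))        # 0.8889
print(round(robin_hood(toy), 4))  # 0.8889
print(hub_and_antihub_occurrence(toy, k=5))
```

The toy vector reproduces the shape of the k = 5 row: when only m of n points ever appear as neighbors and they appear equally often, Gini, Robin Hood, and anti-hub occurrence all equal 1 - m/n, while the hubs still fill every neighbor slot. That is how hub occurrence can be 1 while anti-hub occurrence is also high. Whether the attached data actually has this structure is an inference from the reported numbers, not something checked here.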

I hope this sheds some light on these topics.

ivan-marroquin commented 2 years ago

Hi @VarIr

Many thanks for sharing all this information. It is very helpful.

Ivan