facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.
https://faiss.ai
MIT License

Faiss KMeans Inertia #2511

Open ghost opened 1 year ago

ghost commented 1 year ago

I'm trying to inspect the inertia (sum of squared errors) of my clustering. I've seen several people use obj[-1] from a trained faiss.Kmeans object as the inertia value for KMeans.

However, for me this value always increases with a higher number of clusters, which is unexpected (it should decrease).

This '.obj' attribute is defined as 'iteration_stats' in the _swigfaiss_avx2 file as follows:

iteration_stats = property(_swigfaiss_avx2.ProgressiveDimClustering_iteration_stats_get,_swigfaiss_avx2.ProgressiveDimClustering_iteration_stats_set, doc=r""" stats at every iteration of clustering""")

What does 'stats at every iteration of clustering' stand for here? Is it correct to use the last element of this array as the inertia value or should it be accessed by another variable?

mdouze commented 1 year ago

see stats here https://github.com/facebookresearch/faiss/blob/main/faiss/Clustering.h#L43

ghost commented 1 year ago

Thanks for the answer. It seems that obj[-1] should correspond to the SSE (or inertia).

Do you have any idea why it does not decrease with a higher number of clusters? I've seen this reported by many others. @mdouze

ghost commented 1 year ago

@mdouze Could you tell me whether this is expected behavior?

jjyyxx commented 1 year ago

@mdouze If I understand correctly, obj is the sum of squared errors over all samples. But when I pass the weights parameter to Kmeans.train, the objective matches neither the weighted value (scikit-learn's inertia) nor the unweighted one (which is what the faiss source appears to compute), which is confusing.

Weighted: 7.508362767330208
Un-weighted: 7.685895746353934
obj.min() == obj[-1]: 7.669065475463867
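
For reference, the two quantities compared above can be written down in a few lines of NumPy. This is only a sketch: the function name `inertia` and its arguments are made up for illustration and are not part of faiss, and `centroids` is assumed to be the array that a trained faiss.Kmeans object exposes as `centroids`.

```python
import numpy as np


def inertia(x, centroids, weights=None):
    """Sum of squared L2 distances from each point to its nearest centroid.

    If `weights` is given, each point's squared distance is multiplied by
    its weight before summing (the scikit-learn definition of weighted
    inertia). This helper is hypothetical, not a faiss function.
    """
    x = np.asarray(x, dtype=np.float64)
    centroids = np.asarray(centroids, dtype=np.float64)
    # Pairwise squared distances, shape (n_points, n_centroids).
    # Note: this broadcasts to (n, k, d), so it is only suitable for small inputs.
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.min(axis=1)
    if weights is None:
        return float(nearest.sum())
    return float((np.asarray(weights, dtype=np.float64) * nearest).sum())
```

Calling it once with and once without `weights` on the same centroids gives the weighted and unweighted numbers side by side, which makes it easier to see which one (if either) obj[-1] is tracking.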

ckolluru commented 2 months ago

Running into the same issue here. obj[-1] increases as the number of clusters increases.

What am I getting wrong? Is there a way to get inertia from the kmeans object?

srggrs commented 1 month ago

+1

ckolluru commented 1 month ago

This solved my issue. It was related to the min and max points per centroid (min_points_per_centroid and max_points_per_centroid).

https://github.com/facebookresearch/faiss/issues/1887#issue-892534946