facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.
https://faiss.ai
MIT License

Test-clustering failing for some choices of d and n #2224

Open YangIsNotAvailable opened 2 years ago

YangIsNotAvailable commented 2 years ago

Platform

Faiss version: git commit 06ae6b8a590f3941e9c8b1e1ea0ee9d872045783

Installed from: compiled from source

Running on:

Interface:

Reproduction instructions

In tests/test_clustering.py, I changed the values of d and n on lines 20 and 21 and tried a few scenarios:

  1. d=256 and n=1000 -> Fails at assertGreater(prev, o) on line 33, and prints "WARNING clustering 1000 points to 32 centroids: please provide at least 1248 training points"
  2. d=256 and n=1248 -> Passes
  3. d=512 and n=1248 -> Fails again at line 33, this time with no warning

In general, I don't understand why the algorithm's expected behavior should depend on particular choices of d and n when performance is not a concern.
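For context on the warning in scenario 1: the threshold of 1248 matches 32 centroids times 39, which is the default value of Faiss's `min_points_per_centroid` clustering parameter as far as I know. The sketch below only redoes that arithmetic. It assumes the default is in effect at this commit, and `min_training_points` is a hypothetical helper, not a Faiss function.

```python
# Assumption: min_points_per_centroid is left at its default, believed to be 39.
K = 32                        # number of centroids, taken from the warning text
MIN_POINTS_PER_CENTROID = 39  # assumed Faiss default


def min_training_points(k, min_points_per_centroid=MIN_POINTS_PER_CENTROID):
    """Smallest training set size that avoids the 'please provide at least' warning."""
    return k * min_points_per_centroid


needed = min_training_points(K)
for n in (1000, 1248):
    status = "warning" if n < needed else "no warning"
    print(f"n={n}, needed={needed}: {status}")
```

This would explain why n=1000 warns and n=1248 does not. It does not explain the failure in scenario 3, where n already meets the threshold.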

mdouze commented 2 years ago

Point 3 is worrying. It may be due to roundoff errors.
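If roundoff is indeed the cause, one possible way to make the check robust is to compare successive objective values with a small relative tolerance instead of a strict inequality. The following is a minimal sketch under that assumption. `objective_decreases` is a hypothetical helper, and the trace values are made up for illustration, not measured Faiss output.

```python
def objective_decreases(obj, rel_tol=0.0):
    """Return True if every value is below the previous one.

    rel_tol allows a small relative slack to absorb floating-point noise;
    rel_tol=0.0 is the strict comparison used by assertGreater(prev, o).
    Assumes the objective values are positive.
    """
    prev = float("inf")
    for o in obj:
        if not o < prev * (1.0 + rel_tol):
            return False
        prev = o
    return True


# Made-up trace: the objective converges, then rises by about 1e-7 relative.
trace = [5000.0, 4200.0, 4100.0, 4100.0004]

print(objective_decreases(trace))                # strict check
print(objective_decreases(trace, rel_tol=1e-6))  # tolerant check
```

With a strict check, a plateau or a tiny upward wobble after convergence counts as a failure. The tolerant version accepts it while still rejecting a real increase.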