facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.
https://faiss.ai
MIT License
30.19k stars 3.53k forks source link

random generator produces the same number sequence with the same seed #2487

Open CaucherWang opened 1 year ago

CaucherWang commented 1 year ago

Summary

In random.h, struct RandomGenerator, random generator produces the same number sequence with the same seed. I found this problem in NSG.cpp, search_on_grapth function.

Platform

Faiss version:e7b9d5514bc41109a0c502bcb6dc72ac196b287a

Reproduction instructions

It is easy to be reproduced.

mdouze commented 1 year ago

What did you expect?

CaucherWang commented 1 year ago

What did you expect?

Would it be better to generate random vertex sequences with different queries? A fixed group of "random" vertexes that fill the candidate pool seems a little wired. I'm not sure whether it will do harm to the performance or not.

mdouze commented 1 year ago

Right, in NSG this is actually a valid concern. @KinglittleQ what do you think?

KinglittleQ commented 1 year ago

The original intent was to make the search process deterministic, which will produce the same search result given the same query. This deterministic does not harm the accuracy according to my evaluation.

I think your concern is reasonable. Maybe we could find a way to generate random sequences depending on the query point.

KinglittleQ commented 1 year ago

Assuming x is the query vector, we could set the random seed to *(unsigned int*)&(sum(x)).

mdouze commented 1 year ago

There is a hash function that could be applied to queries: https://github.com/facebookresearch/faiss/blob/151e3d7be54aec844b6328dc3e7dd0b83fcfa5bc/faiss/utils/utils.h#L168

CaucherWang commented 1 year ago

Thank you both. Maybe @KinglittleQ can change this line in the next pr, if you think it's more reasonable.