Closed by benfred 1 year ago
Benchmarking this change on a dataset of GitHub stars (9M items with 96-dimensional embeddings) shows that queries per second (QPS) improve by roughly 15% at a batch size of 1 and roughly 21% at a batch size of 1000:
dataset | batch_size | Previous QPS | QPS with RAFT | % improvement |
---|---|---|---|---|
github | 1 | 161.73 | 185.26 | 14.5% |
github | 1000 | 2299.46 | 2774.98 | 20.7% |
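
For reference, the percentage column can be reproduced from the two QPS columns. This is a small sketch, not part of the change itself; the helper name `pct_improvement` and the `rows` list are made up here for illustration, and the numbers are copied from the table above.

```python
def pct_improvement(before_qps: float, after_qps: float) -> float:
    """Relative QPS gain of `after_qps` over `before_qps`, in percent."""
    return (after_qps - before_qps) / before_qps * 100.0


# (dataset, batch_size, previous QPS, QPS with RAFT), copied from the table.
rows = [
    ("github", 1, 161.73, 185.26),
    ("github", 1000, 2299.46, 2774.98),
]

# Round to one decimal place, matching the precision of the first table row.
improvements = [round(pct_improvement(prev, raft), 1) for _, _, prev, raft in rows]

for (dataset, batch_size, _, _), gain in zip(rows, improvements):
    print(f"{dataset} batch_size={batch_size}: {gain}%")
```
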
This change switches the GPU top-k code from faiss to RAFT (https://github.com/rapidsai/raft). The RAFT version is quite a bit faster, and it doesn't have the performance issues with small batch sizes that faiss has, which means we can delete a bunch of code that was working around them. RAFT also doesn't limit the size of k, whereas faiss requires k to be less than 2048.
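
To make the "top-k" operation concrete: for each query in a batch, it selects the k highest-scoring items and returns their indices and scores. The sketch below is a pure-Python CPU reference for those semantics only. It is not the RAFT or faiss API, and `topk_per_row` is a hypothetical name used for illustration.

```python
import heapq
from typing import List, Sequence, Tuple


def topk_per_row(
    scores: Sequence[Sequence[float]], k: int
) -> List[Tuple[List[int], List[float]]]:
    """For each row of scores, return (indices, values) of the k largest
    entries, ordered from highest to lowest score."""
    results = []
    for row in scores:
        # nlargest returns at most k (index, score) pairs, highest score first.
        best = heapq.nlargest(k, enumerate(row), key=lambda pair: pair[1])
        indices = [index for index, _ in best]
        values = [score for _, score in best]
        results.append((indices, values))
    return results


# A batch of two queries scored against three items.
batch_scores = [
    [0.1, 0.9, 0.5],
    [3.0, 1.0, 2.0],
]
top2 = topk_per_row(batch_scores, k=2)
```

This reference places no upper bound on k (a k larger than the row length simply returns the whole row, sorted), which mirrors the point above about k not being capped.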