Closed hweller1 closed 1 year ago
The benchmark right now just adds all vectors and then builds the index.
I agree it would be nice to have a benchmark that adds / queries sequentially. But that would change the nature of the benchmarks quite a bit. Maybe later!
It's my understanding that many of these algorithms start to perform worse as more data points are vectorized and added after the initial index is built. In situations where you have a regularly updating dataset (e.g. Twitter) that you'd like to perform semantic search over, it would be nice to understand how these different algorithms stack up against each other in terms of index drift, without having to entirely rebuild the index, which could be prohibitively expensive.
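To make the request concrete, here is a rough sketch of what such a measurement could look like. It is not code from the benchmark: `FrozenIVF` is a hypothetical toy index written only for illustration (its centroids are fixed at build time), the data is synthetic, and the growing mean shift is just an assumed stand-in for a dataset whose distribution changes over time. The idea is to build once, add batches without rebuilding, and record recall@k against brute force after each batch.

```python
import numpy as np


def exact_knn(data, queries, k):
    # Brute-force ground truth using squared Euclidean distance.
    d = ((queries[:, None, :] - data[None, :, :]) ** 2).sum(-1)
    return np.argsort(d, axis=1)[:, :k]


class FrozenIVF:
    """Hypothetical toy inverted-file index; centroids never change after build()."""

    def __init__(self, n_lists=16, n_probe=2, seed=0):
        self.n_lists, self.n_probe, self.seed = n_lists, n_probe, seed

    def build(self, data):
        rng = np.random.default_rng(self.seed)
        picks = rng.choice(len(data), self.n_lists, replace=False)
        self.centroids = data[picks].copy()
        self.vectors = np.empty((0, data.shape[1]))
        self.assign = np.empty(0, dtype=int)
        self.add(data)

    def add(self, data):
        # New vectors are assigned to the old centroids; nothing is retrained.
        d = ((data[:, None, :] - self.centroids[None, :, :]) ** 2).sum(-1)
        self.vectors = np.vstack([self.vectors, data])
        self.assign = np.concatenate([self.assign, d.argmin(axis=1)])

    def query(self, q, k):
        # Search only the n_probe lists whose centroids are closest to q.
        d = ((self.centroids - q) ** 2).sum(-1)
        probe = np.argsort(d)[: self.n_probe]
        cand = np.flatnonzero(np.isin(self.assign, probe))
        dist = ((self.vectors[cand] - q) ** 2).sum(-1)
        return cand[np.argsort(dist)[:k]]


def recall_after_incremental_adds(n_initial=1000, n_batches=4, batch_size=500,
                                  dim=16, k=10, n_queries=50, n_probe=2):
    rng = np.random.default_rng(0)
    initial = rng.normal(size=(n_initial, dim))
    index = FrozenIVF(n_probe=n_probe)
    index.build(initial)
    everything = initial
    recalls = []
    for b in range(n_batches):
        shift = 0.5 * (b + 1)  # assumed drift: each batch moves further away
        batch = rng.normal(loc=shift, size=(batch_size, dim))
        index.add(batch)  # incremental add, no rebuild
        everything = np.vstack([everything, batch])
        queries = rng.normal(loc=shift, size=(n_queries, dim))
        truth = exact_knn(everything, queries, k)
        hits = sum(len(set(index.query(q, k)) & set(t))
                   for q, t in zip(queries, truth))
        recalls.append(hits / (n_queries * k))
    return recalls


if __name__ == "__main__":
    for i, r in enumerate(recall_after_incremental_adds(), start=1):
        print(f"after batch {i}: recall@10 = {r:.3f}")
```

A real version would swap the toy index for each library's own incremental-add call where one exists, and would also track add time and query throughput per batch so the cost can be compared against a full rebuild.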