erikbern / ann-benchmarks

Benchmarks of approximate nearest neighbor libraries in Python
http://ann-benchmarks.com
MIT License

Consistently prewarming database #500

Closed: wahajali closed this issue 1 month ago

wahajali commented 3 months ago

The pg_embedding client currently prewarms the database, while the clients for other databases, including pgvector, do not. Does this affect the comparison in terms of QPS? And should prewarming be standardized across the board?

ankane commented 3 months ago

Hi @wahajali, for pgvector specifically, the data should already be in shared buffers, so prewarming shouldn't make a difference.

wahajali commented 3 months ago

Hi @ankane, could you please elaborate on why that is the case? Do shared buffers get populated on insertion?

ankane commented 3 months ago

Yes, and when the index is created. You can run the benchmarks with and without pg_prewarm to compare.
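A minimal sketch of that comparison, assuming a DB-API style cursor (for example from psycopg) and that the `pg_prewarm` extension is available on the server. The helper names `prewarm` and `measure_qps`, and the relation names `items` and `items_embedding_idx`, are hypothetical and used only for illustration; they are not part of ann-benchmarks.

```python
import time
from typing import Callable, Dict, Iterable, Sequence


def prewarm(cur, relations: Iterable[str]) -> Dict[str, int]:
    """Load each relation into shared buffers with pg_prewarm.

    `cur` is any DB-API style cursor. Returns the number of blocks
    pg_prewarm reports as prewarmed for each relation.
    """
    cur.execute("CREATE EXTENSION IF NOT EXISTS pg_prewarm")
    blocks = {}
    for rel in relations:
        # pg_prewarm takes a regclass and returns the block count (int8).
        cur.execute("SELECT pg_prewarm(%s::regclass)", (rel,))
        blocks[rel] = cur.fetchone()[0]
    return blocks


def measure_qps(run_query: Callable[[object], object], queries: Sequence) -> float:
    """Run every query once and return queries per second."""
    start = time.perf_counter()
    for q in queries:
        run_query(q)
    # Guard against a zero interval on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return len(queries) / elapsed


# Hypothetical usage against a live database:
#   qps_cold = measure_qps(run_query, queries)
#   prewarm(cur, ["items", "items_embedding_idx"])
#   qps_warm = measure_qps(run_query, queries)
```

If the table and index are already resident in shared buffers after insertion and index creation, the two QPS figures should come out close to each other. Note that the first measurement pass itself pulls pages into the cache, so restarting the server between the two runs gives a cleaner comparison.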