erikbern / ann-benchmarks

Benchmarks of approximate nearest neighbor libraries in Python
http://ann-benchmarks.com
MIT License

Update pgvector loading method to use binary format #488

Closed: jkatz closed this 4 months ago

jkatz commented 5 months ago

Particularly with large vector types, the pgvector module was spending significant time converting floating-point values to ASCII text before transmitting them to the PostgreSQL server. This change keeps the data in binary format, reducing that overhead. One test showed a 63% reduction in load time, which should also reduce the overall "build" time reported by this benchmark.
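To illustrate why the binary path is cheaper, here is a minimal sketch in Python that encodes one vector both ways. It assumes pgvector's text input format (`[1,2,3]`) and its binary layout (a big-endian 16-bit dimension count, a 16-bit reserved field, then big-endian float4 values), and wraps rows in PostgreSQL's `COPY ... FROM STDIN WITH (FORMAT BINARY)` framing. The helper names `encode_text`, `encode_binary`, and `copy_binary_payload` are hypothetical and are not the code changed in this PR. In practice a driver such as psycopg would normally handle this framing for you.

```python
import struct

# Hypothetical helpers for illustration only; not the code from this PR.

# 11-byte signature that opens every COPY BINARY stream.
PGCOPY_SIGNATURE = b"PGCOPY\n\xff\r\n\x00"


def encode_text(vec):
    """pgvector text format: every float is formatted as decimal digits
    on the client and must be parsed back by the server."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def encode_binary(vec):
    """Assumed pgvector binary layout: uint16 dim, uint16 reserved (0),
    then dim big-endian float4 values. No float-to-text conversion."""
    return struct.pack(f">HH{len(vec)}f", len(vec), 0, *vec)


def copy_binary_payload(rows):
    """Frame (int4 id, vector) rows in PostgreSQL's COPY BINARY format."""
    out = bytearray(PGCOPY_SIGNATURE)
    out += struct.pack(">ii", 0, 0)  # flags field, header extension length
    for row_id, vec in rows:
        vec_bytes = encode_binary(vec)
        out += struct.pack(">h", 2)  # number of fields in this tuple
        out += struct.pack(">ii", 4, row_id)  # int4 field: length 4, then value
        out += struct.pack(">i", len(vec_bytes)) + vec_bytes  # vector field
    out += struct.pack(">h", -1)  # file trailer
    return bytes(out)


if __name__ == "__main__":
    vec = [0.1, 0.2, 0.3]
    print(encode_text(vec))          # decimal text the server has to parse
    print(len(encode_binary(vec)))   # 4-byte header + 4 bytes per dimension
    print(len(copy_binary_payload([(1, vec)])))
```

The binary encoding is a fixed 4 bytes per dimension and is essentially a copy of the float bits, while the text encoding formats each value as a variable-length decimal string on the client and requires parsing on the server. For vectors with hundreds or thousands of dimensions, that per-value work is where a loader can spend much of its time.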

jkatz commented 5 months ago

Of the automated tests, the pgvector test passed, which is the key regression check for this PR. Eyeballing a few other open PRs, the test also appears to have completed more quickly. That may be a side effect of this PR, though other factors could be affecting the runtime numbers.

maumueller commented 4 months ago

Thanks!