erikbern / ann-benchmarks

Benchmarks of approximate nearest neighbor libraries in Python
http://ann-benchmarks.com
MIT License
4.73k stars 715 forks source link

Add IVF bench for PgVector extension #512

Closed C0rWin closed 2 months ago

C0rWin commented 2 months ago

Currently, the pgvector benchmark looks only at the HNSW index, missing the second index type, IVFFlat. This commit also extends the pgvector benchmark and the ability to execute the IVFFlat index. Though I understand HNSW might have superior performance, it's still interesting to have the ability to compare it with another index type.

C0rWin commented 2 months ago
_________________________________ test_hamming _________________________________

    def test_hamming():
        dist = metrics["hamming"].distance

        p = numpy.array([1, 1, 0, 0], dtype=numpy.bool_)
        q = numpy.array([1, 0, 0, 1], dtype=numpy.bool_)
>       assert dist(p, q) == pytest.approx(2)
E       assert 0.5 == 2 ± 2.0e-06
E         comparison failed
E         Obtained: 0.5
E         Expected: 2 ± 2.0e-06

test/distance_test.py:[21](https://github.com/erikbern/ann-benchmarks/actions/runs/8704604263/job/23873209353?pr=512#step:5:22): AssertionError
=========================== short test summary info ============================

not sure whenever and how is PR related to the UT failure.

C0rWin commented 2 months ago
_________________________________ test_hamming _________________________________

    def test_hamming():
        dist = metrics["hamming"].distance

        p = numpy.array([1, 1, 0, 0], dtype=numpy.bool_)
        q = numpy.array([1, 0, 0, 1], dtype=numpy.bool_)
>       assert dist(p, q) == pytest.approx(2)
E       assert 0.5 == 2 ± 2.0e-06
E         comparison failed
E         Obtained: 0.5
E         Expected: 2 ± 2.0e-06

test/distance_test.py:[21](https://github.com/erikbern/ann-benchmarks/actions/runs/8704604263/job/23873209353?pr=512#step:5:22): AssertionError
=========================== short test summary info ============================

not sure whenever and how is PR related to the UT failure.

tested with the previous commit, getting the same error.... I assume there are some issues with the main branch.

C0rWin commented 2 months ago

https://github.com/erikbern/ann-benchmarks/pull/513 should expected to fix the UT issue.

jkatz commented 2 months ago

cc @ankane

The IVFFlat testing for pgvector was explicitly removed in https://github.com/erikbern/ann-benchmarks/pull/463 -- Andrew & I had discussed this quite a bit and opted for the simpler approach of only having HNSW present in the ANN Benchmarks suite. Given this is the primary indexing method + de facto choice for pgvector, it makes sense to just have the single method present.

If we did decide to do too, I would opt to break it out as "pgvector_ivfflat" and keep the HNSW implementation as "pgvector"

C0rWin commented 2 months ago

cc @ankane

The IVFFlat testing for pgvector was explicitly removed in #463 -- Andrew & I had discussed this quite a bit and opted for the simpler approach of only having HNSW present in the ANN Benchmarks suite. Given this is the primary indexing method + de facto choice for pgvector, it makes sense to just have the single method present.

If we did decide to do too, I would opt to break it out as "pgvector_ivfflat" and keep the HNSW implementation as "pgvector"

Here is the thing: I do not think this is a de facto method since there might be workloads where you'd like to compromise the accuracy in favor of speed; having this benchmark might help to shed some light on differences between these two. Obviously, this is only IMHO.

jkatz commented 2 months ago

cc @ankane The IVFFlat testing for pgvector was explicitly removed in #463 -- Andrew & I had discussed this quite a bit and opted for the simpler approach of only having HNSW present in the ANN Benchmarks suite. Given this is the primary indexing method + de facto choice for pgvector, it makes sense to just have the single method present. If we did decide to do too, I would opt to break it out as "pgvector_ivfflat" and keep the HNSW implementation as "pgvector"

Here is the thing: I do not think this is a de facto method since there might be workloads where you'd like to compromise the accuracy in favor of speed; having this benchmark might help to shed some light on differences between these two. Obviously, this is only IMHO.

I'm in a position where I can make some more data-driven opinions on this, and from the data and conversations with pgvector users, the typical approach now is to go "HNSW-first." There are cases where I talk to users who are using IVFFlat and are very happy with their results, their workloads tend to be more static (all vectors are available at indexing time), whereas most use I'm seeing these days tends to be iterative (vectors are continuosuly added to the data set).

While we were evaluating pgvector's HNSW implementation, I did publish benchmarks (using an adapted ANN Benchmark) on the two methods. I just happened to be doing some comparative analysis on the upcoming pgvector release (which I'll publish), but the performance/recall ratio gap between pgvector's HNSW + IVFFlat implementation has grown, and IVFFlat's biggest advantage (index build time) has dwindled or been superceded. In the coming days I'll have additional data on this point.

I can understand from an evaluation standpoint where it may be helpful to compare the two methods, but from a simplicity and usage standpoint (and where the referenced changes to the module game), it makes more sense to compare the HNSW implementation. Maybe the compromise is including the code to execute an evaluation with IVFFlat, but have it disabled by default?