erikbern / ann-benchmarks

Benchmarks of approximate nearest neighbor libraries in Python
http://ann-benchmarks.com
MIT License
4.88k stars 735 forks source link

Disable algorithms throwing error (`luceneknn` and `pgvector`) #487

Closed gauravpandit91 closed 8 months ago

gauravpandit91 commented 8 months ago

Disable the algorithms 'luceneknn' and 'pgvector'. They throw error during execution. They can be enabled again once they run smoothly. The dataset used was "glove-25-angular".

Please find below the screenshots showing the error during executing:

  1. 'luceneknn' Screenshot 1 Screenshot 2024-01-14 at 2 27 28 PM Screenshot 2 Screenshot 2024-01-14 at 2 27 56 PM Screenshot 3 Screenshot 2024-01-14 at 2 29 02 PM

  2. 'pgvector' Screenshot 1 Screenshot 2024-01-14 at 4 31 09 PM Screenshot 2 Screenshot 2024-01-14 at 4 31 43 PM Screenshot 3 Screenshot 2024-01-14 at 4 32 10 PM

ankane commented 8 months ago

@gauravpandit91 It looks like you need to rebuild your pgvector image. It runs fine on CI: https://github.com/erikbern/ann-benchmarks/actions/runs/7480353560/job/20359594962

gauravpandit91 commented 8 months ago

@ankane Maybe rebuilding the image would run it on my machine. However, this problem may re-appear in future. So, maybe a fixed version in the Dockerfile can be helpful. A good example for this can be #485. It is a recently merged request which tackles future compatibility issues (for another algorithm implementation).

jkatz commented 8 months ago

The referenced PR is installing a package from a repository, whereas the pgvector Dockerfile is building from source. While it'd be possible to pin a version number as part of the checkout, it also requires additional maintenance to keep the version maintained. Doing a spot check on other algorithms, it seems to be more common to not pin to a specific version.

In this case, particularly for how this repo is used and how the projects continuously change in general, it does make sense to rebuild the container images on a regular basis to ensure you're getting the best possible snapshot of how the algorithms perform at a given point in time.

erikbern commented 8 months ago

I think it makes sense to disable algorithms that fail consistently, but it seems fine to keep them if failures are temporary

jkatz commented 8 months ago

@erikbern Agreed. In the case for pgvector, it continues to pass the CIjobs (as I referenced in a different PR), so I don't see the need to remove it. In the PR I recently opened, luceneknn passed too.

I'd recommend closing this PR given it seems the approach for both of those would be to rebuild the containers, and that's what the CIs are doing.