castorini / anserini

Anserini is a Lucene toolkit for reproducible information retrieval research
http://anserini.io/
Apache License 2.0

Improvements to flat vector search #2512

Closed: lintool closed this 3 months ago

lintool commented 3 months ago

This huge PR is ready for review.

I've hooked everything up to the regressions, so we now have a complete set of regressions for all BEIR datasets: {cached, ONNX} x {original, int8}.
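For readers unfamiliar with the notation, the cross product works out to four conditions per dataset. The sketch below just enumerates that grid. The class name, the dataset placeholder, and the dotted naming scheme are hypothetical and are not Anserini's actual regression names; reading "cached"/"ONNX" as the query-encoding axis and "original"/"int8" as the index axis is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Anserini: enumerates the
// {cached, ONNX} x {original, int8} grid for one dataset.
final class RegressionGrid {
  // Assumed meaning: how query vectors are obtained.
  static final List<String> QUERY_ENCODING = List.of("cached", "onnx");
  // Assumed meaning: unquantized flat index vs. int8-quantized index.
  static final List<String> INDEX_TYPE = List.of("original", "int8");

  // Returns one illustrative condition name per (query, index) pair.
  static List<String> conditions(String dataset) {
    List<String> out = new ArrayList<>();
    for (String query : QUERY_ENCODING) {
      for (String index : INDEX_TYPE) {
        out.add(dataset + "." + query + "." + index);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // "some-beir-dataset" is a placeholder, not a real dataset name.
    conditions("some-beir-dataset").forEach(System.out::println);
  }
}
```

Each BEIR dataset therefore contributes four regression conditions under this reading.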

We now have two separate codecs: AnseriniLucene99FlatVectorFormat and AnseriniLucene99ScalarQuantizedVectorsFormat.
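As background on what distinguishes the second codec, here is a minimal sketch of the general idea behind int8 scalar quantization: each float component is clamped to a range and mapped linearly onto a small set of integer levels. This is not the code in AnseriniLucene99ScalarQuantizedVectorsFormat or in Lucene; the class name, the fixed `min`/`max` bounds, and the choice of 128 levels (0..127) are assumptions made for illustration.

```java
// Illustrative only: generic min/max scalar quantization, not Lucene's implementation.
final class ScalarQuantizationSketch {
  private static final float LEVELS = 127f; // assumed: levels 0..127 fit a non-negative byte

  // Maps each component from [min, max] onto an integer level in 0..127.
  static byte[] quantize(float[] vector, float min, float max) {
    float scale = LEVELS / (max - min);
    byte[] quantized = new byte[vector.length];
    for (int i = 0; i < vector.length; i++) {
      // Clamp so out-of-range values saturate instead of overflowing.
      float clamped = Math.max(min, Math.min(max, vector[i]));
      quantized[i] = (byte) Math.round((clamped - min) * scale);
    }
    return quantized;
  }

  // Approximate inverse: recovers floats with error of at most half a step.
  static float[] dequantize(byte[] quantized, float min, float max) {
    float step = (max - min) / LEVELS;
    float[] vector = new float[quantized.length];
    for (int i = 0; i < quantized.length; i++) {
      vector[i] = min + quantized[i] * step;
    }
    return vector;
  }

  public static void main(String[] args) {
    float[] original = {-1.0f, 0.25f, 1.0f};
    byte[] q = quantize(original, -1.0f, 1.0f);
    float[] restored = dequantize(q, -1.0f, 1.0f);
    for (int i = 0; i < original.length; i++) {
      System.out.println(original[i] + " -> " + q[i] + " -> " + restored[i]);
    }
  }
}
```

The trade-off is one byte per dimension instead of four, at the cost of a bounded rounding error per component.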

Mostly looking for a sanity check. I'm in the process of re-running all regressions to make sure everything still works.

codecov[bot] commented 3 months ago

Codecov Report

Attention: Patch coverage is 79.16667%, with 10 lines in your changes missing coverage. Please review.

Project coverage is 67.09%. Comparing base (2152338) to head (0389059). Report is 1 commit behind head on master.

| Files | Patch % | Lines |
|---|---|---|
| .../AnseriniLucene99ScalarQuantizedVectorsFormat.java | 75.60% | 8 Missing and 2 partials :warning: |
Additional details and impacted files

```diff
@@             Coverage Diff              @@
##             master    #2512      +/-   ##
============================================
+ Coverage     67.07%   67.09%   +0.01%
- Complexity     1469     1472       +3
============================================
  Files           218      219       +1
  Lines         12585    12628      +43
  Branches       1523     1526       +3
============================================
+ Hits           8442     8473      +31
- Misses         3618     3628      +10
- Partials        525      527       +2
```
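The project-level percentages in the report can be rechecked from the Hits and Lines rows. The sketch below assumes coverage is hits divided by lines and that the displayed figure is truncated, not rounded, to two decimals. That truncation is an inference from these numbers, not documented Codecov behavior, and the class and method names are hypothetical.

```java
// Hypothetical check of the report's arithmetic; not a Codecov API.
final class CoverageCheck {
  // Coverage in hundredths of a percent, truncated (assumed display rule).
  static long coverageHundredths(long hits, long lines) {
    return (hits * 10_000L) / lines; // integer division truncates
  }

  public static void main(String[] args) {
    long base = coverageHundredths(8442, 12585); // master: Hits / Lines
    long head = coverageHundredths(8473, 12628); // #2512: Hits / Lines
    System.out.printf("base %d.%02d%%%n", base / 100, base % 100);
    System.out.printf("head %d.%02d%%%n", head / 100, head % 100);
  }
}
```

Under these assumptions the computed values agree with the 67.07% and 67.09% shown in the report.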
