apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

Fix TestHnswByteVectorGraph.testSortedAndUnsortedIndicesReturnSameResults #13361

Closed · timgrein closed 1 month ago

timgrein commented 1 month ago

Closes https://github.com/apache/lucene/issues/13210

Description

The following test failed because it produced two different lists of ids for a sorted and an unsorted HNSW byte vector graph: one graph didn't find a higher-scoring doc that the other one found.

gradlew test --tests TestHnswByteVectorGraph.testSortedAndUnsortedIndicesReturnSameResults -Dtests.seed=B41BEC5619361A16 -Dtests.locale=hi-IN -Dtests.timezone=Atlantic/Stanley -Dtests.asserts=true -Dtests.file.encoding=UTF-8

Considering that the graphs of the two indices are organized differently, we need to explore a lot of candidates to ensure that both searchers find the same docs. Increasing beamWidth (the number of nearest-neighbor candidates to track while searching the graph for each newly inserted node) from 5 to 10 fixes the test.
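For reference, a minimal sketch of where beamWidth enters the picture when building an HNSW index; the codec wiring below (Lucene99Codec / Lucene99HnswVectorsFormat, field name handling, and the chosen maxConn) is an illustration of the parameter's role, not the actual test setup:

import org.apache.lucene.codecs.KnnVectorsFormat;
import org.apache.lucene.codecs.lucene99.Lucene99Codec;
import org.apache.lucene.codecs.lucene99.Lucene99HnswVectorsFormat;
import org.apache.lucene.index.IndexWriterConfig;

// Illustration only: beamWidth is the number of nearest-neighbor candidates
// tracked per newly inserted node while the graph is built; maxConn is the
// maximum number of connections per node.
int maxConn = 16;
int beamWidth = 10; // the test's minimum is raised from 5 to 10 in this PR
IndexWriterConfig iwc = new IndexWriterConfig().setCodec(new Lucene99Codec() {
  @Override
  public KnnVectorsFormat getKnnVectorsFormatForField(String field) {
    return new Lucene99HnswVectorsFormat(maxConn, beamWidth);
  }
});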

benwtrent commented 1 month ago

@timgrein could you determine whether the scores are the same or not? I wonder if we are getting tripped up by doc IDs being the tie-breaker for equal scores.
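(For context, Lucene's top-hits queue orders by score and, for equal scores, prefers the lower doc ID. Roughly, as a paraphrase of that ordering rather than the exact HitQueue source:)

import org.apache.lucene.search.ScoreDoc;

// Paraphrase of the top-k ordering: a hit ranks lower if its score is
// smaller, or if the scores are equal and its doc ID is larger.
boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
  if (hitA.score == hitB.score) {
    return hitA.doc > hitB.doc; // equal scores: the lower doc ID wins
  }
  return hitA.score < hitB.score;
}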

timgrein commented 1 month ago

@benwtrent

Without increasing k we'll get the following for the failing test instance:

TOP 1 docs:
Document<stored<id:23>> 9.601536E-5
Document<stored<id:119>> 7.3713694E-5
Document<stored<id:163>> 7.087675E-5
Document<stored<id:148>> 7.051192E-5
Document<stored<id:51>> 6.879472E-5

TOP 2 docs:
Document<stored<id:23>> 9.601536E-5
Document<stored<id:193>> 8.53898E-5
Document<stored<id:119>> 7.3713694E-5
Document<stored<id:163>> 7.087675E-5
Document<stored<id:148>> 7.051192E-5

(So it seems like the first/unsorted index doesn't find document 193 when k is too small (60); I guess due to the different index structure?)

Increasing k to 80 leads to the following results for the previously failing test instance:

TOP 1 docs:
Document<stored<id:23>> 9.601536E-5
Document<stored<id:193>> 8.53898E-5
Document<stored<id:119>> 7.3713694E-5
Document<stored<id:163>> 7.087675E-5
Document<stored<id:148>> 7.051192E-5

TOP 2 docs:
Document<stored<id:23>> 9.601536E-5
Document<stored<id:193>> 8.53898E-5
Document<stored<id:119>> 7.3713694E-5
Document<stored<id:163>> 7.087675E-5
Document<stored<id:148>> 7.051192E-5
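(For illustration, a hedged sketch of the query-side check that k feeds into; the field name, query vector, and helper are placeholders rather than the test's actual harness:)

import java.io.IOException;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.KnnByteVectorQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;

// Hypothetical helper: fetch the top k byte-vector hits from one index so the
// sorted and unsorted indices can be compared hit-by-hit.
static TopDocs topK(Directory dir, byte[] queryVector, int k) throws IOException {
  try (DirectoryReader reader = DirectoryReader.open(dir)) {
    IndexSearcher searcher = new IndexSearcher(reader);
    return searcher.search(new KnnByteVectorQuery("field", queryVector, k), k);
  }
}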
benwtrent commented 1 month ago

@timgrein what is the beamWidth set to in the failing case?

We may want to increase the beamWidth size to just make the test more consistent.

int beamWidth = random().nextInt(10) + 10; // from the previous base of 5
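(Assuming random() here is the test framework's java.util.Random, nextInt(10) yields 0–9, so this draws beamWidth uniformly from [10, 19], raising the minimum from 5 to 10.)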
timgrein commented 1 month ago

@benwtrent The beam width for the failing test case was the smallest possible value, 5. I increased the minimum to 10 per your suggestion (which also fixes the test without increasing k). Do we still want to keep the increased k? Increasing it didn't seem to affect execution time too much, at least for this test instance, and it should reduce the likelihood of a flaky failure even further.

benwtrent commented 1 month ago

Do we still want to keep the increased k?

I would rather not; if we keep bumping it up, eventually we are going to stop searching the graph altogether and just brute force, which defeats the purpose of the test.

timgrein commented 1 month ago

eventually we are going to stop searching the graph altogether and just brute force, which defeats the purpose of the test

Makes sense, decreased k back to 60 👍