apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

Fix bugs in HNSW diversity check introduced in LUCENE-10577 #11782

Closed msokolov closed 2 years ago

msokolov commented 2 years ago

Description

We observed changes in recall that can be traced to the diversity checks performed during indexing.

Version and environment details

No response

msokolov commented 2 years ago

https://github.com/apache/lucene/pull/11781

msokolov commented 2 years ago

Merged #11781 and cherry-picked it to branch_9x and branch_9_4.

jtibshirani commented 2 years ago

@msokolov a test case started failing regularly after you merged the change. Here's an example repro line:

./gradlew test --tests TestKnnVectorQuery.testFilterWithSameScore -Dtests.seed=1951CEB96E0899ED -Dtests.locale=en-PR -Dtests.timezone=Antarctica/South_Pole -Dtests.asserts=true -Dtests.file.encoding=UTF-8

msokolov commented 2 years ago

Thanks, I had opened https://github.com/apache/lucene/issues/11787. I'm not entirely sure this is unexpected, but maybe the graphs have become sparser somehow?

jtibshirani commented 2 years ago

Oh oops, I had missed that. I made a comment on the issue.