apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

During merging, write empty (headers-only) numeric vectors when vector-valued FieldInfos is present [LUCENE-9992] #11031

Open asfimport opened 3 years ago

asfimport commented 3 years ago

When we merge segments in which every document that has a vector in some field has been deleted, we still write the FieldInfo for that field but no vector index files. Reading the merged index then throws exceptions, as shown by this CI test failure:

Build: https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-main/294/

1 tests failed. FAILED:  org.apache.lucene.codecs.lucene90.TestLucene90HnswVectorFormat.testDeleteAllVectorDocs

Error Message: org.apache.lucene.index.CorruptIndexException: Problem reading index from RawDirectoryWrapper(ByteBuffersDirectory@70f29f52 lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@486fbd6) (resource=RawDirectoryWrapper(ByteBuffersDirectory@70f29f52 lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@486fbd6))

Stack Trace:
org.apache.lucene.index.CorruptIndexException: Problem reading index from RawDirectoryWrapper(ByteBuffersDirectory@70f29f52 lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@486fbd6) (resource=RawDirectoryWrapper(ByteBuffersDirectory@70f29f52 lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@486fbd6))
    at __randomizedtesting.SeedInfo.seed([DA9C80A622E56DF7:82EB7A7E3E790C0C]:0)
    at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:160)
    at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:89)
    at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:179)
    at org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:221)
    at org.apache.lucene.index.IndexWriter.lambda$getReader$0(IndexWriter.java:534)
    at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:137)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:596)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:452)
    at org.apache.lucene.index.BaseVectorFormatTestCase.testDeleteAllVectorDocs(BaseVectorFormatTestCase.java:564)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
    at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
    at org.junit.rules.RunRules.evaluate(RunRules.java:20)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
    at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
    at org.junit.rules.RunRules.evaluate(RunRules.java:20)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:826)
    at java.base/java.lang.Thread.run(Thread.java:834)
    Suppressed: java.lang.RuntimeException: CheckIndex failed
        at org.apache.lucene.util.TestUtil.checkIndex(TestUtil.java:331)
        at org.apache.lucene.util.TestUtil.checkIndex(TestUtil.java:306)
        at org.apache.lucene.store.BaseDirectoryWrapper.close(BaseDirectoryWrapper.java:42)
        at org.apache.lucene.index.BaseVectorFormatTestCase.testDeleteAllVectorDocs(BaseVectorFormatTestCase.java:549)
        ... 39 more
Caused by: java.io.FileNotFoundException: No sub-file with id _Lucene90HnswVectorFormat_0.vem found in compound file "_2.cfs" (fileName=_2_Lucene90HnswVectorFormat_0.vem files: [.fnm, _Lucene90_0.tip, _Lucene90_0.tmd, _Lucene90_0.doc, _Lucene90_0.tim, .fdm, .fdx, .fdt])
    at org.apache.lucene.codecs.lucene90.Lucene90CompoundReader.openInput(Lucene90CompoundReader.java:168)
    at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:152)
    at org.apache.lucene.codecs.lucene90.Lucene90HnswVectorReader.readMetadata(Lucene90HnswVectorReader.java:98)
    at org.apache.lucene.codecs.lucene90.Lucene90HnswVectorReader.<init>(Lucene90HnswVectorReader.java:67)
    at org.apache.lucene.codecs.lucene90.Lucene90HnswVectorFormat.fieldsReader(Lucene90HnswVectorFormat.java:91)
    at org.apache.lucene.codecs.perfield.PerFieldVectorFormat$FieldsReader.<init>(PerFieldVectorFormat.java:209)
    at org.apache.lucene.codecs.perfield.PerFieldVectorFormat.fieldsReader(PerFieldVectorFormat.java:77)
    at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:153)
    ... 47 more

Build Log: [...truncated 664 lines...]
ERROR: The following test(s) have failed:
  - org.apache.lucene.codecs.lucene90.TestLucene90HnswVectorFormat.testDeleteAllVectorDocs (:lucene:core)
    Test output: /home/jenkins/jenkins-slave/workspace/Lucene/Lucene-NightlyTests-main/checkout/lucene/core/build/test-results/test/outputs/OUTPUT-org.apache.lucene.codecs.lucene90.TestLucene90HnswVectorFormat.txt
    Reproduce with: gradlew :lucene:core:test --tests "org.apache.lucene.codecs.lucene90.TestLucene90HnswVectorFormat.testDeleteAllVectorDocs" -Ptests.jvms=4 -Ptests.haltonfailure=false -Ptests.jvmargs=-XX:TieredStopAtLevel=1 -Ptests.seed=DA9C80A622E56DF7 -Ptests.multiplier=2 -Ptests.nightly=true -Ptests.badapples=false -Ptests.file.encoding=UTF-8 -Ptests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene/Lucene-NightlyTests-main/test-data/enwiki.random.lines.txt


Migrated from LUCENE-9992 by Michael Sokolov (@msokolov), updated Jun 09 2021
Pull requests: https://github.com/apache/lucene/pull/172

asfimport commented 3 years ago

ASF subversion and git services (migrated from JIRA)

Commit 465cb17d2b5762f01fbe3069dabd5841eaadac8b in lucene's branch refs/heads/main from Michael Sokolov https://gitbox.apache.org/repos/asf?p=lucene.git;h=465cb17

LUCENE-9992: write empty vector fields when merging (#172)
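
For readers skimming the thread, here is a toy model of the failure and of the approach named in the commit message. It is only a sketch: `Segment`, `merge`, `open`, and the `writeEmptyFields` flag are hypothetical names invented for illustration, not Lucene classes or APIs, and the `.vem` suffix is borrowed from the stack trace purely as a label. It assumes the reader requires one per-field file for every field the segment declares.

```java
import java.io.FileNotFoundException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types or methods are Lucene classes.
final class EmptyVectorMergeSketch {

  /** A "segment": the vector fields it declares, plus the per-field files actually written. */
  static final class Segment {
    final List<String> vectorFields = new ArrayList<>();
    final Map<String, List<float[]>> files = new LinkedHashMap<>();
  }

  /**
   * Merges the live vectors of each field into a new segment. With
   * writeEmptyFields == false, a field with no live vectors gets no file,
   * which models the behaviour reported in this issue.
   */
  static Segment merge(Map<String, List<float[]>> liveVectorsByField, boolean writeEmptyFields) {
    Segment merged = new Segment();
    for (Map.Entry<String, List<float[]>> e : liveVectorsByField.entrySet()) {
      merged.vectorFields.add(e.getKey()); // field metadata is always carried over
      if (!e.getValue().isEmpty() || writeEmptyFields) {
        // An empty list stands in for a headers-only file.
        merged.files.put(e.getKey() + ".vem", new ArrayList<>(e.getValue()));
      }
    }
    return merged;
  }

  /** Opens a segment strictly: every declared field must have its file. Returns the vector count. */
  static int open(Segment segment) {
    int total = 0;
    for (String field : segment.vectorFields) {
      List<float[]> vectors = segment.files.get(field + ".vem");
      if (vectors == null) {
        throw new UncheckedIOException(new FileNotFoundException("No sub-file for field " + field));
      }
      total += vectors.size();
    }
    return total;
  }

  public static void main(String[] args) {
    Map<String, List<float[]>> live = new LinkedHashMap<>();
    live.put("vec", new ArrayList<>()); // every document with a vector was deleted

    // Headers-only file written: the segment opens and reports no vectors.
    System.out.println(open(merge(live, true)));

    // No file written: the strict reader fails, as in the CI run.
    try {
      open(merge(live, false));
    } catch (UncheckedIOException expected) {
      System.out.println("without empty file: " + expected.getCause().getMessage());
    }
  }
}
```

The alternative design would be to make the reader tolerate a missing file; the sketch follows the commit message instead and keeps the reader strict.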

asfimport commented 3 years ago

Alan Woodward (@romseygeek) (migrated from JIRA)

This looks like it's causing failures in the Elastic CI:

gradlew :lucene:codecs:test --tests "org.apache.lucene.codecs.simpletext.TestSimpleTextVectorFormat.testDeleteAllVectorDocs" -Ptests.jvms=4 -Ptests.jvmargs=-XX:TieredStopAtLevel=1 -Ptests.seed=7ED7E0ED37CE72D4 -Ptests.nightly=true -Ptests.file.encoding=UTF-8 

asfimport commented 3 years ago

ASF subversion and git services (migrated from JIRA)

Commit f5e050bd008d6d1c4107dc903cb4d1211e3976a4 in lucene's branch refs/heads/main from Adrien Grand https://gitbox.apache.org/repos/asf?p=lucene.git;h=f5e050b

LUCENE-9992: Update expectations about vectors with no values.

asfimport commented 3 years ago

Adrien Grand (@jpountz) (migrated from JIRA)

@msokolov I tried to address the test failures by pushing the above commit. I'd appreciate it if you could have a look to make sure I didn't do anything wrong.