apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

NRT failure due to FieldInfo & File mismatch #13353

Closed benwtrent closed 1 month ago

benwtrent commented 1 month ago

Description

There has been a nasty test failure in ES for awhile: https://github.com/elastic/elasticsearch/issues/105122

The test simulates a document indexing failure. It turns out that this failure is caused by a series of strange conditions in Lucene. If indexing fails on one field, but the document also contains a points value field that comes AFTER the failing field, things blow up when opening a reader, provided the writer has soft-deletes enabled.

The failure description is as follows:

Test that replicates the failure:

```java
public void testExceptionJustBeforeFlushWithPointValues() throws Exception {
  Directory dir = newDirectory();
  Analyzer analyzer =
      new Analyzer(Analyzer.PER_FIELD_REUSE_STRATEGY) {
        @Override
        public TokenStreamComponents createComponents(String fieldName) {
          MockTokenizer tokenizer = new MockTokenizer(MockTokenizer.WHITESPACE, false);
          tokenizer.setEnableChecks(
              false); // disable workflow checking as we forcefully close() in exceptional cases.
          TokenStream stream = new CrashingFilter(fieldName, tokenizer);
          return new TokenStreamComponents(tokenizer, stream);
        }
      };
  DirectoryReader r = null;
  IndexWriterConfig iwc =
      newIndexWriterConfig(analyzer).setCommitOnClose(false).setMaxBufferedDocs(3);
  MergePolicy mp = iwc.getMergePolicy();
  iwc.setMergePolicy(
      new SoftDeletesRetentionMergePolicy("soft_delete", MatchAllDocsQuery::new, mp));
  IndexWriter w = RandomIndexWriter.mockIndexWriter(dir, iwc, random());
  Document newdoc = new Document();
  newdoc.add(newTextField("crash", "do it on token 4", Field.Store.NO));
  newdoc.add(new IntPoint("int", 17));
  expectThrows(IOException.class, () -> w.addDocument(newdoc));
  try {
    r = w.getReader(false, false);
  } catch (AlreadyClosedException ace) {
    // expected
  }
  dir.close();
}
```

The exception thrown is:

        Caused by:
        java.io.FileNotFoundException: No sub-file with id .kdi found in compound file "_0.cfs" (fileName=_0.kdi files: [_Lucene99_0.tip, .nvm, .fnm, .tvd, _Lucene99_0.doc, _Lucene99_0.tim, _Lucene99_0.pos, .tvm, _Lucene99_0.tmd, .fdm, .nvd, .fdx, .tvx, .fdt])
            at org.apache.lucene.codecs.lucene90.Lucene90CompoundReader.openInput(Lucene90CompoundReader.java:170)
            at org.apache.lucene.codecs.lucene90.Lucene90PointsReader.<init>(Lucene90PointsReader.java:63)
            at org.apache.lucene.codecs.lucene90.Lucene90PointsFormat.fieldsReader(Lucene90PointsFormat.java:74)
            at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:152)
            ... 55 more

Version and environment details

No response

benwtrent commented 1 month ago

I am having a difficult time figuring out how to fix this. It seems to me that if the segment is "hard deleted", we should reset all its FieldInfos, since no data was written for it at all.

But I am not sure the individual processDoc action can do this, as it only knows about the documents it added.
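
A minimal sketch of that idea, purely for illustration (the `PendingSegment`, `numHardDeletes`, and `clearFieldInfos` names are hypothetical and do not correspond to actual Lucene internals):

```java
// Hypothetical sketch only: none of these names exist in Lucene. It just
// illustrates the idea of discarding the FieldInfos of a segment whose docs
// were all hard-deleted before flush, so the codec never tries to open files
// (.kdi/.kdd, vector files, ...) that were never written.
void maybeResetFieldInfos(PendingSegment segment) {
  if (segment.maxDoc() == segment.numHardDeletes()) {
    // Every doc in this in-memory segment was aborted: the FieldInfos collected
    // while processing them describe data that will never be flushed.
    segment.clearFieldInfos();
  }
}
```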

benwtrent commented 1 month ago

What makes matters worse is that it doesn't even have to be ALL docs that failed; it is enough that some of the failed docs had point values (or knn vector values, etc.). Any field type that eagerly updates FieldInfos but doesn't actually get flushed could trigger this weird behavior when opening the NRT reader, as in the sketch below.
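
For illustration, a hedged variant of the reproduction above using a knn vector field instead of a point field. This snippet is not from the issue; it assumes the same writer `w`, crashing analyzer on the "crash" field, and soft-deletes retention merge policy as the test earlier, and that `KnnFloatVectorField` behaves analogously to `IntPoint` here:

```java
// Hypothetical variant of the reproduction test, not taken from the issue.
// Assumes the same IndexWriter `w` and crashing analyzer as above.
Document doc = new Document();
doc.add(newTextField("crash", "do it on token 4", Field.Store.NO)); // analysis throws here
doc.add(new KnnFloatVectorField("vec", new float[] {1f, 2f, 3f})); // FieldInfo registered eagerly
expectThrows(IOException.class, () -> w.addDocument(doc));
// If the FieldInfos still claim the segment has vectors, opening an NRT reader
// would presumably go looking for vector files that were never flushed.
DirectoryReader r = w.getReader(false, false);
```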