apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.61k stars 1.02k forks source link

IndexWriter loses track of parent field when index is empty #13340

Closed msokolov closed 4 months ago

msokolov commented 4 months ago

Description

This test fails with java.lang.IllegalArgumentException: can't add a parent field to an already existing index without a parent field. If you index any documents in the index, using the parent field or no, then the test passes.

public void testParentFieldEmptyIndex() throws IOException {
    try (Directory dir = newMockDirectory()) {
      IndexWriterConfig iwc = new IndexWriterConfig(new MockAnalyzer(random()));
      iwc.setParentField("parent");
      try (IndexWriter writer = new IndexWriter(dir, iwc)) {
        writer.commit();
      }
      IndexWriterConfig iwc2 = new IndexWriterConfig(new MockAnalyzer(random()));
      iwc2.setParentField("parent");
      try (IndexWriter writer = new IndexWriter(dir, iwc2)) {
        writer.commit();
      }
    }
  }

Version and environment details

No response

msokolov commented 4 months ago

I hope someone who is familiar with this will quickly see the problem - I'm not sure if we (1) maybe fail to write any FieldInfos when the index is empty, but now we must, or (2) fail to notice that the index we just opened has no FieldInfos and therefore can have a parent field added? It makes more sense to me that we should write a FieldInfos for an empty index.

msokolov commented 4 months ago

I tried removing

    if (leaves.isEmpty()) {                                                                                                                                                                      
      return FieldInfos.EMPTY;                                                                                                                                                                   
    } else                                                                                                                                                                                       

from FieldInfos.getMergedFieldInfos but this didn't fix it -- although it also might be needed?

msokolov commented 4 months ago

OK, now I see that field infos is part of the segment so we would not have written it. I guess in this case we should explicitly recognize and work around the empty case.

msokolov commented 4 months ago

@simonw of you get a moment, your perspective would be helpful. Should we be writing index metadata somewhere outside of a segment? Or tweak the hack we have...