apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

"fdx size mismatch" exception in StoredFieldsWriter.closeDocStore() when closing index with 500M documents [LUCENE-1521] #2595

Closed. asfimport closed this issue 15 years ago

asfimport commented 15 years ago

When closing an index that contains 500,000,000 randomly generated documents, an exception is thrown:

java.lang.RuntimeException: after flush: fdx size mismatch: 500000000 docs vs 4000000004 length in bytes of _0.fdx
    at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:94)
    at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
    at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
    at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
    at org.apache.lucene.index.IndexWriter.flushDocStores(IndexWriter.java:1688)
    at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3518)
    at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3442)
    at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1623)
    at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1588)
    at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1562)
    ...

This appears to be a bug at StoredFieldsWriter.java:93:

  if (4+state.numDocsInStore*8 != state.directory.fileLength(state.docStoreSegmentName + "." + IndexFileNames.FIELDS_INDEX_EXTENSION))

where the multiplication by 8 is causing integer overflow. The fix would be to cast state.numDocsInStore to long before multiplying.
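To illustrate the overflow outside of Lucene, here is a minimal stand-alone sketch. The class and method names (FdxSizeCheck, expectedLengthBuggy, expectedLengthFixed) are made up for this example and are not part of the Lucene source; only the arithmetic mirrors the expression quoted above.

```java
// Hypothetical demo class, not Lucene code: it only reproduces the arithmetic.
class FdxSizeCheck {

    // Same shape as the reported expression: the multiplication is done in
    // 32-bit int arithmetic, so it wraps around before being widened to long.
    static long expectedLengthBuggy(int numDocsInStore) {
        return 4 + numDocsInStore * 8;
    }

    // Proposed fix from the report: cast to long before multiplying, so the
    // whole expression is evaluated in 64-bit arithmetic.
    static long expectedLengthFixed(int numDocsInStore) {
        return 4 + ((long) numDocsInStore) * 8;
    }

    public static void main(String[] args) {
        int numDocs = 500_000_000;
        System.out.println(expectedLengthBuggy(numDocs)); // prints -294967292
        System.out.println(expectedLengthFixed(numDocs)); // prints 4000000004
    }
}
```

With 500,000,000 docs the fixed version yields 4000000004, which matches the file length reported in the exception message, while the int version yields a negative number that can never equal a file length.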

It appears that this is another instance of the mistake that caused bug #2593. I did a cursory search for *8 in the code to see if there might be more instances of the same mistake, but found none.


Migrated from LUCENE-1521 by Shon Vella, resolved Feb 19 2009

asfimport commented 15 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Ugh, right. Plus another one (* 16) in TermVectorsTermsWriter.java. I'll fix.

asfimport commented 15 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Committed revision 735043. Thanks Shon!

asfimport commented 15 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Reopening for backport to 2.4.1.

asfimport commented 15 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Committed revision 745803 on 2.4 branch.

asfimport commented 15 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Note that this issue only hits an index with many (> ~268 million) docs.
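The ~268 million figure follows from the int range: 4 + numDocs * 8 stops fitting in a signed 32-bit int once numDocs reaches 2^28 = 268,435,456. A small sketch of that calculation, assuming the size check has the form quoted in the original report (the class and method names here are hypothetical):

```java
// Hypothetical helper, not Lucene code: finds where 4 + numDocs * 8
// no longer fits in a signed 32-bit int.
class FdxOverflowThreshold {

    // Largest doc count for which 4 + numDocs * 8 is still <= Integer.MAX_VALUE.
    static int maxSafeDocs() {
        return (Integer.MAX_VALUE - 4) / 8;
    }

    // Computes the exact value in 64-bit arithmetic and checks the int range.
    static boolean overflowsInt(int numDocs) {
        long exact = 4L + 8L * numDocs;
        return exact > Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(maxSafeDocs());               // prints 268435455
        System.out.println(overflowsInt(268_435_455));   // prints false
        System.out.println(overflowsInt(268_435_456));   // prints true
    }
}
```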

asfimport commented 15 years ago

Elliot Metsger (@emetsger) (migrated from JIRA)

I received this on 2.4.1, not sure if it is this bug or not:

Exception in thread "main" java.lang.RuntimeException: after flush: fdx size mismatch: 10 docs vs 0 length in bytes of _sl3.fdx
    at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:94)
    at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
    at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
    at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
    at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567)
    at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540)
    at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
    at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:3363)
    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3408)
    at edu.jhu.library.ivoa.VOImageAccessUrlDownload.go(VOImageAccessUrlDownload.java:357)
    at edu.jhu.library.ivoa.VOImageAccessUrlDownload.main(VOImageAccessUrlDownload.java:103)

I'm working with over 500,000 docs in this particular index.

asfimport commented 15 years ago

Elliot Metsger (@emetsger) (migrated from JIRA)

Never mind, it doesn't look like this is an occurrence of this bug. Not sure what happened... the underlying storage is a ZFS file system. Anyway, this thread http://www.mail-archive.com/solr-user@lucene.apache.org/msg22264.html was helpful in explaining what may be happening.

asfimport commented 15 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Is there anything in your environment that might be removing index files out from under the IndexWriter? Are you changing your Directory's default locking impl, or disabling locking?

ZFS should be fine – I use it in my daily development. What a fabulous file system :) Snapshots & clones are very addictive...