apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

ArrayIndexOutOfBoundsException during indexing [LUCENE-10441] #11477

Open asfimport opened 2 years ago

asfimport commented 2 years ago

Hi experts! I am hitting an ArrayIndexOutOfBoundsException while indexing and committing documents. The exception gives me no clue about what happened, so I have little information for debugging. Could I get some suggestions about what the cause could be and how to fix this error? I'm using Lucene 8.10.0.

java.lang.ArrayIndexOutOfBoundsException: -1
    at org.apache.lucene.util.BytesRefHash$1.get(BytesRefHash.java:179)
    at org.apache.lucene.util.StringMSBRadixSorter$1.get(StringMSBRadixSorter.java:42)
    at org.apache.lucene.util.StringMSBRadixSorter$1.setPivot(StringMSBRadixSorter.java:63)
    at org.apache.lucene.util.Sorter.binarySort(Sorter.java:192)
    at org.apache.lucene.util.Sorter.binarySort(Sorter.java:187)
    at org.apache.lucene.util.IntroSorter.quicksort(IntroSorter.java:41)
    at org.apache.lucene.util.IntroSorter.quicksort(IntroSorter.java:83)
    at org.apache.lucene.util.IntroSorter.sort(IntroSorter.java:36)
    at org.apache.lucene.util.MSBRadixSorter.introSort(MSBRadixSorter.java:133)
    at org.apache.lucene.util.MSBRadixSorter.sort(MSBRadixSorter.java:126)
    at org.apache.lucene.util.MSBRadixSorter.sort(MSBRadixSorter.java:121)
    at org.apache.lucene.util.BytesRefHash.sort(BytesRefHash.java:183)
    at org.apache.lucene.index.SortedSetDocValuesWriter.flush(SortedSetDocValuesWriter.java:171)
    at org.apache.lucene.index.DefaultIndexingChain.writeDocValues(DefaultIndexingChain.java:348)
    at org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:228)
    at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:350)
    at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:476)
    at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:656)
    at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3364)
    at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3770)
    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3728) 

Migrated from LUCENE-10441 by Peixin Li, updated Mar 14 2022

asfimport commented 2 years ago

Christine Poerschke (@cpoerschke) (migrated from JIRA)

Line 179 in the stack trace above is https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.10.0/lucene/core/src/java/org/apache/lucene/util/BytesRefHash.java#L179, i.e.

pool.setBytesRef(scratch, bytesStart[compact[i]]);

Here, pool (declared at https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.10.0/lucene/core/src/java/org/apache/lucene/util/BytesRefHash.java#L55) is a ByteBlockPool, so this issue could be similar to, or the same as, #9660.
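
To make the failing line a bit more concrete, here is a minimal, self-contained sketch of how a nested lookup of the form bytesStart[compact[i]] can end up indexing with -1. It is not Lucene code and not a diagnosis of this issue: the class NegativeIndexSketch, its methods, and the sample arrays are all hypothetical, and it simply assumes an id table in which -1 marks an empty slot. If a sort ever reads one slot past the live entries, the inner lookup returns -1 and the outer array access throws.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names come from Lucene.
public class NegativeIndexSketch {

    // Moves all live ids (anything other than -1) to the front of a copy
    // of the table, leaving -1 in every remaining slot.
    static int[] compact(int[] ids) {
        int[] out = Arrays.copyOf(ids, ids.length);
        int upto = 0;
        for (int i = 0; i < out.length; i++) {
            if (out[i] != -1) {
                if (upto < i) {
                    out[upto] = out[i];
                    out[i] = -1;
                }
                upto++;
            }
        }
        return out;
    }

    // Performs the nested lookup bytesStart[compacted[i]] and reports
    // whether it throws ArrayIndexOutOfBoundsException.
    static boolean lookupFails(int[] compacted, int[] bytesStart, int i) {
        try {
            int offset = bytesStart[compacted[i]];
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] ids = {-1, 2, -1, 0, 1, -1};   // three live ids, three empty slots
        int[] bytesStart = {0, 10, 25};      // one start offset per live id
        int[] compacted = compact(ids);

        // Slots 0..2 hold live ids, so the nested lookup succeeds.
        System.out.println("slot 2 fails: " + lookupFails(compacted, bytesStart, 2));
        // Slot 3 holds -1, so bytesStart[-1] is attempted and throws.
        System.out.println("slot 3 fails: " + lookupFails(compacted, bytesStart, 3));
    }
}
```

In this sketch the failure only appears when the range being sorted is larger than the number of live entries, which is one reason to check whether anything could make the entry count and the table contents disagree, for example an overflow or unsynchronized access. That is an assumption to investigate, not a confirmed cause.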

asfimport commented 2 years ago

Peixin Li (migrated from JIRA)

How many tokens would it take to cause this issue, and is there a way to avoid it?

I'm currently using StandardAnalyzer for the IndexWriter. Could it produce too many tokens if the terms contain a lot of "-" or "." characters?