Closed: iamsanjay closed this issue 1 month ago.
The code snippet below is from the 9_10 branch, where this issue was observed. As of the latest change for 10, a few of the lines from the method below have been moved into a new method in another class.
```
    at java.base/java.lang.Math.toIntExact(Math.java:1135)
    at org.apache.lucene.store.DataOutput.writeGroupVInts(DataOutput.java:354)
    at ...
```
Sorry for missing the email list. Just from reading the code, it seems the docDeltaBuffer should not overflow. I will try to reproduce this issue. Could you show me your indexing source code and some sample data? @iamsanjay
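The intuition behind "the docDeltaBuffer should not overflow" can be sketched as follows. Assume, as within a single Lucene segment, that doc IDs are non-negative ints visited in increasing order: each gap between consecutive doc IDs then lies between 0 and Integer.MAX_VALUE, so storing gaps cannot overflow an int. The class and method names are hypothetical and this is not Lucene's postings code; it only illustrates the bound.

```java
import java.util.Arrays;

/** Hypothetical illustration of doc-ID delta (gap) encoding; not Lucene's postings writer. */
public final class DocDeltaSketch {

    /** Turns non-negative, strictly increasing doc IDs into gaps; the first gap is measured from 0. */
    public static int[] toDeltas(int[] sortedDocIds) {
        int[] deltas = new int[sortedDocIds.length];
        int last = 0;
        for (int i = 0; i < sortedDocIds.length; i++) {
            int doc = sortedDocIds[i];
            if (doc < 0 || (i > 0 && doc <= last)) {
                throw new IllegalArgumentException("doc IDs must be non-negative and strictly increasing");
            }
            // 0 <= last <= doc <= Integer.MAX_VALUE, so doc - last cannot overflow an int.
            deltas[i] = doc - last;
            last = doc;
        }
        return deltas;
    }

    /** Inverse of toDeltas: a running sum restores the original doc IDs. */
    public static int[] fromDeltas(int[] deltas) {
        int[] docs = new int[deltas.length];
        int last = 0;
        for (int i = 0; i < deltas.length; i++) {
            last += deltas[i];
            docs[i] = last;
        }
        return docs;
    }

    public static void main(String[] args) {
        int[] docs = {3, 7, 8, 1_000_000, Integer.MAX_VALUE - 1};
        int[] deltas = toDeltas(docs);
        System.out.println(Arrays.toString(deltas));
        System.out.println(Arrays.equals(docs, fromDeltas(deltas)));
    }
}
```

The point of the sketch is only that per-document gaps are bounded by the maximum doc ID. Anything that accumulates across many documents is not bounded the same way.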
Hi @easyice, I am the original reporter on the mailing list.
The code around indexing is fairly abstracted, so it might be hard to follow. What I do have is the index that failed to merge; however, it is 173 GB xz-compressed. I could use Luke or a similar tool to extract more information for the Lucene team.
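The field type that we index into is:
```java
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexOptions;

FieldType UNSTORED_POSITIONAL = new FieldType(); // declaration not shown in the original report
UNSTORED_POSITIONAL.setOmitNorms(true);
UNSTORED_POSITIONAL.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS);
UNSTORED_POSITIONAL.setStored(false);
UNSTORED_POSITIONAL.setTokenized(false);
UNSTORED_POSITIONAL.freeze();
```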
Then we add fields like so:
```java
doc.add(new Field("type", value.toLowerCase(Locale.US), UNSTORED_POSITIONAL));
```
There are over 1,177,800,000 documents in this index, each containing the term "positional" at least once. On average there are three fields of this type in each document.
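For scale, here is a quick back-of-the-envelope check (the class and method names are hypothetical). The document count on its own fits in an int, but an aggregate such as the approximate total number of these field instances across the index does not. This only illustrates the magnitudes involved; it does not establish the cause of the overflow.

```java
/** Back-of-the-envelope sizes for the index described above (hypothetical helper). */
public final class IndexScale {

    /** Approximate number of field instances, computed in long arithmetic so it cannot overflow here. */
    public static long approxFieldInstances(long docCount, long avgFieldsPerDoc) {
        return docCount * avgFieldsPerDoc;
    }

    public static void main(String[] args) {
        long docs = 1_177_800_000L;                  // document count reported in the thread
        long total = approxFieldInstances(docs, 3);  // "on average three fields of this type"

        System.out.println(docs <= Integer.MAX_VALUE);  // true: the doc count fits in an int
        System.out.println(total);                      // 3533400000
        System.out.println(total > Integer.MAX_VALUE);  // true: the aggregate does not
    }
}
```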
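So, to create local sample data, I would just do something like this ;)
```java
// Generates synthetic documents resembling the production index: every document
// gets "number", plus either "even" or "un-even", all in the same "type" field.
for (int i = 0; i < 2_000_000_000; i++) {
    Document doc = new Document();
    doc.add(new Field("type", "number", UNSTORED_POSITIONAL));
    if (i % 2 == 0) {
        doc.add(new Field("type", "even", UNSTORED_POSITIONAL));
    } else {
        doc.add(new Field("type", "un-even", UNSTORED_POSITIONAL));
    }
    writer.addDocument(doc); // writer is an already-open IndexWriter
}
```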
Thank you @JervenBolleman! Together with @gf2121, I have found the cause of the issue; I will raise a PR later.
Here is the java-user discussion that led to this issue.
Thank you for reporting this @iamsanjay! It looks like it was a real bug, phew, and somewhat serious (not sure).
And thank you @easyice and @gf2121 for the quick repro/fix.
Description
As discussed on the mailing list, DataOutput.writeGroupVInts throws an integer overflow exception (an ArithmeticException raised by Math.toIntExact). The goal is to find the root cause and also to improve the exception message.

More context from the reporter
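To make the failure mode concrete, here is a minimal sketch of a generic group-varint encoder. The stack trace quoted at the top shows writeGroupVInts calling Math.toIntExact, and this sketch likewise narrows each long input to an int with Math.toIntExact before encoding. Any value outside the int range therefore throws ArithmeticException with the message "integer overflow". The class name, method names, and byte layout (a tag byte with one 2-bit length slot per value, first value in the highest bits, followed by each value's bytes in little-endian order) are assumptions of this sketch, not Lucene's DataOutput.writeGroupVInts.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Hypothetical group-varint encoder; the byte layout is this sketch's choice, not Lucene's. */
public final class GroupVIntSketch {

    /** Number of bytes (1..4) needed for v when read as an unsigned 32-bit value. */
    public static int numBytes(int v) {
        int bits = 32 - Integer.numberOfLeadingZeros(v);
        return Math.max(1, (bits + 7) / 8);
    }

    /** Encodes values in groups of four: a tag byte of (length - 1) slots, then the value bytes. */
    public static byte[] encode(long[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int start = 0; start < values.length; start += 4) {
            int groupSize = Math.min(4, values.length - start);
            int[] group = new int[groupSize];
            int tag = 0;
            for (int j = 0; j < groupSize; j++) {
                // Throws ArithmeticException("integer overflow") if the long does not fit in an int.
                group[j] = Math.toIntExact(values[start + j]);
                tag |= (numBytes(group[j]) - 1) << (6 - 2 * j);
            }
            out.write(tag);
            for (int v : group) {
                for (int b = 0; b < numBytes(v); b++) {
                    out.write(v >>> (8 * b)); // write(int) keeps only the low 8 bits
                }
            }
        }
        return out.toByteArray();
    }

    /** Decodes count values produced by encode. */
    public static int[] decode(byte[] data, int count) {
        int[] result = new int[count];
        int pos = 0;
        for (int start = 0; start < count; start += 4) {
            int tag = data[pos++] & 0xFF;
            int groupSize = Math.min(4, count - start);
            for (int j = 0; j < groupSize; j++) {
                int len = ((tag >>> (6 - 2 * j)) & 0x3) + 1;
                int v = 0;
                for (int b = 0; b < len; b++) {
                    v |= (data[pos++] & 0xFF) << (8 * b);
                }
                result[start + j] = v;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long[] values = {1, 300, 70_000, 5, 42};
        byte[] encoded = encode(values);
        System.out.println(encoded.length);
        System.out.println(Arrays.toString(decode(encoded, values.length)));
        try {
            encode(new long[] {Integer.MAX_VALUE + 1L});
        } catch (ArithmeticException e) {
            System.out.println("overflow: " + e.getMessage());
        }
    }
}
```

Because the tag byte only records lengths, the decoder in this sketch needs the total value count to know how many slots of the last group are real.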
Version and environment details
No response