Closed easyice closed 1 month ago
This change keeps the input values of writeGroupVInts
explained as integer, instead of a big number greater than Integer.MAX_VALUE
The essence of this issue is how to deal with the integer value with the sign bit as 1 (like this integer overflow case). We have two options.
writeGroupVInts
, and also return a negative value in readGroupVInt
. Previously, we didn't consider encoding a negative value in group-varint. Currently, readGroupVInt
may return a positive or negative value in different implementations. writeGroupVInts
/ readGroupVInt
, and replace Math.toIntExact
with (int)
.The first approach feels more reasonable.
Thanks for looking into it! Your approach works, but I'm tempted to fix it the other way around, by no longer checking if values are in the expected range with Math.toIntExact
but rather by checking that the value is in the [0, 2**32)
range?
That's also a good idea! by this approach we can make writeGroupVInts
/readGroupVInt
use positive only. it's actually handled as an unsigned integer, so we don't need to consider the case of encoding negative in group-varint. i will update the code.
I pushed the requested changes, @jpountz . No rush, just wanted to let you know.
It's hard for me to tell what the expected user impact here is? Does the exception happen because the remainder part of a postings list (after all length 128 blocks are done), which we now encode with GroupVInt, had a docID delta that was >= 1<<30
, when the postings are also storing freqs?
I guess because we don't see too many users reporting this, it is likely rare-ish. But is the GroupVInt change released in 9.x? Is this maybe enough to warrant a bugfix release if so?
Does the exception happen because the remainder part of a postings list (after all length 128 blocks are done), which we now encode with GroupVInt, had a docID delta that was >= 1<<30, when the postings are also storing freqs?
Yes, exactly. I guess the docID delta that was >= 1<<30
s likely rare-ish
But is the GroupVInt change released in 9.x? Is this maybe enough to warrant a bugfix release if so?
GroupVInt was released at 9.9.0 https://lucene.apache.org/core/9_9_0/changes/Changes.html#v9.9.0.optimizations +1 for bugfix release, what do you think?
+1 to a bugfix release
Can you backport to the 9.10 branch?
Okay, I will backport to 9.10/branch_9x.
Backport completed and added an entry under 9.10.1 Bug Fixes
Closes: https://github.com/apache/lucene/issues/13373
This exception occurs because a negative integer value stores as positive long. In line 376, after a long value
<< 1
, if the sign bit of the integer value is 1, it will be a negative number as integer, but a positive numbers as long, when we stores this value as positive long, it would causeMath.toIntExact
to throwArithmeticException
exception.https://github.com/apache/lucene/blob/f12e4899bf0420693e4f524a515dafcf0f21a3d3/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99PostingsWriter.java#L373-L379
POC code:
TODO: