apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.45k stars 973 forks source link

Fix IntegerOverflow exception in postings encoding as group-varint #13376

Closed easyice closed 1 month ago

easyice commented 1 month ago

Closes: https://github.com/apache/lucene/issues/13373

This exception occurs because a negative integer value stores as positive long. In line 376, after a long value << 1, if the sign bit of the integer value is 1, it will be a negative number as integer, but a positive numbers as long, when we stores this value as positive long, it would cause Math.toIntExact to throw ArithmeticException exception.

https://github.com/apache/lucene/blob/f12e4899bf0420693e4f524a515dafcf0f21a3d3/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99PostingsWriter.java#L373-L379

POC code:

    public void testDelta() {
        int delta = 1 << 30;
        long deltaL = delta;
        deltaL = (deltaL << 1) | 1;
        System.out.println(deltaL > Integer.MAX_VALUE); // true
        System.out.println(deltaL); // 2147483649
        System.out.println((int) deltaL); // -2147483647
        Math.toIntExact(deltaL); // ArithmeticException: integer overflow
    }

TODO:

easyice commented 1 month ago

This change keeps the input values of writeGroupVInts explained as integer, instead of a big number greater than Integer.MAX_VALUE

easyice commented 1 month ago

The essence of this issue is how to deal with the integer value with the sign bit as 1 (like this integer overflow case). We have two options.

The first approach feels more reasonable.

jpountz commented 1 month ago

Thanks for looking into it! Your approach works, but I'm tempted to fix it the other way around, by no longer checking if values are in the expected range with Math.toIntExact but rather by checking that the value is in the [0, 2**32) range?

easyice commented 1 month ago

That's also a good idea! by this approach we can make writeGroupVInts/readGroupVInt use positive only. it's actually handled as an unsigned integer, so we don't need to consider the case of encoding negative in group-varint. i will update the code.

easyice commented 1 month ago

I pushed the requested changes, @jpountz . No rush, just wanted to let you know.

mikemccand commented 1 month ago

It's hard for me to tell what the expected user impact here is? Does the exception happen because the remainder part of a postings list (after all length 128 blocks are done), which we now encode with GroupVInt, had a docID delta that was >= 1<<30, when the postings are also storing freqs?

I guess because we don't see too many users reporting this, it is likely rare-ish. But is the GroupVInt change released in 9.x? Is this maybe enough to warrant a bugfix release if so?

easyice commented 1 month ago

Does the exception happen because the remainder part of a postings list (after all length 128 blocks are done), which we now encode with GroupVInt, had a docID delta that was >= 1<<30, when the postings are also storing freqs?

Yes, exactly. I guess the docID delta that was >= 1<<30 s likely rare-ish

But is the GroupVInt change released in 9.x? Is this maybe enough to warrant a bugfix release if so?

GroupVInt was released at 9.9.0 https://lucene.apache.org/core/9_9_0/changes/Changes.html#v9.9.0.optimizations +1 for bugfix release, what do you think?

jpountz commented 1 month ago

+1 to a bugfix release

jpountz commented 1 month ago

Can you backport to the 9.10 branch?

easyice commented 1 month ago

Okay, I will backport to 9.10/branch_9x.

easyice commented 1 month ago

Backport completed and added an entry under 9.10.1 Bug Fixes