apache / lucenenet

Apache Lucene.NET
https://lucenenet.apache.org/
Apache License 2.0

ArrayIndexOutOfBoundsException in ByteBlockPool #1003

Open hidingmyname opened 3 weeks ago

hidingmyname commented 3 weeks ago

Is there an existing issue for this?

Describe the bug

A field with a very large number of small tokens can cause ArrayIndexOutOfBoundsException in ByteBlockPool due to an arithmetic overflow.

The issue was originally reported in Lucene (https://issues.apache.org/jira/browse/LUCENE-8614 and https://issues.apache.org/jira/browse/LUCENE-10441): an arithmetic overflow occurs in the byteOffset calculation on the last line of the nextBuffer() method, when ByteBlockPool advances to the next buffer.

Although both Lucene issue reports are still marked open, the developers have in fact resolved the problem through a PR.

The Lucene fix uses Math.addExact, which throws an ArithmeticException when the offset overflows in a ByteBlockPool. The change in ByteBlockPool is as follows:

- byteOffset += BYTE_BLOCK_SIZE;
+ byteOffset = Math.addExact(byteOffset, BYTE_BLOCK_SIZE);
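
To illustrate the difference between the two lines of the diff, here is a minimal standalone Java sketch. It does not use Lucene itself: the class and method names are hypothetical, and the block size of 1 << 15 is an assumption based on Lucene's ByteBlockPool. It only shows that plain int addition wraps silently, while Math.addExact throws.

```java
public class OffsetOverflowDemo {
    // Assumption: mirrors Lucene's ByteBlockPool.BYTE_BLOCK_SIZE (1 << 15 = 32768).
    static final int BYTE_BLOCK_SIZE = 1 << 15;

    // Old behaviour: plain int addition wraps around silently on overflow.
    static int advanceUnchecked(int byteOffset) {
        return byteOffset + BYTE_BLOCK_SIZE;
    }

    // Fixed behaviour: Math.addExact throws ArithmeticException on overflow.
    static int advanceChecked(int byteOffset) {
        return Math.addExact(byteOffset, BYTE_BLOCK_SIZE);
    }

    public static void main(String[] args) {
        // Largest block-aligned offset that still fits in an int.
        int lastValidOffset = Integer.MAX_VALUE - BYTE_BLOCK_SIZE + 1;

        // Wraps to a negative value instead of failing.
        System.out.println("unchecked: " + advanceUnchecked(lastValidOffset));

        try {
            advanceChecked(lastValidOffset);
        } catch (ArithmeticException e) {
            System.out.println("checked: overflow detected");
        }
    }
}
```

A negative offset produced by the unchecked version is the kind of value that can later surface as an out-of-bounds array index, which is why failing early at the addition is easier to diagnose.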

The Lucene repo includes a test case for this. I ported it to Lucene.NET, and the test fails. See Steps To Reproduce.

Expected Behavior

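NextBuffer() should throw an ArithmeticException when the offset overflows in a ByteBlockPool. The ported test catches OverflowException, which derives from ArithmeticException in .NET.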

Steps To Reproduce

The ported test case is below:

using System;
using Lucene.Net.Util;
using NUnit.Framework;

namespace TestProject1
{
    [TestFixture]
    public class Test_1
    {
        [Test]
        public void TestTooManyAllocs()
        {
            // Use a mock allocator that doesn't waste memory
            ByteBlockPool pool = new ByteBlockPool(new MockAllocator(0));
            pool.NextBuffer();

            bool throwsException = false;
            int maxIterations = int.MaxValue / ByteBlockPool.BYTE_BLOCK_SIZE + 1;

            for (int i = 0; i < maxIterations; i++)
            {
                try
                {
                    pool.NextBuffer();
                }
                catch (OverflowException)
                {
                    // The offset overflows on the last attempt to call NextBuffer()
                    throwsException = true;
                    break;
                }
            }

            Assert.That(throwsException, Is.True);
            Assert.That(pool.ByteOffset + ByteBlockPool.BYTE_BLOCK_SIZE < pool.ByteOffset, Is.True);
        }

        private class MockAllocator : ByteBlockPool.Allocator
        {
            private readonly byte[] buffer;

            public MockAllocator(int blockSize) : base(blockSize)
            {
                buffer = Array.Empty<byte>();
            }

            public override void RecycleByteBlocks(byte[][] blocks, int start, int end)
            {
                // No-op
            }

            public override byte[] GetByteBlock()
            {
                return buffer;
            }
        }
    }
}
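
The loop bound in the test can be checked without Lucene. The Java sketch below is a simulation only: the class and method names are hypothetical, the block size of 1 << 15 is assumed from Lucene's ByteBlockPool, and the starting offset of 0 assumes that the first nextBuffer() call leaves byteOffset at 0. Under those assumptions, a checked add overflows on the final iteration of the loop, as the comment in the test says.

```java
public class OverflowIterationDemo {
    // Assumption: mirrors Lucene's ByteBlockPool.BYTE_BLOCK_SIZE (1 << 15 = 32768).
    static final int BYTE_BLOCK_SIZE = 1 << 15;

    // Same bound as the test: Integer.MAX_VALUE / BYTE_BLOCK_SIZE + 1.
    static int maxIterations() {
        return Integer.MAX_VALUE / BYTE_BLOCK_SIZE + 1;
    }

    // Returns the zero-based loop index at which the checked add first
    // overflows, or -1 if it never overflows within the loop bound.
    static int firstOverflowIteration() {
        int byteOffset = 0; // assumed state after the initial nextBuffer() call
        for (int i = 0; i < maxIterations(); i++) {
            try {
                byteOffset = Math.addExact(byteOffset, BYTE_BLOCK_SIZE);
            } catch (ArithmeticException e) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("iterations: " + maxIterations());
        System.out.println("first overflow at i = " + firstOverflowIteration());
    }
}
```

With an unchecked add in place of Math.addExact, the same loop would finish without ever throwing, which matches the assertion failure reported below.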

Exceptions (if any)

Assert.That(throwsException, Is.True)
  Expected: True
  But was:  False

Lucene.NET Version

4.8.0-beta00016

.NET Version

8.0.403

Operating System

Windows 10

Anything else?

No response