antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org
BSD 3-Clause "New" or "Revised" License

Java: NegativeArraySizeException with input file of size 1.1 GiB #2215

Open psychon opened 6 years ago

psychon commented 6 years ago

ANTLRInputStream reads all of its input into a single array. When the array is full, it is copied into a new array of twice the size. The initial size is 1024 by default, a power of two, so the array length always stays a power of two. This causes parsing of files larger than 1 GiB (2^30 bytes, to be exact) to fail: once the input exceeds 2^30 bytes, the 2^30-element array has to be doubled. However, 2^31 does not fit in a signed 32-bit int, so the doubling overflows into the sign bit. The code ends up trying to create an array of size Integer.MIN_VALUE:

Exception in thread "main" java.lang.NegativeArraySizeException
    at java.util.Arrays.copyOf(Arrays.java:3332)
    at org.antlr.v4.runtime.ANTLRInputStream.load(ANTLRInputStream.java:123)
    at org.antlr.v4.runtime.ANTLRInputStream.<init>(ANTLRInputStream.java:86)
    at org.antlr.v4.runtime.ANTLRInputStream.<init>(ANTLRInputStream.java:82)
    at org.antlr.v4.runtime.ANTLRInputStream.<init>(ANTLRInputStream.java:90)
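
The arithmetic behind this trace can be reproduced without allocating anything. The following is a minimal sketch of the size calculation only; the class and method names are made up for illustration and are not part of ANTLR.

```java
// Hypothetical demo class: mimics only the "double until it fits" size
// arithmetic described above. No large array is ever allocated.
final class DoublingOverflowDemo {

    /** Doubles size until it reaches needed, or until it wraps negative. */
    static int growByDoubling(int initialSize, long needed) {
        int size = initialSize;
        while (size >= 0 && size < needed) {
            size *= 2; // int multiplication wraps silently at 2^31
        }
        return size;
    }

    public static void main(String[] args) {
        // Exactly 2^30 elements still fit: prints 1073741824
        System.out.println(growByDoubling(1024, 1L << 30));
        // One more element forces 2^30 * 2: prints -2147483648
        System.out.println(growByDoubling(1024, (1L << 30) + 1));
    }
}
```

Passing that negative result to Arrays.copyOf is what raises the NegativeArraySizeException.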

Something like the following could help, but it will cause other issues for files larger than Integer.MAX_VALUE. I guess some exception should be thrown when data.length is Integer.MAX_VALUE?

--- runtime/Java/src/org/antlr/v4/runtime/ANTLRInputStream.java.orig    2018-02-05 16:16:59.031825397 +0100
+++ runtime/Java/src/org/antlr/v4/runtime/ANTLRInputStream.java 2018-02-05 16:18:15.995627939 +0100
@@ -98,7 +98,10 @@ public class ANTLRInputStream implements
                do {
                    if ( p+readChunkSize > data.length ) { // overflow?
                        // System.out.println("### overflow p="+p+", data.length="+data.length);
-                       data = Arrays.copyOf(data, data.length * 2);
+                       int newLength = data.length * 2;
+                       if (newLength < 0)
+                           newLength = Integer.MAX_VALUE;
+                       data = Arrays.copyOf(data, newLength);
                    }
                    numRead = r.read(data, p, readChunkSize);
                    // System.out.println("read "+numRead+" chars; p was "+p+" is now "+(p+numRead));
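
One way to address the open question about throwing an exception is to compute the new length in long arithmetic and fail explicitly once a single array can no longer hold the input. This is only a sketch: SafeGrowth, grownLength and MAX_ARRAY_LENGTH are hypothetical names, and the cap of Integer.MAX_VALUE - 8 is an assumption (some JVMs refuse to allocate arrays of exactly Integer.MAX_VALUE elements).

```java
// Hypothetical helper, not part of ANTLR: overflow-aware array growth.
final class SafeGrowth {

    // Assumption: a conservative upper bound for array lengths.
    static final int MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;

    /**
     * Returns a new length that is at least required, preferring to double
     * currentLength. Throws if required cannot fit in one array.
     */
    static int grownLength(int currentLength, long required) {
        if (required > MAX_ARRAY_LENGTH) {
            throw new IllegalStateException(
                "input too large for a single array: need " + required + " elements");
        }
        // long arithmetic, so 2 * currentLength cannot wrap around
        long doubled = Math.max(2L * currentLength, required);
        return (int) Math.min(doubled, MAX_ARRAY_LENGTH);
    }
}
```

In the loop from the patch, the call would look roughly like data = Arrays.copyOf(data, SafeGrowth.grownLength(data.length, (long) p + readChunkSize)). Note that a char array near this limit needs about 4 GiB of heap on its own.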

NathanJAdams commented 6 years ago

The class uses a char array, so a simple fix would be to use a list of char arrays. Then instead of doubling the size each time, just add the new array to the list.
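
A rough sketch of that idea, assuming fixed-size chunks: ChunkedCharBuffer and its methods are hypothetical names, not existing ANTLR classes. The ANTLR stream interfaces index with int as far as I know, so this alone would avoid the overflow and the copying but would not lift the Integer.MAX_VALUE limit.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: stores input in a list of fixed-size char arrays,
// so growing means appending a chunk instead of doubling and copying.
final class ChunkedCharBuffer {
    private final int chunkSize;
    private final List<char[]> chunks = new ArrayList<>();
    private long length; // total chars stored; long, so it cannot wrap at 2^31

    ChunkedCharBuffer(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    /** Reads everything from r, appending a new chunk whenever the last one is full. */
    void load(Reader r) throws IOException {
        while (true) {
            int used = (int) (length % chunkSize);
            if (used == 0) {
                chunks.add(new char[chunkSize]); // append only; existing data is never copied
            }
            char[] last = chunks.get(chunks.size() - 1);
            int numRead = r.read(last, used, chunkSize - used);
            if (numRead < 0) {
                break; // end of input
            }
            length += numRead;
        }
    }

    char charAt(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return chunks.get((int) (index / chunkSize))[(int) (index % chunkSize)];
    }

    long length() {
        return length;
    }
}
```

The trade-off is that every character lookup costs a division and an extra indirection compared with a single flat array.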