FasterXML / jackson-core

Core part of Jackson that defines Streaming API as well as basic shared abstractions
Apache License 2.0
2.27k stars 799 forks source link

Fix infinite loop due to integer overflow when reading large strings #1350

Closed adamjshook closed 1 month ago

adamjshook commented 1 month ago

If the StreamReadConstraints are set to Integer.MAX_VALUE, reading a large string results in an infinite loop due to the value of max overflowing to a negative value. while (ptr < max) is always false so break ascii_loop is never reached.

This change uses Math.addExact to throw an ArthimeticException if an overflow were to occur and sets max to max integer, allowing the loop to break and large strings to be properly deserialized.

This can be reproduced like so, where the call to nextTextValue will trigger the infinite loop. After applying the patch, the code completes. I can add the test case somewhere (or put it in a new test class) however it requires a large JVM to run and I'm not sure the size of the JVMs used for testing.

    @Test
    void testLargeStringDeserialization() throws Exception {
        String largeString = RandomString.make(Integer.MAX_VALUE - 1024);
        JsonFactory f = JsonFactory.builder()
                .streamReadConstraints(StreamReadConstraints.builder()
                        .maxStringLength(Integer.MAX_VALUE)
                        .build())
                .build();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        JsonGenerator g = f.createGenerator(bytes);
        g.writeString(largeString);
        g.close();

        JsonParser parser = f.createParser(bytes.toByteArray());
        String parsedString = parser.nextTextValue();

        assertEquals(largeString, parsedString);
    }
pjfanning commented 1 month ago

Fair enough. Can't you add the test in the PR?

adamjshook commented 1 month ago

Fair enough. Can't you add the test in the PR?

Added the test case but I see the jobs are failing with Java heap space errors. The JVMs would need to be scaled up for the test and I'm not sure the appetite on that one for this change. I needed to set the JVM args to 16G to get it to pass.

pjfanning commented 1 month ago

@cowtowncoder is there some kind of annotation or pattern that we could use to mark a test as too big to run normally?

pjfanning commented 1 month ago

@adamjshook The same issue appears to occur in 2 places in NonBlockingUtf8JsonParserBase - https://github.com/search?q=repo%3AFasterXML%2Fjackson-core%20%22ptr%20%2B%20(outBuf.length%20-%20outPtr)%22&type=code

And one more: https://github.com/FasterXML/jackson-core/blob/2128a7015a21d98ac5a2e5f55d35d0692c8f06e2/src/main/java/com/fasterxml/jackson/core/json/UTF8StreamJsonParser.java#L2779

adamjshook commented 1 month ago

@adamjshook The same issue appears to occur in 2 places in NonBlockingUtf8JsonParserBase - https://github.com/search?q=repo%3AFasterXML%2Fjackson-core%20%22ptr%20%2B%20(outBuf.length%20-%20outPtr)%22&type=code

And one more:

https://github.com/FasterXML/jackson-core/blob/2128a7015a21d98ac5a2e5f55d35d0692c8f06e2/src/main/java/com/fasterxml/jackson/core/json/UTF8StreamJsonParser.java#L2779

Sure, can update in these other places as well.

cowtowncoder commented 1 month ago

@cowtowncoder is there some kind of annotation or pattern that we could use to mark a test as too big to run normally?

I don't think there is a specific annotation.

But I think there is already one test that is skipped by default, for running long; but can then be run from IDE manually. I'll see if I can find it.

EDIT: it's in src/test/java/com/fasterxml/jackson/core/io/NumberOutputTest.java but basic @Disabled, so locally need to comment out

cowtowncoder commented 1 month ago

Ok the basic idea is sound; even if we have no way to automate test runs. I'm ok with that.

Couple of notes...

First, I think this would be good to get in 2.18(.1); while unlikely to occur, infinite loops are nasty. So I think PR should be against 2.18.

Second: I'd prefer not using exception or Math.addExact() -- can simply check that result is positive number (in this case it cannot roll over to 0). But I think it could/should be extracted as a helper method in a base class (I can help find the space).

adamjshook commented 1 month ago

Ok the basic idea is sound; even if we have no way to automate test runs. I'm ok with that.

I can add @Disabled to the existing test so at least it is available should someone choose to run it.

First, I think this would be good to get in 2.18(.1); while unlikely to occur, infinite loops are nasty. So I think PR should be against 2.18.

Agreed, I can rebase on 2.18 and change the base of this PR to 2.18 as well.

Second: I'd prefer not using exception or Math.addExact() -- can simply check that result is positive number (in this case it cannot roll over to 0). But I think it could/should be extracted as a helper method in a base class (I can help find the space).

Sure, can check after the fact if it is negative and set it to Integer.MAX_VALUE if so. Let me know what base class you had in mind and I can add a function there and update the locations pointer out here to use it. I'll get it updated tomorrow morning and push.

As an aside, I'll be unavailable Oct 24th to Oct 29th. I can continue addressing any additional comments when I return on the 30th, but feel free to finish it up if there is some urgency to do so.

cowtowncoder commented 1 month ago

Thank you @adamjshook ! I'll see if I can play with this PR wrt base class/method to use (or if shared method even makes sense).

ShemTovYosef commented 1 month ago

Hi @adamjshook, we have started to see similar behavior since the upgrade to version 2.17.2. Is there a way to work around this issue programmatically until the fix is merged?

pjfanning commented 1 month ago

Hi @adamjshook, we have started to see similar behavior since the upgrade to version 2.17.2. Is there a way to work around this issue programmatically until the fix is merged?

@ShemTovYosef you should provide more details if you need help. If this problem only happened since you upgraded, can't you revert to the old version that you used? Can you tell us that version number so others know?

It still isn't clear if this is an old issue or one that was introduced recently.

pjfanning commented 1 month ago

I created #1352 targeting 2.17 branch because this issue seems to have bad consequences.

pjfanning commented 1 month ago

I tried the test case with Jackson 2.14 and the issue appears there. So this does not appear to a new issue and in particular, does not appear to relate to the recent refactors of jackson-core to support StreamReadConstraints.

pjfanning commented 1 month ago

@adamjshook could you test the change in #1352? - it appears to fix the issue that you reported (in my testing) but it would be good to get it verified by more people

cowtowncoder commented 1 month ago

Yes, this is probably a long-time bug; but due to its nature rare (unlikely to occur). But pretty nasty one if it ever happens. After fix, such case would probably fail due to other problems, but at least not get into busy loop & hog CPU.

pjfanning commented 1 month ago

As @cowtowncoder has pointed out elsewhere, anyone hitting this is not far off hitting an even bigger issue and that is the fact Jackson uses byte arrays internally. This issue appears to only happen if you have strings in the JSON that are approaching this limit (about 2Gb). Even if we fix this, users will soon be risking going over the 2Gb limit and hitting a new array size exception.

I really can't understand how anyone would want such bigs strings in JSON docs. There have got to be better data structures for this.

cowtowncoder commented 1 month ago

I agree w/ @pjfanning that use of 2+ gig String values seems ill-advised.

But just for sake of completeness, on size limits imposed: Jackson does not actually limit things to 2 gigs except where basic JDK types are the limiting factor due to int length markers (namely, String and Java arrays are both limit to 2^31 chars/elements).

So:

  1. Input length not limited when reading using InputStream or Reader (or with Async provider)
  2. Output length is similarly unlimited w/ OutputStream or Writer
  3. Even long JSON String values can be read as content is accumulated using TextBuffer which uses segments -- 2 GB is not the limit -- however, accessing this is trickier as it cannot be accessed as String (... since String does have 2 gig char max).

Wrt (3) one can use:

public int getText(Writer writer) 

which will provide the whole JSON String value, if caller can provide Writer that can build something that can contain longer content than 2 billion chars.

So technically it is possible to handle humongous content. :) Not convenient but possible.

EDIT: Actually... I may need to take that back. Due to MAX_STRING_LENGTH, we now do limit the longest JSON String value to 2 gigs, I think. Hmmh. That was not an intentional added limitation but an oversight.

EDIT/2: Or, does it? In fact, setting maxStringLength to Integer.MAX_VALUE does prevent exception since no length can exceed it.

cowtowncoder commented 1 month ago

Closing this in favor of #1352.

cowtowncoder commented 1 month ago

Thank you for reporting this, suggesting the fix @adamjshook ! I merged #1352 so this will be fixed in:

adamjshook commented 1 month ago

Thank you for reporting this, suggesting the fix @adamjshook !

Thank you for finishing it up!