FasterXML / jackson-core

Core part of Jackson that defines Streaming API as well as basic shared abstractions

NegativeArraySizeException and ArrayIndexOutOfBoundsException in ByteQuadsCanonicalizer #1289

Open richard-timpson opened 4 months ago

richard-timpson commented 4 months ago

Versions:

Jackson Version: 2.12.7
Java Version: 11

Context:

We operate a large installation of the open source Spinnaker CD platform. Spinnaker uses a complicated json object to manage and pass along context during a pipeline execution. At our scale this execution json object can be extremely large for certain use cases. There is an endpoint exposed to the Spinnaker UI that returns all of the executions for a given application.

For this endpoint we frequently see a single payload over 2 GB, and sometimes up to 10 GB.

Problem:

Lately we have been seeing at least two different exceptions originate from the ByteQuadsCanonicalizer class. Here are stack traces for both.

NegativeArraySizeException

Error message: "Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is retrofit.RetrofitError: com.fasterxml.jackson.databind.JsonMappingException: -2147481843 (through reference chain: java.util.ArrayList[0])] with root cause" Stack Trace:

java.lang.NegativeArraySizeException: -2147483128
     java.base/java.util.Arrays.copyOf(Arrays.java:3793)
     com.fasterxml.jackson.core.sym.ByteQuadsCanonicalizer._appendLongName(ByteQuadsCanonicalizer.java:997)
     com.fasterxml.jackson.core.sym.ByteQuadsCanonicalizer.addName(ByteQuadsCanonicalizer.java:866)
     com.fasterxml.jackson.core.json.UTF8StreamJsonParser.addName(UTF8StreamJsonParser.java:2389)
     com.fasterxml.jackson.core.json.UTF8StreamJsonParser.findName(UTF8StreamJsonParser.java:2273)
     com.fasterxml.jackson.core.json.UTF8StreamJsonParser.parseLongName(UTF8StreamJsonParser.java:1862)
     com.fasterxml.jackson.core.json.UTF8StreamJsonParser.parseMediumName2(UTF8StreamJsonParser.java:1844)
     com.fasterxml.jackson.core.json.UTF8StreamJsonParser.parseMediumName(UTF8StreamJsonParser.java:1801)
     com.fasterxml.jackson.core.json.UTF8StreamJsonParser._parseName(UTF8StreamJsonParser.java:1736)
     com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextFieldName(UTF8StreamJsonParser.java:1043)
     com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.mapObject(UntypedObjectDeserializer.java:979)
     com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserialize(UntypedObjectDeserializer.java:718)
     com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.mapObject(UntypedObjectDeserializer.java:973)
     com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserialize(UntypedObjectDeserializer.java:718)
     com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.mapObject(UntypedObjectDeserializer.java:973)
     com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserialize(UntypedObjectDeserializer.java:718)
     com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.mapArray(UntypedObjectDeserializer.java:886)
     com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserialize(UntypedObjectDeserializer.java:737)
     com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.mapObject(UntypedObjectDeserializer.java:973)
     com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserialize(UntypedObjectDeserializer.java:718)
     com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserialize(UntypedObjectDeserializer.java:701)
     com.fasterxml.jackson.databind.deser.std.CollectionDeserializer._deserializeFromArray(CollectionDeserializer.java:355)
     com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:244)
     com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:28)
     com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:322)
     com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4593)
     com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3601)
     retrofit.converter.JacksonConverter.fromBody(JacksonConverter.java:36)
     retrofit.RestAdapter$RestHandler.invokeRequest(RestAdapter.java:367)
     retrofit.RestAdapter$RestHandler.invoke(RestAdapter.java:240)
     com.sun.proxy.$Proxy180.getPipelines(Unknown Source)
     com.netflix.spinnaker.gate.services.ExecutionHistoryService.getPipelines(ExecutionHistoryService.groovy:43)

ArrayIndexOutOfBoundsException

Error Message: Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is retrofit.RetrofitError: com.fasterxml.jackson.databind.JsonMappingException: arraycopy: last destination index 2153683734 out of bounds for int[2147026828] (through reference chain: java.util.ArrayList[158])] with root cause

Stack Trace:

java.lang.ArrayIndexOutOfBoundsException: arraycopy: last destination index 2153683734 out of bounds for int[2147026828]
   com.fasterxml.jackson.core.sym.ByteQuadsCanonicalizer._appendLongName(ByteQuadsCanonicalizer.java:999)
   com.fasterxml.jackson.core.sym.ByteQuadsCanonicalizer.addName(ByteQuadsCanonicalizer.java:866)
   com.fasterxml.jackson.core.json.UTF8StreamJsonParser.addName(UTF8StreamJsonParser.java:2389)
   com.fasterxml.jackson.core.json.UTF8StreamJsonParser.parseEscapedName(UTF8StreamJsonParser.java:2034)
   com.fasterxml.jackson.core.json.UTF8StreamJsonParser.parseLongName(UTF8StreamJsonParser.java:1906)
   com.fasterxml.jackson.core.json.UTF8StreamJsonParser.parseMediumName2(UTF8StreamJsonParser.java:1844)
   com.fasterxml.jackson.core.json.UTF8StreamJsonParser.parseMediumName(UTF8StreamJsonParser.java:1801)
   com.fasterxml.jackson.core.json.UTF8StreamJsonParser._parseName(UTF8StreamJsonParser.java:1736)
   com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextFieldName(UTF8StreamJsonParser.java:1043)
   com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.mapObject(UntypedObjectDeserializer.java:979)
   com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserialize(UntypedObjectDeserializer.java:718)
   com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.mapObject(UntypedObjectDeserializer.java:951)
   com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserialize(UntypedObjectDeserializer.java:718)
   com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.mapObject(UntypedObjectDeserializer.java:973)
   com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserialize(UntypedObjectDeserializer.java:718)
   com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.mapArray(UntypedObjectDeserializer.java:886)
   com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserialize(UntypedObjectDeserializer.java:737)
   com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.mapObject(UntypedObjectDeserializer.java:973)
   com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserialize(UntypedObjectDeserializer.java:718)
   com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserialize(UntypedObjectDeserializer.java:701)
   com.fasterxml.jackson.databind.deser.std.CollectionDeserializer._deserializeFromArray(CollectionDeserializer.java:355)
   com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:244)
   com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:28)
   com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:322)
   com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4593)
   com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3601)
   retrofit.converter.JacksonConverter.fromBody(JacksonConverter.java:36)
   retrofit.RestAdapter$RestHandler.invokeRequest(RestAdapter.java:367)
   retrofit.RestAdapter$RestHandler.invoke(RestAdapter.java:240)
   com.sun.proxy.$Proxy180.getPipelines(Unknown Source)
   com.netflix.spinnaker.gate.services.ExecutionHistoryService.getPipelines(ExecutionHistoryService.groovy:43)

Investigation

Both exceptions are caused by integer overflows that happen in the _appendLongName method. This is the version of _appendLongName in the 2.12 branch:

private int _appendLongName(int[] quads, int qlen)
{
    int start = _longNameOffset;

    // note: at this point we must already be shared. But may not have enough space
    if ((start + qlen) > _hashArea.length) {
        // try to increment in reasonable chunks; at least space that we need
        int toAdd = (start + qlen) - _hashArea.length;
        // but at least 1/8 of regular hash area size or 16kB (whichever smaller)
        int minAdd = Math.min(4096, _hashSize);

        int newSize = _hashArea.length + Math.max(toAdd, minAdd);
        _hashArea = Arrays.copyOf(_hashArea, newSize);
    }
    System.arraycopy(quads, 0, _hashArea, start, qlen);
    _longNameOffset += qlen;
    return start;
}

The NegativeArraySizeException is almost certainly caused by an integer overflow in the newSize calculation, int newSize = _hashArea.length + Math.max(toAdd, minAdd);. The ArrayIndexOutOfBoundsException is also caused by an integer overflow, this time in the if ((start + qlen) > _hashArea.length) check. Because start + qlen overflows to a negative value, the if block never executes and the _hashArea = Arrays.copyOf(_hashArea, newSize); copy never happens. Without that copy _hashArea is never expanded, and the length parameter (qlen) passed to System.arraycopy triggers the ArrayIndexOutOfBoundsException because _hashArea isn't big enough to receive the copy.
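
To make the arithmetic concrete, here is a small standalone sketch -- not Jackson code, and every size below is hypothetical, chosen only so the 32-bit additions wrap the way the reported exceptions suggest:

public class AppendLongNameOverflowSketch {
    public static void main(String[] args) {
        // Case 1: NegativeArraySizeException. _hashArea.length is already near
        // Integer.MAX_VALUE and the overshoot (toAdd) is small, so
        // newSize = length + minAdd wraps negative and Arrays.copyOf would throw.
        int length1 = 2_147_480_072;   // hypothetical _hashArea.length
        int start1  = 2_147_480_000;   // hypothetical _longNameOffset
        int qlen1   = 1_000;           // hypothetical name length in quads
        int toAdd   = (start1 + qlen1) - length1;        // 928
        int minAdd  = Math.min(4096, 65_536);            // assuming _hashSize >= 4096
        int newSize = length1 + Math.max(toAdd, minAdd);
        System.out.println("newSize = " + newSize);      // -2147483128, wrapped negative

        // Case 2: ArrayIndexOutOfBoundsException. Here (start + qlen) itself wraps
        // negative, so the "grow" branch is skipped and System.arraycopy would later
        // write past the end of the un-grown array.
        int length2 = 2_147_026_828;   // array size from the reported AIOOBE
        int start2  = 2_100_000_000;   // hypothetical _longNameOffset
        int qlen2   = 150_000_000;     // hypothetical name length in quads
        System.out.println("start + qlen = " + (start2 + qlen2));                // negative
        System.out.println("grow branch taken? " + ((start2 + qlen2) > length2)); // false
    }
}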

Reproduction

I was able to reproduce the ArrayIndexOutOfBoundsException with a local unit test. It's sort of a crazy reproduction but I think it's still valid. I'm not sure what exactly determines the qlen size (I know it happens somewhere in the UTF8StreamJsonParser class), but after some experimentation I realized that for very large json keys the qlen size is roughly 1/4 of the key size. So I wrote a little Go script that generates json files, each containing a single, extremely large key.

package main

import (
    "flag"
    "fmt"
    "math/rand/v2"
    "os"

    "github.com/google/uuid"
)

func main() {
    funcType := flag.String("type", "", "Function type to execute")
    keySize := flag.Int("keySize", 0, "Size of the json key(s)")
    numFiles := flag.Int("numFiles", 0, "Number of files to generate")
    filePath := flag.String("filePath", "", "The path to write files to")

    flag.Parse()

    if *funcType == "createManyKeys" {
        createManyKeys(*keySize)
    } else if *funcType == "createLargeKey" {
        createLargeKey(*keySize, *numFiles, *filePath)
    } else {
        fmt.Printf("Please supply a valid function")
    }
}

func createLargeKey(keySize int, numFiles int, filePath string) {
    chunkSize := 1024
    for i := 0; i < numFiles; i++ {
        file, err := os.Create(filePath + fmt.Sprintf("large_key%d.json", i))
        if err != nil {
            fmt.Println("Error creating file:", err)
            return
        }
        defer file.Close()

        // Start the JSON object
        file.WriteString("{\"")

        for i := 0; i < keySize/chunkSize; i++ {
            randomChar := 'A' + rand.IntN(26)
            chars := make([]rune, chunkSize)
            for i := range chars {
                chars[i] = rune(randomChar)
            }
            _, err := file.WriteString(string(chars))
            if err != nil {
                fmt.Println("Error writing chars to file:", err)
            }
        }

        // Close the JSON object
        file.WriteString("\": \"\"}")

        fmt.Printf("JSON file created successfully with one key of size %d \n", keySize)
    }
}

// another function I wrote to experiment with adding a bunch of unique keys (UUIDs) to a single json file. This type of json document did not reproduce any bugs
func createManyKeys(keySize int) {
    file, err := os.Create("large.json")
    if err != nil {
        fmt.Println("Error creating file:", err)
        return
    }
    defer file.Close()

    // Calculate the number of keys required to reach approximately 2GB of key data
    // totalSize := 2 * 1024 // 2GB in bytes
    totalSize := 2 * 1024 * 1024 * 1024 // 2GB in bytes
    keysNeeded := totalSize / keySize

    // Start the JSON object
    file.WriteString("{")

    for i := 0; i < keysNeeded; i++ {
        // Generate a UUID for the key
        uuidKey := uuid.New().String()
        key := fmt.Sprintf("\"%s\": \"\"", uuidKey)
        if i < keysNeeded-1 {
            key += ","
        }
        file.WriteString(key)
    }

    // Close the JSON object
    file.WriteString("}")

    fmt.Printf("JSON file created successfully with %d keys\n", keysNeeded)
}

If you execute the script as go run main.go -type createLargeKey -keySize 3000000000 -numFiles 5 -filePath /path/to/java/test/resources/ it will create 5 files, each with a single key of approximately 3 GB.

The following unit test will then produce an ArrayIndexOutOfBoundsException:

@Test
public void testBug() throws Exception {
  JsonFactory jsonFactory = new JsonFactory();
  // If the CANONICALIZE_FIELD_NAMES flag is disabled, the test will not reproduce
  // jsonFactory.disable(JsonFactory.Feature.CANONICALIZE_FIELD_NAMES);

  // New role providers break deserialization if this is not enabled.
  ObjectMapper objectMapper = new ObjectMapper(jsonFactory)
      .enable(DeserializationFeature.READ_UNKNOWN_ENUM_VALUES_AS_NULL)
      .disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES)
      .registerModule(new JavaTimeModule());

  ClassLoader classLoader = this.getClass().getClassLoader();
  InputStream inputStream = classLoader.getResourceAsStream("large_key0.json");
  Map<String, String> parsedObject = objectMapper.readValue(inputStream, Map.class);
  System.out.println("Parsed object with " + parsedObject.size() + " keys");

  inputStream = classLoader.getResourceAsStream("large_key1.json");
  parsedObject = objectMapper.readValue(inputStream, Map.class);
  System.out.println("Parsed object with " + parsedObject.size() + " keys");

  inputStream = classLoader.getResourceAsStream("large_key2.json");
  parsedObject = objectMapper.readValue(inputStream, Map.class);
  System.out.println("Parsed object with " + parsedObject.size() + " keys");

  inputStream = classLoader.getResourceAsStream("large_key3.json");
  parsedObject = objectMapper.readValue(inputStream, Map.class);
  System.out.println("Parsed object with " + parsedObject.size() + " keys");

  inputStream = classLoader.getResourceAsStream("large_key4.json");
  parsedObject = objectMapper.readValue(inputStream, Map.class);
  System.out.println("Parsed object with " + parsedObject.size() + " keys");
}

The specific error in this case is java.lang.ArrayIndexOutOfBoundsException: arraycopy: last destination index 2250000128 out of bounds for int[1500000256].

Explanation?

There have been several other issues in jackson-core that have revealed similar bugs, some with explanations of what ByteQuadsCanonicalizer is meant to do. Some references:

As far as I can tell none of those issues or fixes target this specific error, which seems to be triggered only in the part of _hashArea that handles large json keys. Based on comments in other threads (specifically this one), the ByteQuadsCanonicalizer class doesn't really do much in cases where there are many unique json keys. And based on these errors, it seems there may also be design limitations for keys that are both very large and unique?

I haven't performed an analysis of what the json we return looks like (it's quite difficult with payloads this large), but I am guessing that somewhere in Spinnaker, executions are being created with very many large json keys. I'm hoping to get some guidance on whether this assumption is correct or not.

Mitigation

As mentioned in the reproduction code, using jsonFactory.disable(JsonFactory.Feature.CANONICALIZE_FIELD_NAMES); makes the problem go away by disabling the problematic class. We are planning to use this as our mitigation to avoid the bug, although based on other issues we are concerned about the possible performance hit that comes with it. Given the huge amount of json data processing we do, you can imagine that we are always looking to squeeze out json parsing optimizations where we can.
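
For reference, the mitigation is only a couple of lines (a fragment, mirroring the commented-out line in the test above; the rest of the mapper configuration stays whatever we already use):

JsonFactory jsonFactory = new JsonFactory();
// disable field-name canonicalization (the problematic code path)
jsonFactory.disable(JsonFactory.Feature.CANONICALIZE_FIELD_NAMES);
ObjectMapper objectMapper = new ObjectMapper(jsonFactory);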

So that is our plan for now.

I wanted to put up the issue for awareness because, although I will admit it is caused by a crazy json workload, this does seem like an actual bug.

I was also hoping to get a better understanding of the actual cause. I was able to reproduce with 3 GB json keys, but there's no way we have keys that large in our real workload. The one specific thing I am trying to understand is how the class handles rehashing and resizing. There is a bit of code in the class that should be resetting _longNameOffset when rehashing takes place:

private void nukeSymbols(boolean fill) {
    _count = 0;
    // reset spill-over to empty (starting at 7/8 of hash area)
    _spilloverEnd = _spilloverStart();
    // and long name area to empty, starting immediately after hash area
    _longNameOffset = _hashSize << 3;
    if (fill) {
        Arrays.fill(_hashArea, 0);
        Arrays.fill(_names, null);
    }
}

So I'm guessing that the overflows are being caused by some weird combination of:

  1. Not enough rehashing/resizing
  2. Very many large json keys that trigger the _appendLongName path
  3. Those large json keys always being unique

Some confirmation here would be helpful.

Next Steps

If I get time I'm hoping to run some more experiments to see if I can reproduce with smaller json keys. I have also only run these tests against the 2.12.7 version; I'd like to run them against the latest 2.18 version as well to verify it's still a problem. I have looked at the latest version of the code, however, and don't see any changes that suggest there is a fix.

pjfanning commented 4 months ago

Please try another version of Jackson. 2.17.1 was just released.

More recent versions of Jackson have this https://www.javadoc.io/doc/com.fasterxml.jackson.core/jackson-core/latest/com/fasterxml/jackson/core/StreamReadConstraints.html

And that should hopefully mean that Jackson will fail fast with very large strings.

cowtowncoder commented 4 months ago

@richard-timpson Thank you for reporting this! As @pjfanning suggested, we would need a reproduction for a later version as there won't be new releases of 2.12.x; so ideally at least 2.16.x. This may be challenging for reproduction due to StreamReadConstraints as mentioned, as with 2.16 maximum name length (in input units) is verified and the limit enforced -- the default value is way below a megabyte (50_000).

But if reproduction can be shown to work with relaxed (but not insanely so) limits -- say, max name length of 1 meg? -- it could still be considered a valid flaw.
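
Something along these lines would relax the limit enough for such a reproduction (a sketch against the 2.16+ StreamReadConstraints builder API; 1_000_000 is just the suggested relaxed figure):

// Sketch (2.16+): raise the default max name length (50_000) to ~1 MB so a
// reproduction can still push very large field names through the parser.
JsonFactory factory = JsonFactory.builder()
        .streamReadConstraints(StreamReadConstraints.builder()
                .maxNameLength(1_000_000)
                .build())
        .build();
ObjectMapper mapper = new ObjectMapper(factory);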

richard-timpson commented 4 months ago

This may be challenging for reproduction due to StreamReadConstraints as mentioned, as with 2.16 maximum name length (in input units) is verified and the limit enforced -- the default value is way below a megabyte (50_000).

But if reproduction can be shown to work with relaxed (but not insanely so) limits -- say, max name length of 1 meg? -- it could still be considered a valid flaw.

Thanks for the information here. I've got to get the code in to disable CANONICALIZE_FIELD_NAMES today. Hoping that I can do more investigation with other reproduction tactics this week.

richard-timpson commented 4 months ago

@cowtowncoder Am I right to expect that we'll see a significant performance hit by disabling? I'm guessing that the canonicalization is doing a lot of good for the smaller json keys (which I know we have in our json objects). Ofc there are memory issues for the larger keys and possibly memory overhead (a lot of array copying) that might slow parsing down for those keys.

So I guess the answer is probably dependent on what the structure of our json looks like. I'm hoping to also run a performance benchmark on our workload to compare the two, not sure if I'll have extra time to get to that. Any insight here is much appreciated :)

cowtowncoder commented 4 months ago

@richard-timpson Yes, you are correct in your assessments: it does depend on whether names repeat across parsing -- not just within a single document but across documents being parsed using parsers created by the same JsonFactory (at least when parsers are closed either explicitly, or by encountering end-of-content).

The performance benefit comes not only from avoiding String (and underlying char[]) allocations, but also from bypassing most of the UTF-8 decoding (byte[]->char[]).

In cases where there are tons of non-repeating names (UUID as key) things won't work that well; if so it may make sense to disable canonicalization.
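
To illustrate the reuse point (a sketch, not code from this thread): the benefit only materializes when the same mapper -- and therefore the same underlying JsonFactory and symbol table -- is reused across documents:

import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;
import java.util.List;
import java.util.Map;

class SymbolTableReuseSketch {
    // One mapper means one underlying JsonFactory; parsers created for each
    // readValue call share its symbol table, so field names seen in earlier
    // documents are already canonicalized when later documents are parsed.
    private static final ObjectMapper SHARED = new ObjectMapper();

    static void parseAll(List<File> documents) throws Exception {
        for (File doc : documents) {
            Map<?, ?> parsed = SHARED.readValue(doc, Map.class);
            System.out.println(doc.getName() + ": " + parsed.size() + " top-level keys");
        }
    }
}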

Obviously a performance benchmark would be useful, but it can take some effort to do that. Unfortunately there aren't many diagnostics exposed from ByteQuadsCanonicalizer (wrt number of entries) -- but there are some (in later versions at least). Printing those out after processing a set of sample documents could give some insight into whether canonicalization helps (mostly the number of distinct symbols canonicalized).

richard-timpson commented 3 months ago

It's been a few weeks and I wanted to provide an update.

We rolled out the change to disable CANONICALIZE_FIELD_NAMES in the part of our service that was breaking. Since then the bugs/stack traces have gone away (as expected). Both our memory and CPU usage metrics have also been perfectly stable since, so for our purposes there hasn't been a measurable adverse performance effect from the change. It is possible that the one API interaction that was breaking now performs worse, but we haven't run any benchmarks to isolate that interaction.

Additionally, we have another system that depends on Spinnaker execution context payloads. That service performed an upgrade from Jackson v2.15 -> v2.17 and saw the following error: JsonMappingException: Name length (247141) exceeds the maximum allowed (200000).

That was super helpful as it allowed us to identify which json keys were large. The other service increased the configurable max size and hasn't seen the issue since. For now we have no plan to reduce the json key size, but can focus our efforts there if it creeps up again.

But if reproduction can be shown to work with relaxed (but not insanely so) limits -- say, max name length of 1 meg? -- it could still be considered a valid flaw.

There is still an open question about whether the bug exists even with the size limits introduced in later versions. I doubt that we'll have the time to work on a reproduction. For our purposes we have a path forward, and we can revisit if the issue comes up again.

cowtowncoder commented 3 months ago

Thank you for sharing @richard-timpson -- this is useful information, in case the issue resurfaces.