FasterXML / jackson-dataformats-binary

Uber-project for standard Jackson binary format backends: avro, cbor, ion, protobuf, smile
Apache License 2.0
316 stars 136 forks source link

`ArrayIndexOutOfBoundsException` in `CBORParser` for invalid UTF-8 String #236

Closed fmeum closed 3 years ago

fmeum commented 3 years ago

The following Java snippet crashes with an ArrayIndexOutOfBoundsException in CBORParser._finishShortText:

import java.io.IOException;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.cbor.CBORFactory;

public class JacksonCborCrash {
    public static void main(String[] args) {
        byte[] input = {0x66, (byte) 0xef, 0x7d, 0x7d, 0xa, 0x2d, (byte) 0xda};
        CBORFactory factory = new CBORFactory();
        ObjectMapper mapper = new ObjectMapper(factory);
        try {
            mapper.readTree(input);
        } catch (IOException e) {}
    }
}

The stack trace with version 2.12.1 is:

java.lang.ArrayIndexOutOfBoundsException: Index 7 out of bounds for length 7                                                                                                                                                          
        at com.fasterxml.jackson.dataformat.cbor.CBORParser._finishShortText(CBORParser.java:2203)                                                                                                                                    
        at com.fasterxml.jackson.dataformat.cbor.CBORParser._finishTextToken(CBORParser.java:2170)                                                                                                                                    
        at com.fasterxml.jackson.dataformat.cbor.CBORParser.getText(CBORParser.java:1530)                                                                                                                                              
        at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeAny(JsonNodeDeserializer.java:545)                                                                                                                
        at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:74)                                                                                                                    
        at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:16)                                                                                                                    
        at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:322)                                                                                                    
        at com.fasterxml.jackson.databind.ObjectMapper._readTreeAndClose(ObjectMapper.java:4635)                                                                                                                                      
        at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:3056)

This issue appears to be caused by missing bounds checks in the cases of this switch statement.

cowtowncoder commented 3 years ago

@fmeum Thank you for reporting this, I'll need to have a look.

cowtowncoder commented 3 years ago

Ah. So this is broken encoding; the last byte is broken initial byte of 2-byte UTF-8 character, which is why illegal access is made. Should be caught and reported of course (unexpected end of content), will need to see what's the easiest way.

cowtowncoder commented 3 years ago

Fixed for 2.12.2; 2 problems:

  1. Not verifying buffer boundary, matters for the specific case if the last byte implies existence one or more bytes of multi-byte UTF-8 character
  2. But also should verify UTF-8 encoding goodness: example case is broken (maybe it's Latin-1?); should fail earlier

Slightly worried about (2) in a patch release (and hence no backport for 2.11) since while validation really should be done, no doubt some content exist where "it used to 'work'" (i.e. butchered occasionally mis-encoded character but no one noticed, or some validation removed that garbage later on), but we'll see. Will also file a follow-up issue for 2.13 since as of now validation of UTF-8 characters is inconsistent across code paths -- it shouldn't be, but it is. For 2.13 we can make things more strict more generally.