FasterXML / jackson-dataformats-binary

Uber-project for standard Jackson binary format backends: avro, cbor, ion, protobuf, smile
Apache License 2.0
306 stars 129 forks source link

CBOR: negative BigInteger values not handled correctly #431

Open bgruber opened 7 months ago

bgruber commented 7 months ago

Hi! I was doing some work with the CBOR part of the library, and I found that it didn't decode or encode negative BigIntegers as I understand the spec. For example:

var negOne = BigInteger.ONE.negate();
var cborMapper = new ObjectMapper(new CBORFactory());
var bytes = cborMapper.writeValueAsBytes(negOne);
System.out.println(org.apache.commons.codec.binary.Hex.encode(bytes));

This prints out c34101, but per this CBOR playground, that's CBOR for a bigint of -2, not -1.

Similarly on the read side:

    var cborMapper = new ObjectMapper(new CBORFactory());
    var bytes =
        new byte[] {
          (byte) 0xC3,
          (byte) 0x50,
          (byte) 0xFF,
          (byte) 0xFF,
          (byte) 0xFF,
          (byte) 0xFF,
          (byte) 0xFF,
          (byte) 0xFF,
          (byte) 0xFF,
          (byte) 0xFF,
          (byte) 0xFF,
          (byte) 0xFF,
          (byte) 0xFF,
          (byte) 0xFF,
          (byte) 0xFF,
          (byte) 0xFF,
          (byte) 0xFF,
          (byte) 0xFF
        };
    var read = cborMapper.readValue(bytes, BigInteger.class);
    System.out.println(read);

Prints 1, instead of -340282366920938463463374607431768211456 (-2^128). CBOR playground

I think the issue is that the library is constructing the BigIntegers from the byte string, and then negating it, instead of taking the complement of the byte string (or "-1 - n" as stated by the RFC) and constructing the BigInteger from that.

Thanks for all your work on jackson!

cowtowncoder commented 7 months ago

That sounds plausible to me. Thank you for reporting this @bgruber.

But aside from confirming the case and proposing a fix, the tricky part is compatibility. It is unfortunate that generation produces wrong value (essentially corrupt encoding); if this was only for decoding there'd be bit less problem. I suspect it will be necessary to add CBORGenerator.Feature and CBORParser.Features to control handling, and possibly even default these to existing incorrect behavior, given that format backend is used in production and so just changes can be pretty nasty.

cowtowncoder commented 1 month ago

I think I could use help here if anyone has time: I can help in configuration part, but would need a simple(st) reproduction for a small negative number (easiest to verify, handling not dependent on magnitude).