FasterXML / jackson-dataformats-binary

Uber-project for standard Jackson binary format backends: avro, cbor, ion, protobuf, smile
Apache License 2.0
310 stars 133 forks source link

Some short UTF Strings encoded using non-canonical form #159

Closed Sajjon closed 5 years ago

Sajjon commented 5 years ago

Hey!

I recently discovered that Jackson produces UTF strings on the non-shortest-possible format (non-canonical form ) sometimes.

I first reported this issue CBOR Github but realized that most other CBOR project is using canonical form where Jackson is not.

It would be great if you could support canonical encoding of strings :)

cabo commented 5 years ago

Note that this bug report is not about the missing feature of generating "canonical" encoding (which would be a duplicate of #138), but about a set of off-by-one errors in the basic encoding logic which seem to lead to valid, but unnecessarily long encoding for ASCII strings of length 23 and 255. Please see the referenced issue for details. I'm not claiming to have read the entirety of CBORGenerator.java, but it seems that Line 1251 is not the only place that has this off-by-one error (maybe search for lines with ...MAX... that have < instead of <=).

cowtowncoder commented 5 years ago

@Sajjon thank you for reporting this. @cabo thank you for clarification: that does make more sense. I'll update the title.

cowtowncoder commented 5 years ago

Added this on

https://github.com/FasterXML/jackson-future-ideas/wiki/Jackson-Work-in-Progress

and hopefully include in 2.10 (depending on fix, could consider 2.9 backport but bit worried that change not safe for patch as 2.9 is quite far along patch series -- while correction, change to encoded data length can probably trigger bogus test fails in downstream deps).

Sajjon commented 5 years ago

@cowtowncoder @cabo Great job guys! Thanks a lot for your prompt reply. Looking forward to the fix!

cowtowncoder commented 5 years ago

Hoping to get 2.9.9 out Real Soon Now -- just received a potential CVE, which I'll need to address, but after that.

Sajjon commented 5 years ago

Great job! 💪

cowtowncoder commented 5 years ago

@Sajjon thank you once again for reporting this -- we take correctness and interoperability seriously, so it's good to weed out these deviations at least eventually over time :)