FasterXML / jackson-dataformats-binary

Uber-project for standard Jackson binary format backends: avro, cbor, ion, protobuf, smile
Apache License 2.0
310 stars 133 forks source link

Add `CBORGenerator.Feature.LENIENT_UTF_ENCODING` for lenient handling of Unicode surrogate pairs on writing #222

Closed guillaumebort closed 3 years ago

guillaumebort commented 4 years ago

If enabled, the generator will output the Unicode Replacement Character for invalid unicode sequence (invalid surrogate chars in the Java String) instead of failing with an IllegalArgumentException.

Also this PR remove the code duplication between _shortUTF8Encode2 and _encode2.

cowtowncoder commented 4 years ago

Ok, first of all, thank you for the contribution! I am bit concerned about the fact that some code is trying to output invalid Unicode content, essentially, but as long as that is documented and user has to explicitly enable said feature I think that is fine.

I added some small notes in PR itself, but 2 bigger questions:

  1. Since "master" is for 3.0, which might be far out, you may want to instead make PR against 2.12. I can handle merging it forward to master (wrt API changes)
  2. I'll need to have a look but my main concern is with performance: since this is an edge case, it should have no measurable effect on good case (i.e. content does not have invalid surrogate characters). I can probably test it myself but thought I'll mention it -- I think changes do affect main loops.
guillaumebort commented 4 years ago
  1. Since "master" is for 3.0, which might be far out, you may want to instead make PR against 2.12. I can handle merging it forward to master (wrt API changes)

Ok fair enough I will base the PR on 2.12

cowtowncoder commented 3 years ago

@guillaumebort Apologies for slow follow-up here: now back to getting this merge for 2.12. One thing I'd need, if I hadn't yet asked would be CLA. It's here:

https://github.com/FasterXML/jackson/blob/master/contributor-agreement.pdf

and usually easiest to print, fill & sign, scan/photo, email to info at fasterxml dot com. Only needs to be done once before the first contribution. Apologies if we already got one and I somehow missed it.

guillaumebort commented 3 years ago

One thing I'd need, if I hadn't yet asked would be CLA. It's here:

Thanks! I need to handle that with the legal team at my company and I do it ASAP.

cowtowncoder commented 3 years ago

Sounds good -- we have both individual CLA that I linked earlier (and used by most contributors), as well as Corporate CLA (CCLA), at

https://github.com/FasterXML/jackson/blob/master/contributor-agreement-corporate.txt

if that makes more sense.

guillaumebort commented 3 years ago

Should be ok now, my company (Datadog) sent a a signed copy of the Corporate CLA over.

cowtowncoder commented 3 years ago

@guillaumebort For some reason I don't see one yet? I assume it'd be sent to info@fasterxml.com?

cowtowncoder commented 3 years ago

Received the CLA.

cowtowncoder commented 3 years ago

Ended up merging this manually, hence closing PR, feature is in, tests, will be included in 2.12.0-rc2. Thank you again for contributing this! Might make sense to add similar support for Smile, and perhaps other backends too.