FasterXML / jackson-dataformats-binary

Uber-project for standard Jackson binary format backends: avro, cbor, ion, protobuf, smile
Apache License 2.0

Add `SmileGenerator.Feature.LENIENT_UTF_ENCODING` for lenient handling of broken Unicode surrogate pairs on writing #276

Closed (closed by kireet 3 years ago)

kireet commented 3 years ago

When encoding some invalid user-generated data with Smile, we encounter `JsonGenerationException`s with the message "Unmatched first part of surrogate pair". This doesn't occur when using text serialization.

There is a CBOR option that avoids this exception, `CBORGenerator.Feature.LENIENT_UTF_ENCODING`, but no equivalent for Smile. It would be great to add this feature.

To reproduce, serialize a string that contains an unpaired surrogate, such as `"\uD83D"` (a high surrogate with no low surrogate following it).
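For anyone hitting this before a Smile option exists, here is a rough JDK-only sketch of why `"\uD83D"` is rejected and how a caller might sanitize strings before serializing them. It does not use Jackson and is not how the CBOR feature is implemented; the class and method names (`LoneSurrogateDemo`, `canEncodeStrictly`, `replaceLoneSurrogates`) are made up for illustration, and substituting U+FFFD is an assumption modeled on the documented behavior of the CBOR option.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Jackson.
public class LoneSurrogateDemo {

    // Strict check: a UTF-8 encoder configured to REPORT errors rejects
    // unpaired surrogates, much like the exception described in this issue.
    static boolean canEncodeStrictly(String s) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            encoder.encode(CharBuffer.wrap(s));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Lenient workaround: keep valid surrogate pairs, replace every
    // unpaired surrogate with U+FFFD (REPLACEMENT CHARACTER).
    static String replaceLoneSurrogates(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean validPair = Character.isHighSurrogate(c)
                    && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1));
            if (validPair) {
                sb.append(c).append(s.charAt(i + 1));
                i++; // skip the low surrogate already copied
            } else if (Character.isSurrogate(c)) {
                sb.append('\uFFFD');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String broken = "\uD83D"; // the string from this report
        System.out.println("strict encoding possible: " + canEncodeStrictly(broken));
        String cleaned = replaceLoneSurrogates(broken);
        System.out.println("after cleanup, strict encoding possible: " + canEncodeStrictly(cleaned));
    }
}
```

Running something like `replaceLoneSurrogates` over user-supplied strings before handing them to the Smile mapper should sidestep the exception, at the cost of an extra pass over each string; a generator-level feature would avoid that copy.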

cowtowncoder commented 3 years ago

Sounds like a good idea, hoping to implement when I have time!

Note: this was done for CBOR via #222.