FasterXML / jackson-databind

General data-binding package for Jackson (2.x): works on streaming API (core) implementation(s)
Apache License 2.0
3.52k stars 1.38k forks source link

`JsonGenerator` cannot write binary-valued property names #1553

Open nmatt opened 7 years ago

nmatt commented 7 years ago

When attempting to use writeBinary() for a property name, the following exception is thrown:

com.fasterxml.jackson.core.JsonGenerationException: Can not write a binary value, expecting field name (context: Object)
    at com.fasterxml.jackson.core.JsonGenerator._reportError(JsonGenerator.java:1897)
    at com.fasterxml.jackson.core.json.JsonGeneratorImpl._reportCantWriteValueExpectName(JsonGeneratorImpl.java:244)
    at com.fasterxml.jackson.core.json.JsonGeneratorImpl._verifyPrettyValueWrite(JsonGeneratorImpl.java:234)
    at com.fasterxml.jackson.core.json.UTF8JsonGenerator._verifyValueWrite(UTF8JsonGenerator.java:1004)
    at com.fasterxml.jackson.core.json.UTF8JsonGenerator.writeBinary(UTF8JsonGenerator.java:763)
    at com.fasterxml.jackson.core.JsonGenerator.writeBinary(JsonGenerator.java:1144)

There should probably exist a method writeBinaryFieldName(byte[]) (roughly similar to writeFieldId(long)), as byte arrays serialize to a base64 string, which is perfectly valid for property names in JSON objects.

This error occured when trying to use a custom key serializer to work around issue #1552.

As a workaround, writeFieldName(Base64Variants.getDefaultVariant().encode(value, false)) can be used. However this is only incidentally safe, as the default base64 variant does not contain newlines. If it would, then they would be doubly escaped, once by Base64Variant#encode(), and once by writeFieldName(). Hence this solution is not entirely clean.

cowtowncoder commented 7 years ago

Ok, this would really belong in jackson-core repo, not databind, but...

writeBinary() is meant only for values, not keys. So exception is correct wrt design, similar to how writeText() can not be used instead of writeFieldName() (whether separation makes sense or is optimal is different question; but that's how API has been designed).

But beyond that there is no intent to support writing of arbitrary and untrusted bytes as names: not all backends would support it (for example, when Writer is used) and this would by-pass escaping.

It is doable, however, but only by implementing SerializableString: there is writeFieldName(SerializableString) which will then append name as expected.

What I am curious about here is the use case -- what is the intent? Does byte array contain valid properly escape JSON name? Or is just opaque binary data which would require escaping?

nmatt commented 7 years ago

Ah, sorry about the wrong repo. WriteBinary() does not write arbitrary/untrusted bytes (that would be writeRawValue()), instead it encodes the bytes as base64, which results in a pretty safe name (e.g. "YW55IGNhcm5hbCBwbGVhc3U="). The only characters that can occur in the default base64 encoding are ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=.

As I mentioned in the initial comment, the use case is for a workaround to issue #1552, where I have values that are indexed by binary keys. In the world of ASN.1 (e.g. X.509 certificates and electronic signatures), keys are commonly binary values (sequences of arbitrary bytes), and it makes sense to have them as keys (names) in JSON objects after encoding them as a character string in some way (base64, in this case). As Jackson already encodes byte arrays as base64 by default when they occur as values, it would be useful, intuitive and safe to also do so for names.

cowtowncoder commented 7 years ago

@nmatt Ah! Thank you for clarification. I did misunderstand your intent.

So, essentially this would be required for, say, this kind of POJO:

class BinaryValues {
   public Map<byte[],byte[]> data;
}

which would use Base64 encoding for textual formats, as you say, but possibly native byte arrays for formats that support this (CBOR perhaps)?

I think this makes sense, although then the question would become that of how to expose that via API, and whether it should be something handled at streaming api layer (JsonParser, JsonGenerator) or at databind. Problem with streaming is similar to that of binary values in that many textual formats do not have a way to reliably indicate difference, so that databind layer has to instruct it either partially (like with binary value reads/writes), or completely implement it. Benefit of streaming api implementing part is that then binary formats can actually avoid encoding, which would not happen if it was all implemented on databind layer.

So I am open to supporting this in some form, but not yet clear on what would be a good way.

cowtowncoder commented 7 years ago

I can see why byte[] valued Maps would be valuable as native thing, for binary formats; and then "emulated" using Base64 for textual. But doing that will require bit more thought, work, so probably Jackson 3.0.