Message serializers - Githubissues

jroper commented 6 years ago

Migrated from https://github.com/eclipse/microprofile-reactive-streams/issues/42

Comment by @jroper: We need a serialization abstraction for messaging.

Comment by @olegz: I am assuming serialization here implies conversion to/from a universal "wire" format such as byte[]. Correct? I want to make sure that there is a distinction between this and general type conversion (i.e., payload conversion to a type requested by a handler).

Comment by @jroper: Serialization is the conversion of a payload to/from the type requested by the handler.

I don't think we need a universal wire format to do this (e.g., one messaging provider may provide payloads as bytes, another as strings, and another as a JSON-like tree structure). We may want to make the serializer abstraction flexible enough to support whatever the underlying messaging provider can or does offer. Obviously bytes will be common, but the problem with bytes is that some messaging providers support strings, and a string can't be represented as bytes alone; it can only be represented as bytes plus a charset. So it would be better to offer direct deserialization from and serialization to strings, and allow the messaging provider to handle encoding to bytes however it wants.
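To illustrate the point about bytes and charsets, here is a minimal sketch. `PayloadConverter` and its method names are hypothetical and are not part of any MicroProfile API; the sketch only assumes that the converter is parameterized by whatever wire type the provider natively offers.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical converter, parameterized by the provider's native wire type W. */
interface PayloadConverter<W, T> {
    W toWire(T payload);
    T fromWire(W wire);
}

class WireTypeDemo {
    // For a provider that natively carries strings, the converter never picks a charset.
    static final PayloadConverter<String, Integer> INT_AS_STRING =
        new PayloadConverter<String, Integer>() {
            @Override public String toWire(Integer payload) { return payload.toString(); }
            @Override public Integer fromWire(String wire) { return Integer.valueOf(wire); }
        };

    public static void main(String[] args) {
        // The same string yields different bytes depending on the charset.
        String text = "caf\u00e9";
        System.out.println(text.getBytes(StandardCharsets.UTF_8).length);      // prints 5
        System.out.println(text.getBytes(StandardCharsets.ISO_8859_1).length); // prints 4

        System.out.println(INT_AS_STRING.fromWire("42")); // prints 42
    }
}
```

With this shape, a byte-oriented provider would use `PayloadConverter<byte[], T>` and a string-oriented one `PayloadConverter<String, T>`, leaving the encoding decision to the provider.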

Comment by @olegz: James, IMHO there is a clear distinction between type serialization and type conversion. While they may look alike, semantically they are radically different. (De)serialization implies reading from or writing to some type of storage or transport format, which can only be byte[]. Even the systems you are referring to that deal with Strings simply have some internal mechanism to deal with byte[], but they are also responsible for inferring the charset.

On the other hand, type conversion simply means transforming the payload of a message from whatever it currently is to whatever type is required by a handler operation (i.e., from Foo to Bar). Sure, the same converters can deal with converting to/from the wire format, but that is an implementation detail. All I want to communicate is that I personally draw a clear distinction between SerDe and type conversion.

Comment by @jroper: Ok, so if we're talking about type conversion, what should we name the type converters? Are there examples of APIs that we can model the naming on?

Comment by @olegz: Yes, just as an example, in Spring we have the [MessageConverter](https://github.com/spring-projects/spring-framework/blob/master/spring-messaging/src/main/java/org/springframework/messaging/converter/MessageConverter.java) abstraction. Yes, we do use it for both cases (to/from byte[] as well as other types), but that is an implementation detail.

Semantically, when I hear serialization I hear storage and/or transport, and personally I would love to see the distinction made clearer.

public interface MessageSerializer {

    byte[] fromMessage(Message<?> message);

    <T> Message<T> toMessage(byte[] rawData);
}

You can also see a similar approach in Kafka, with the exception that they've separated Serializer and Deserializer.

Comment by @jroper: Thanks for the links. I like the shape of the MessageConverter API, I was thinking something with the same method signatures.

Do you have any thoughts on separating the two directions? My thought is that if you are only consuming a message of a certain type, and not producing it, then there's no need to write the conversion for both directions. (Of course, for the most part it'll be handled automatically, in the MicroProfile case by JSON-B by default. Even for other formats, e.g. protobuf, it can be defined generically, so users don't need to implement their own converters; they can just reference out-of-the-box converters or converters provided by third-party libraries.) If we were to separate them, then the name Converter probably won't work, since it doesn't imply a direction and doesn't have an opposite. In that case, perhaps marshal/unmarshal? This is consistent with JAXB terminology. bind/unbind could also be used, but that usually implies some existing object/tree structure like JSON that you're binding your objects to/from. That might sometimes be the case, but in most cases there's going to be a parse/format stage before binding/unbinding is done.
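A rough sketch of what separating the two directions could look like follows. The interface names `Marshaller` and `Unmarshaller` are hypothetical here (they echo the JAXB terminology mentioned above but are not the JAXB types), and byte[] is assumed as the wire type only for brevity.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical outbound direction: payload to wire bytes. */
@FunctionalInterface
interface Marshaller<T> {
    byte[] marshal(T payload);
}

/** Hypothetical inbound direction: wire bytes to payload. */
@FunctionalInterface
interface Unmarshaller<T> {
    T unmarshal(byte[] data);
}

class DirectionDemo {
    // A consume-only application supplies just the inbound direction.
    static final Unmarshaller<String> UTF8_TEXT =
        data -> new String(data, StandardCharsets.UTF_8);

    public static void main(String[] args) {
        byte[] incoming = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(UTF8_TEXT.unmarshal(incoming)); // prints hello
    }
}
```

Because each direction is a single-method interface, either one can be written as a lambda, and a consumer never has to provide an outbound implementation it does not use.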

On the difference between type conversion and serialization, in your example code above, it looks like message headers etc are included in the output bytes/parsed from the input bytes, is that right? Because if that's the case then Kafka's serializers are actually type converters that just happen to always work with byte[], because they don't expect you to parse/format the headers onto the wire.

Last thing, any opinion on byte[] vs ByteBuffer? The latter has optimizations over byte[] and offers a small amount of extra safety in that it can be read only.

Comment by @olegz: "doesn't need users to implement their own converters. . ." - that is pretty much the thinking. Also, with Java 8 default methods it would be easy to NOT force users to implement what doesn't have to be implemented.
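As a sketch of the default-method idea, a single interface can provide throwing defaults for both directions so that an implementer overrides only the one it needs. `BytesConverter` and `ConsumeOnlyTextConverter` are hypothetical names used for illustration, not a proposed API.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical two-way converter whose directions are both optional. */
interface BytesConverter<T> {
    default byte[] toBytes(T payload) {
        throw new UnsupportedOperationException("toBytes is not supported");
    }

    default T fromBytes(byte[] data) {
        throw new UnsupportedOperationException("fromBytes is not supported");
    }
}

/** Implements only the inbound direction; the outbound default is inherited. */
class ConsumeOnlyTextConverter implements BytesConverter<String> {
    @Override
    public String fromBytes(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }
}

class DefaultMethodDemo {
    public static void main(String[] args) {
        BytesConverter<String> converter = new ConsumeOnlyTextConverter();
        System.out.println(converter.fromBytes("hi".getBytes(StandardCharsets.UTF_8))); // prints hi
        try {
            converter.toBytes("hi");
        } catch (UnsupportedOperationException e) {
            System.out.println("outbound direction not implemented");
        }
    }
}
```

The trade-off compared with two separate interfaces is that a missing direction is only discovered at runtime rather than at compile time.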

With regard to separating serializer vs type-converters . . . let's just say that Serializers are specialized type-converters that always deal with to/from byte[] type, so there may be some API simplifications.

And yes Kafka would be a good analogy of type-converters that always deal with to/from byte[]. Yes Kafka now provides native support for headers. In previous versions of Kafka we've implemented our own way of embedding headers, which users/systems can choose to implement. But the idea is that headers should be simple key/values or primitive types, thus easily embeddable and extracted into a Message by the serializers.

With regard to byte[] vs ByteBuffer I am ok with ByteBuffer primarily for 'read only' reasons.
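The read-only property can be shown with the standard `java.nio.ByteBuffer` API. The helper `readOnlyCopy` below is a hypothetical name; the rest uses only JDK calls.

```java
import java.nio.ByteBuffer;
import java.nio.ReadOnlyBufferException;

class ReadOnlyBufferDemo {
    /** Hypothetical helper: copies the array so later changes to it cannot leak into the view. */
    static ByteBuffer readOnlyCopy(byte[] raw) {
        return ByteBuffer.wrap(raw.clone()).asReadOnlyBuffer();
    }

    public static void main(String[] args) {
        byte[] raw = {1, 2, 3};
        ByteBuffer view = readOnlyCopy(raw);

        System.out.println(view.isReadOnly()); // prints true
        try {
            view.put(0, (byte) 9); // any write through the read-only view is rejected
        } catch (ReadOnlyBufferException e) {
            System.out.println("write rejected");
        }

        raw[0] = 42;                     // mutating the original array...
        System.out.println(view.get(0)); // ...does not affect the copy: prints 1
    }
}
```

Note that `asReadOnlyBuffer()` on its own only prevents writes through the view; without the defensive copy, whoever still holds the backing array can change the content the view exposes.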

cescoffier commented 6 years ago

+1 for ByteBuffer.

otaviojava commented 5 years ago

+1 for ByteBuffer.

otaviojava commented 5 years ago

Given an entity such as Person and a message that is binary JSON, how do we deserialize the information without specifying a class such as Person.class?

Maybe, a method that passes a class such as:

public interface MessageSerializer {

    byte[] fromMessage(Message<?> message);

    <T> Message<T> toMessage(byte[] rawData);

    <T> Message<T> toMessage(byte[] rawData, Class<T> entityClass);
}

hutchig commented 5 years ago

@Azquelt - let's use this issue for type conversion; we have pushed standardizing it out to 1.1. I would love to integrate with MicroProfile Config type converters if possible.

aguibert commented 5 years ago

+1 to what @otaviojava suggested with the toMessage(byte[] rawData, Class<T> entityClass) method. In order to let this integrate with databinding frameworks (e.g. JSON-B or Jackson), the framework needs to know what Java type to convert the data into. I think we need something equivalent to MessageBodyReader and MessageBodyWriter from JAX-RS.
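A minimal sketch of why the target class matters: the reader below is loosely inspired by the shape of JAX-RS's MessageBodyReader, but `TypedReader`, `isReadable(Class)` and `read` are hypothetical, simplified signatures. Plain text parsing stands in for a real databinding framework so the example stays self-contained.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical reader that is told which Java type the caller wants. */
interface TypedReader {
    boolean isReadable(Class<?> type);

    <T> T read(byte[] data, Class<T> type);
}

/** Stand-in for a databinding framework: handles UTF-8 text as String or Integer. */
class TextReader implements TypedReader {
    @Override
    public boolean isReadable(Class<?> type) {
        return type == String.class || type == Integer.class;
    }

    @Override
    public <T> T read(byte[] data, Class<T> type) {
        String text = new String(data, StandardCharsets.UTF_8);
        if (type == String.class) {
            return type.cast(text);
        }
        if (type == Integer.class) {
            return type.cast(Integer.valueOf(text.trim()));
        }
        throw new IllegalArgumentException("Unsupported type: " + type.getName());
    }
}

class TypedReaderDemo {
    public static void main(String[] args) {
        TypedReader reader = new TextReader();
        byte[] data = "42".getBytes(StandardCharsets.UTF_8);

        // The same bytes produce different Java objects depending on the requested type.
        String asText = reader.read(data, String.class);
        Integer asNumber = reader.read(data, Integer.class);
        System.out.println(asText + " " + (asNumber + 1)); // prints 42 43
    }
}
```

The same bytes map to different objects depending on the requested class, which is the information a `toMessage(byte[] rawData)` overload without a class argument cannot supply.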

eclipse / microprofile-reactive-messaging

Message serializers #6