Kotlin / kotlinx.serialization

Kotlin multiplatform / multi-format serialization
Apache License 2.0
5.36k stars 619 forks source link

Support I/O stream for protobuf format #2075

Open GeorgePap-719 opened 1 year ago

GeorgePap-719 commented 1 year ago

Summary

Currently, there is no API which reads or writes directly from stream in protobuf format. Such API makes sense for frameworks, which helps them to avoid decoding streams to an array first or encoding result to an array and then passing it to the stream.

Frameworks

Spring actually needs such an API to complete integration with kotlinx.serialization, see: Spring-KotlinProtobufEncoder.

Besides Spring, it is to my understanding that also other frameworks can benefit from this addition, for example ktor.

Proposal

Message

A prototype shape of API can be similar to ones json format has:

ProtoBuf.encodeToStream

ProtoBuf.decodeFromStream

Implementation

A draft implementation has already been implemented and reviewed. See #2082.

Delimited messages

Since we are taking about streams and protobuf messages, we cannot ignore delimited messages. It is a core technique where we can write multiple messages in the same stream.

It is important to note that, delimited messages are the main reason and goal of this proposal. This is because, frameworks will most likely use/need this technique to write multiple messages in the same steam.

Implementation

SerializedSize:

To support delimited messages there is one thing to address. How to compute the required size for encoding a message without reading all bytes to an array first.

To address this problem we need an API which will compute the required bytes to encode a message. A proof of concept implementation is in #2239.

An early shape of the API is:

ProtoBuf.getOrComputeSerializedSize(serializer: SerializationStrategy<T>, value: T): Int

Encode delimited messages:

A very early implementation is in #2082, but is not using the solution from above yet.

A draft shape of the APIs are:

ProtoBuf.encodeDelimitedMessagesToStream

ProtoBuf.decodeDelimitedMessagesFromStream

bishiboosh commented 11 months ago

I would be quite interested by this to be able to use it Android Datastore. Can we take inspiration from what's been done in Wire maybe ?

GeorgePap-719 commented 11 months ago

Ofc, feel free to take anything you like.

ShreckYe commented 7 months ago

A vote for this.

One question: why don't you open a pull request or draft pull request for this? it can make reviewing changes easier.

And an immature suggestion: can we get it all multiplatform with Coroutines channels or kotlinx-io? With channels we get suspending/non-blocking streaming support too. kotlinx-io provides the Sink and Source APIs though I am not sure how stable it is.

GeorgePap-719 commented 7 months ago

Hey @ShreckYe, what is your use-case for this feature? Can you explain it here, please.

can we get it all multiplatform with Coroutines channels or kotlinx-io? With channels we get suspending/non-blocking streaming support too. kotlinx-io provides the Sink and Source APIs though I am not sure how stable it is.

Usually these kind of low-level APIs are blocking for performance reasons. However, I can provide APIs which return a Sequence, similar to what Json format provides.

ShreckYe commented 7 months ago

Hi @GeorgePap-719, thanks for asking. I mainly care about multiplatform support now. Suspending support can be an extra bonus but I don't need it urgently.

Usually these kind of low-level APIs are blocking for performance reasons.

In my opinion this is half valid. If you are only encoding and decoding small messages this can be true, which does happen in most use cases, because the bytes are probably written and read into buffers and one encoding or decoding operation might encode or decode the message as a whole to or from the buffer, not reaching the end involving any IO waiting for the buffer to be reset. However for large messages, these operations will involve IO and waiting. And under asynchronous/non-blocking/reactive architecture, such as Vert.x in my case, reactive/suspending IO does improve throughput.

It's true that a major obstacle here is that if these decoding/encoding functions are made suspend, the supporting atomic operations and thus the whole Encoder/Decoder hierarchy will also be made suspend, which requires a lot of code to be refactored and I suppose you are already aware of. Kotlin Coroutines is based on CPS transformation, so there is also some overhead. So does improvement brought by reactiveness/suspending justify the sacrifice in overhead in terms of throughput? I don't know. Maybe only implementing and benchmarking it can tell. However, it's worth noting that Netty has implemented ProtobufEncoder and ProtobufDecoder.

I think another advantage of this is that if we support this we also start supporting network streaming directly using a serialization format. To think further (immaturely), if we support Flows/Sequences in kotlinx.serialization, giving them the same status as Lists, we can just send and receive Flows and get a kind of a streaming protocol out of the box with serialization.

ShreckYe commented 7 months ago

On second thought about this, I would say that my main concern is multiplatform support and Coroutines support is not that important at all. Serializing large messages is a rare use case anyway. And it's still arguable that even for a large message in a suspending context, is it faster to serialize it to a ByteArray first and then send it, or to wrap the sending operation in a SendChannel and pass it to the "suspend serialize function" which depends on smaller suspend ones? And since throughput is mainly a server-side concern, with JVM Loom / virtual threads gradually becoming stable and Coroutines integrating it, we can just run the blocking APIs on virtual threads and easily get a rough estimate of performance.

So to sum up, I think it would just be nice to use the Source and Sink APIs from kotlinx-io.

GeorgePap-719 commented 6 months ago

This probably could also be used by KTOR to support grpc, see: KTOR-1501 for more details.

My initial POC targeted only the jvm. To make it multiplatform, we would have to wait for kotlinx-io. On the other hand, Json-format supports streaming either through okio or targeting directly the jvm.