Open GeorgePap-719 opened 2 years ago
I would be quite interested in this, to be able to use it with Android DataStore. Can we take inspiration from what's been done in Wire, maybe?
Ofc, feel free to take anything you like.
A vote for this.
One question: why don't you open a pull request or draft pull request for this? It can make reviewing changes easier.
And an immature suggestion: can we get it all multiplatform with Coroutines channels or kotlinx-io? With channels we get suspending/non-blocking streaming support too. kotlinx-io provides the `Sink` and `Source` APIs, though I am not sure how stable it is.
Hey @ShreckYe, what is your use case for this feature? Can you explain it here, please?
> can we get it all multiplatform with Coroutines channels or kotlinx-io? With channels we get suspending/non-blocking streaming support too. kotlinx-io provides the `Sink` and `Source` APIs, though I am not sure how stable it is.

Usually these kinds of low-level APIs are blocking for performance reasons. However, I can provide APIs which return a `Sequence`, similar to what the `Json` format provides.
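For reference, this is roughly what the existing (experimental, JVM-only) `Json` streaming API looks like; a Protobuf `Sequence`-returning API could take a similar shape:

```kotlin
import java.io.ByteArrayInputStream
import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.Serializable
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.decodeToSequence

@Serializable
data class Event(val id: Int)

@OptIn(ExperimentalSerializationApi::class)
fun main() {
    // decodeToSequence reads lazily from the InputStream: elements are
    // decoded one by one as the Sequence is consumed, without buffering
    // the whole stream into an array first.
    val stream = ByteArrayInputStream("""{"id":1} {"id":2}""".toByteArray())
    val events: Sequence<Event> = Json.decodeToSequence(stream)
    events.forEach { println(it) }
}
```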
Hi @GeorgePap-719, thanks for asking. I mainly care about multiplatform support now. Suspending support would be an extra bonus, but I don't need it urgently.
> Usually these kinds of low-level APIs are blocking for performance reasons.

In my opinion this is only half valid. If you are only encoding and decoding small messages this can be true, which does cover most use cases: the bytes are probably written and read via buffers, and a single encoding or decoding operation may process the message as a whole to or from the buffer, without any IO waiting for the buffer to be flushed or refilled. However, for large messages these operations will involve IO and waiting. And under an asynchronous/non-blocking/reactive architecture, such as Vert.x in my case, reactive/suspending IO does improve throughput.
It's true that a major obstacle here is that if these decoding/encoding functions are made `suspend`, the supporting atomic operations and thus the whole `Encoder`/`Decoder` hierarchy will also have to be made `suspend`, which requires a lot of code to be refactored, as I suppose you are already aware. Kotlin coroutines are based on CPS transformation, so there is also some overhead. So does the improvement brought by suspending IO justify the sacrifice in throughput from that overhead? I don't know; maybe only implementing and benchmarking it can tell. However, it's worth noting that Netty has implemented ProtobufEncoder and ProtobufDecoder.
I think another advantage is that if we support this, we also start supporting network streaming directly using a serialization format. To think further (immaturely), if we support `Flow`s/`Sequence`s in kotlinx.serialization, giving them the same status as `List`s, we can just send and receive `Flow`s and get a kind of streaming protocol out of the box with serialization.
On second thought, I would say that my main concern is multiplatform support, and Coroutines support is not that important at all. Serializing large messages is a rare use case anyway. And it's still arguable whether, even for a large message in a suspending context, it is faster to serialize it to a `ByteArray` first and then send it, or to wrap the sending operation in a `SendChannel` and pass it to the "suspend serialize function" which depends on smaller suspend ones. And since throughput is mainly a server-side concern, with JVM Loom / virtual threads gradually becoming stable and Coroutines integrating with them, we can just run the blocking APIs on virtual threads and easily get a rough estimate of performance.
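To illustrate the Loom remark: with kotlinx-coroutines one can already shift the existing blocking APIs onto virtual threads without any `suspend` variants. A rough sketch, assuming JDK 21+ (the `Payload` class is just an example):

```kotlin
import java.util.concurrent.Executors
import kotlinx.coroutines.asCoroutineDispatcher
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext
import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.Serializable
import kotlinx.serialization.protobuf.ProtoBuf

@Serializable
data class Payload(val id: Int, val data: String)

@OptIn(ExperimentalSerializationApi::class)
fun main() {
    // Back a coroutine dispatcher with virtual threads (JDK 21+), so a
    // blocking serialization/IO call parks a cheap virtual thread only.
    val vtDispatcher = Executors.newVirtualThreadPerTaskExecutor().asCoroutineDispatcher()
    runBlocking {
        val bytes = withContext(vtDispatcher) {
            // The existing blocking API, unchanged.
            ProtoBuf.encodeToByteArray(Payload(1, "hello"))
        }
        println(bytes.size)
    }
    vtDispatcher.close()
}
```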
So to sum up, I think it would just be nice to use the `Source` and `Sink` APIs from kotlinx-io.
This could probably also be used by Ktor to support gRPC; see KTOR-1501 for more details.
My initial POC targeted only the JVM. To make it multiplatform, we would have to wait for kotlinx-io. On the other hand, the Json format supports streaming either through okio or by targeting the JVM directly.
Summary
Currently, there is no API which reads from or writes directly to a stream in the protobuf format. Such an API makes sense for frameworks, as it helps them avoid decoding a stream to an array first, or encoding the result to an array before passing it to the stream.

Frameworks

Spring actually needs such an API to complete its integration with kotlinx.serialization; see Spring-KotlinProtobufEncoder. Besides Spring, it is my understanding that other frameworks can benefit from this addition as well, for example Ktor.
Proposal
Message
A prototype shape of the API can be similar to the one the Json format has:
ProtoBuf.encodeToStream
ProtoBuf.decodeFromStream
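Assuming the signatures mirror the `Json` stream extensions on the JVM, usage might look like this sketch (these are the proposed shapes from this issue, not a shipped API; the `Message` class is just an example):

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.Serializable
import kotlinx.serialization.protobuf.ProtoBuf

@Serializable
data class Message(val id: Int, val payload: String)

@OptIn(ExperimentalSerializationApi::class)
fun main() {
    val out = ByteArrayOutputStream()
    // Proposed: encode straight to the stream, no intermediate ByteArray.
    ProtoBuf.encodeToStream(Message(1, "hello"), out)

    // Proposed: decode straight from the stream.
    val decoded = ProtoBuf.decodeFromStream<Message>(ByteArrayInputStream(out.toByteArray()))
    println(decoded)
}
```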
Implementation
A draft implementation has already been written and reviewed; see #2082.
Delimited messages
Since we are talking about streams and protobuf messages, we cannot ignore delimited messages. This is a core technique that lets us write multiple messages into the same stream.
It is important to note that delimited messages are the main motivation and goal of this proposal, because frameworks will most likely need this technique to write multiple messages into the same stream.
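For readers unfamiliar with the technique: a delimited message is just a base-128 varint length prefix followed by the raw message bytes, the same framing that protobuf-java's `writeDelimitedTo`/`parseDelimitedFrom` produce. A minimal, library-free sketch of the framing (the helper names are mine, not part of the proposal):

```kotlin
import java.io.InputStream
import java.io.OutputStream

// Protobuf's base-128 varint: 7 payload bits per byte, high bit set on
// every byte except the last. (Length prefixes are non-negative Ints.)
fun writeVarint(out: OutputStream, value: Int) {
    var v = value
    while (v and 0x7F.inv() != 0) {
        out.write((v and 0x7F) or 0x80)
        v = v ushr 7
    }
    out.write(v)
}

fun readVarint(input: InputStream): Int {
    var result = 0
    var shift = 0
    while (true) {
        val b = input.read()
        require(b != -1) { "unexpected end of stream" }
        result = result or ((b and 0x7F) shl shift)
        if (b and 0x80 == 0) return result
        shift += 7
    }
}

// A delimited message is simply varint(length) followed by the message
// bytes, which is what lets several messages share one stream.
fun writeDelimited(out: OutputStream, message: ByteArray) {
    writeVarint(out, message.size)
    out.write(message)
}

fun readDelimited(input: InputStream): ByteArray {
    val size = readVarint(input)
    return input.readNBytes(size)
}
```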
Implementation
SerializedSize:
To support delimited messages there is one thing to address: how to compute the size required to encode a message without first encoding all its bytes to an array.
To address this problem we need an API which computes the number of bytes required to encode a message. A proof-of-concept implementation is in #2239.
An early shape of the API is:
ProtoBuf.getOrComputeSerializedSize(serializer: SerializationStrategy<T>, value: T): Int
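Given such a size API, delimited encoding can emit the varint length prefix first and then stream the message bytes directly, with no intermediate array. A sketch combining the two proposed APIs (both are drafts from #2239 and #2082, and the `Json`-like `encodeToStream` signature is my assumption):

```kotlin
import java.io.OutputStream
import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.SerializationStrategy
import kotlinx.serialization.protobuf.ProtoBuf

@OptIn(ExperimentalSerializationApi::class)
fun <T> encodeDelimited(out: OutputStream, serializer: SerializationStrategy<T>, value: T) {
    // First pass: compute the encoded size without materializing the bytes
    // (draft API from #2239).
    var size = ProtoBuf.getOrComputeSerializedSize(serializer, value)
    // Write the size as a base-128 varint length prefix.
    while (size and 0x7F.inv() != 0) {
        out.write((size and 0x7F) or 0x80)
        size = size ushr 7
    }
    out.write(size)
    // Second pass: encode straight to the stream (draft API from #2082).
    ProtoBuf.encodeToStream(serializer, value, out)
}
```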
Encode delimited messages:
A very early implementation is in #2082, but it does not use the solution from above yet.
A draft shape of the APIs is:
ProtoBuf.encodeDelimitedMessagesToStream
ProtoBuf.decodeDelimitedMessagesFromStream
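If the delimited APIs follow the single-message ones, a framework might use them roughly like this (the exact signatures, including whether decoding returns a lazy `Sequence`, are my guesses on top of the draft in #2082):

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.Serializable
import kotlinx.serialization.protobuf.ProtoBuf

@Serializable
data class LogEntry(val seq: Int, val text: String)

@OptIn(ExperimentalSerializationApi::class)
fun main() {
    val out = ByteArrayOutputStream()
    // Write several length-prefixed messages into one stream.
    ProtoBuf.encodeDelimitedMessagesToStream(listOf(LogEntry(1, "a"), LogEntry(2, "b")), out)

    // Read them back, one message at a time.
    val entries: Sequence<LogEntry> =
        ProtoBuf.decodeDelimitedMessagesFromStream(ByteArrayInputStream(out.toByteArray()))
    entries.forEach { println(it) }
}
```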