K77 / protobuf-net

Automatically exported from code.google.com/p/protobuf-net

Await/async proposal #444

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I am proposing very simple async-enabled serialization methods that basically 
wrap MemoryStream. Below is a basic proposed implementation:

public static Task SerializeWithLengthPrefixAsync<T>(
    Stream stream, T instance, PrefixStyle prefix,
    CancellationToken cancellation)
{
    // Serialize synchronously into memory, then write the buffer asynchronously.
    var buffer = new MemoryStream();
    Serializer.SerializeWithLengthPrefix(buffer, instance, prefix);
    var serialized = buffer.ToArray();
    return stream.WriteAsync(
        serialized, 0, serialized.Length, cancellation);
}

public static async Task<T> DeserializeWithLengthPrefixAsync<T>(
    Stream stream, PrefixStyle prefix, CancellationToken cancellation)
{
    if (prefix != PrefixStyle.Base128)
        throw new NotSupportedException();
    // Decode the Base128 (varint) length prefix one byte at a time.
    byte[] lengthBuf = new byte[1];
    int length = 0, shift = 0;
    do
    {
        if (shift >= 28)
            throw new OverflowException();
        await ReadAllAsync(stream, lengthBuf, cancellation);
        length |= (lengthBuf[0] & 0x7f) << shift;
        shift += 7;
    } while ((lengthBuf[0] & 0x80) != 0);
    // ReadAllAsync is a helper that keeps calling ReadAsync until the
    // buffer is completely filled (or throws at end of stream).
    var buffer = new byte[length];
    await ReadAllAsync(stream, buffer, cancellation);
    return Serializer.Deserialize<T>(new MemoryStream(buffer));
}

This code could use some improvements WRT readability, generality, and 
performance, but even this basic implementation is faster and easier to use 
than the thread-backed Task.Run alternative we have now:

await Task.Run(() => Serializer.SerializeWithLengthPrefix(
    stream, data, PrefixStyle.Base128), cancellationToken);

While we're at it, part of the body of these methods could be extracted into 
another pair of methods that provide direct conversion to/from byte arrays, 
kind of like the JsonConvert class in Json.NET. A lot of application code uses 
MemoryStream for this, which is easy to get wrong.
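For illustration, such byte-array helpers might look like the following sketch (the ProtoConvert class name and the method names are made up here; only Serializer.Serialize and Serializer.Deserialize are existing protobuf-net APIs):

```csharp
using System.IO;
using ProtoBuf;

// Hypothetical JsonConvert-style convenience helpers for protobuf-net.
public static class ProtoConvert
{
    // Serialize a message directly to a byte array.
    public static byte[] SerializeToBytes<T>(T instance)
    {
        using (var ms = new MemoryStream())
        {
            Serializer.Serialize(ms, instance);
            return ms.ToArray();
        }
    }

    // Deserialize a message directly from a byte array.
    public static T DeserializeFromBytes<T>(byte[] data)
    {
        using (var ms = new MemoryStream(data))
        {
            return Serializer.Deserialize<T>(ms);
        }
    }
}
```

Keeping the MemoryStream handling in one place would let applications stop reimplementing (and mis-implementing) it.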

BTW, what about Git/Mercurial repo on Github/Bitbucket to allow us to send you 
pull requests easily? Attaching plain patches is so 90s.

Original issue reported on code.google.com by robert.v...@gmail.com on 20 May 2014 at 8:06

GoogleCodeExporter commented 9 years ago
I disagree with your implementation, at least on the serialization side. It's 
not really asynchronous at all: the serialization itself blocks the current 
thread (even if it's quite fast), can allocate a potentially very large buffer, 
and doesn't support streaming at all.

That said, it's a good alternative for a slow stream such as NetworkStream in a 
server scenario. However, if called from a UI thread, the Task.Run version is 
far better. Since the best approach depends on the application, I wouldn't 
advocate including this method directly in the protobuf-net library itself.

A real asynchronous implementation would require major changes in the core 
(ProtoReader/ProtoWriter classes) to be really efficient in every scenario.

Regarding Github or equivalent, I totally agree, it would be wonderful to be 
able to send pull requests for protobuf-net.

Original comment by mrj...@gmail.com on 21 May 2014 at 2:29

GoogleCodeExporter commented 9 years ago
I disagree with your disagreement :-) Async APIs should be chunky, not chatty. 
The one-async-call-per-field implementation you imagine would actually result 
in disastrous performance, because there's huge overhead for every awaited 
method call.

The CPU load is negligible. Nobody is sending gigabytes over the network. 
People will mostly use this API to send small packets of data. The disk/network 
latency is what we are optimizing here.

Memory allocations are bounded. The allocated buffer cannot be larger than the 
data structure being serialized. It is therefore an acceptable use of RAM. Not 
to mention that 99.9% of the time, we are talking about small messages of a few 
KB each.

While my code could use some optimizations (MemoryStream has quite large 
initial buffer allocation, for example), the overall chunky approach is optimal.

Original comment by robert.v...@gmail.com on 21 May 2014 at 3:12

GoogleCodeExporter commented 9 years ago
I don't imagine an async-call-per-field approach: as you said, that would be 
horrible performance-wise. Buffering has to be done somewhere, and the general 
problem is quite hard to solve correctly. I suppose that's why there are no 
real async serializers out there.

The CPU load really depends on several factors. If the model hasn't been 
compiled yet, it can be quite costly. I personally really dislike asynchronous 
APIs that can still block the caller thread. Look at the old implementation of 
Socket.BeginConnect, which used to do DNS resolution on the caller's thread 
and only connected asynchronously afterwards. If a good asynchronous API can't 
be written, I'd rather have only the synchronous one and let the caller decide 
what's best for them.

Regarding the allocated buffer, even with a perfectly sized one, that's still 
twice the memory. I do agree that in most cases we're only talking about a few 
kilobytes, though. The official protobuf page clearly states that protocol 
buffers aren't a good solution for gigabytes of data.

However, let the user of the library decide that and implement buffering if 
he/she needs to. Your methods are perfectly fine in some scenarios, but are a 
no-go in others. If I had to use them today instead of a Task.Run call in a 
project I'm working on, that would get me a nice UI freeze the first time data 
is loaded. On the contrary, using Task.Run in a high-load server scenario would 
just worsen performance by potentially spawning additional threads for nothing.

Now, regarding the implementation itself, you could use a buffer pool 
(protobuf-net already manages one so that could be reused if it's public). 
Besides, avoid calling ToArray() on the MemoryStream: it's better to reset the 
stream position to zero and use CopyToAsync instead, to avoid allocating a 
second buffer.
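As a rough sketch, the suggestion above applied to the proposed serialize method (assuming the .NET 4.5 Stream.CopyToAsync overload that takes a buffer size and a CancellationToken):

```csharp
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using ProtoBuf;

public static async Task SerializeWithLengthPrefixAsync<T>(
    Stream stream, T instance, PrefixStyle prefix,
    CancellationToken cancellation)
{
    var buffer = new MemoryStream();
    Serializer.SerializeWithLengthPrefix(buffer, instance, prefix);
    buffer.Position = 0;  // rewind instead of calling ToArray()
    // CopyToAsync streams out of the MemoryStream's internal buffer,
    // avoiding the second array that ToArray() would have allocated.
    await buffer.CopyToAsync(stream, 81920, cancellation);
}
```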

Original comment by mrj...@gmail.com on 21 May 2014 at 4:06

GoogleCodeExporter commented 9 years ago
It is really, really hard to do proper Async serialization / deserialization; 
the "await" would need to propagate to basically every line of code (including 
all generated code). Messy and inefficient. It is rightly noted that just doing 
regular serialization then Async IO of buffered data gives a quite misleading 
message about the Async nature of the call. I have yet to think of a great way 
of handling this; I don't think this proposal is "it", sadly.

Original comment by marc.gravell on 21 May 2014 at 4:15

GoogleCodeExporter commented 9 years ago
Async is not supposed to guarantee non-blocking operation; only threads can do 
that. The JIT compiler will break any async promise anyway. I don't think UI 
developers care about the time it takes to compile serializers; if they did, 
they would have precompiled and ngen-ed their serializers.

The reason there are no async serialization libraries is that serialization 
itself is CPU-bound while async was designed primarily to eliminate blocking on 
disk and network. I only expect protobuf to perform I/O asynchronously, not 
serialization, and I think most people expect the same. Async is not expected 
to behave like a thread.

Buffering chunks of input/output as you suggest is not an option, because you 
never know which field serializer will hit the end of the chunk. You then have 
to make all field serializers async. I think there are only two possible 
solutions: my proposal and the inefficient field-at-a-time technique.

As for application-specific choice, I think it's about reasonable defaults. 99% 
of applications will be happy with one standard async implementation. The point 
is to have one authoritative implementation that can be collectively improved 
through fixes and optimizations. Duplicating half-baked async code in all 
applications just makes protobuf harder to use and less efficient in practice.

Original comment by robert.v...@gmail.com on 21 May 2014 at 5:10

GoogleCodeExporter commented 9 years ago
For large data, you really don't want to buffer everything unless there's an 
absolute need.

What I am *tempted* to do is to switch the flush code to a 
one-outstanding-write-op model; i.e. when it flushes, *instead of* calling 
Write, it checks a stored IAsyncResult token, and if that is non-null calls 
EndWrite; then it calls BeginWrite and stashes the new IAsyncResult (unless it 
completed synchronously). It would still want to start the write via a 
Task.Run, because it would by necessity involve writes on the serialize 
thread; I don't think you can avoid that. But there's an interesting 
performance improvement to be had there by overlapping the CPU and IO - 
completely unrelated to the async serialize step, though.
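The overlapped-flush idea can be sketched roughly like this (OverlappedWriter is a made-up illustration, and it uses Task-based writes rather than the BeginWrite/EndWrite pair described above, but the one-outstanding-write discipline is the same):

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Sketch: the serializer calls Flush whenever its internal buffer fills.
// While one write is in flight, serialization keeps filling the next
// buffer; Flush only blocks if the previous write has not finished yet.
class OverlappedWriter
{
    private readonly Stream _stream;
    private Task _pendingWrite = Task.FromResult(0); // completed placeholder

    public OverlappedWriter(Stream stream) { _stream = stream; }

    public void Flush(byte[] buffer, int count)
    {
        _pendingWrite.Wait();                 // at most one outstanding write
        var copy = new byte[count];           // the caller will reuse its buffer
        Buffer.BlockCopy(buffer, 0, copy, 0, count);
        _pendingWrite = _stream.WriteAsync(copy, 0, count);
    }

    public void Complete()
    {
        _pendingWrite.Wait();                 // drain the final write
        _stream.Flush();
    }
}
```

With a fast serializer and a slow stream, most Flush calls would find the previous write already finished, so the CPU and IO work genuinely overlap.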

The main point I'm trying to emphasize though: serialize to a buffer then async 
write is *not* my preferred option here. The CPU work and IO work will be 
directly proportional to each-other: any time the CPU work is trivial, frankly: 
so is the IO.

Original comment by marc.gravell on 21 May 2014 at 10:07

GoogleCodeExporter commented 9 years ago
Network is usually about 1,000 times slower than the protobuf serializer; I/O 
can catch up with the serializer only with fast SSD disks. Network I/O can be 
non-trivial even for really small messages: TCP buffers are only a few KB in 
size, and readers often hang on reads for minutes before the server sends new 
updates over a long-lived connection.

The problem with your IAsyncResult approach is that the serializer has no way 
to stop while I/O is in progress, unless of course you run it on a dedicated 
thread that can enter a wait state at any moment. That's essentially identical 
to running the existing synchronous methods via Task.Run; you don't even need 
the IAsyncResult bookkeeping once you've allocated a dedicated thread.

I could have implemented the async methods by simply calling Task.Run, but I 
didn't, because Task.Run is very inefficient. Async methods shouldn't need to 
eat into the thread pool, which is a scarce resource that is easily depleted. 
The .NET thread pool takes ages to allocate new threads, roughly one second 
per thread once the reserved threads are exhausted.

While you might not like the serialize-then-write approach, my point all along 
has been that there is no other option.

Original comment by robert.v...@gmail.com on 22 May 2014 at 2:27

GoogleCodeExporter commented 9 years ago
I'm still not massively "sold" on that argument, but given that this is 
*basically* what `SerializeWithLengthPrefix` already does, it isn't one that is 
going to keep me awake at nights. The impl can be improved significantly, note, 
but I can worry about that. The other "big issue" is framework targeting, but 
then a .NET 4.5 specific build is probably overdue. I do **not** propose to 
create a .NET 4.0 build that uses the Microsoft.Bcl.Async stubs.

Original comment by marc.gravell on 22 May 2014 at 7:13

GoogleCodeExporter commented 9 years ago
Thanks. I would be happy to see these methods in protobuf-net. A .NET 4.0 
build is not necessary, because people using .NET 4.0 are used to workarounds 
for await/async. I am aware the implementation can be improved; that's part of 
the reason I want it in protobuf-net.

Original comment by robert.v...@gmail.com on 22 May 2014 at 10:50