apache / bookkeeper

Apache BookKeeper - a scalable, fault tolerant and low latency storage service optimized for append-only workloads
https://bookkeeper.apache.org/
Apache License 2.0
1.89k stars 900 forks source link

Provide a zero-copy write protocol #347

Open eolivelli opened 7 years ago

eolivelli commented 7 years ago

FEATURE REQUEST

  1. Please describe the feature you are requesting. In the V3 protocol we use Google Protobuf, which is great for portability but it forces to perform copies of the payloads of the messages. The V2 protocol is more direct and requires no copy, but it is all hardcoded in Java/Bookeeper code and it is difficult to maitain.

We would like to have a good IDL-supported tool for writing the wire protocol with the ability to leverage power of Netty 4 ByteBuf or at least NIO ByteBuffers

  1. Indicate the importance of this issue to you (blocker, must-have, should-have, nice-to-have). Are you currently using any workarounds to address this issue?

should-have

eolivelli commented 7 years ago

We are looking for something like this https://dwrensha.github.io/capnproto-java/index.html

/cc @merlimat @revans2

sijie commented 7 years ago

@eolivelli can you remove 'replace google protobuf' from caption? this is going to be misleading. I don't see replacing google protobuf is going to happen any time soon, because the fact the protocol has been run on thousands of machines. changing that requires a long discussion and it will be a long journey to get there.

And protobuf3 at least started offering unsafe operations to wrap bytebuffer to bytestring. we also need to evaluate this method as well.

sijie commented 7 years ago

this is going to a big change. can we first unmark the milestone? before we come up with a method, let's not mark it in any release.

merlimat commented 7 years ago

@eolivelli My preference would be to keep the current Protobuf wire protocol and use a zero-copy implementation to serialize/deserialize.

As I mentioned some time back in one of the calls, I have a half-backed protobuf zero-copy library that I've been playing around. I should find some time to get back to it :).

If anyone is interested I can share the code as well.

eolivelli commented 7 years ago

@merlimat it would be great to be compatibile on the wire protocol

as @sijie told it seems that ProtoBuf3 supports usage of ByteBuffers see https://github.com/google/protobuf/releases

I am interested in this topic, If you have time to share some code I will appreciate that thanks

eolivelli commented 7 years ago

@sijie you already started pushing changes in this direction (see the upgrade to Protobuf 3.4), do you want to assign this issue to yourself ?

sijie commented 7 years ago

@eolivelli I will leave it here unassigned. due to some limitations in protobuf 3, we can improve the situation (e.g. avoid memory copy for ByteString), however there are multiple places that we still copy memory. It is a bit hard to address at this moment.

eolivelli commented 7 years ago

@sijie got it, clear.thanks

ivankelly commented 6 years ago

I think the first change for this would be to move the payload out of the protobuf itself. This would remove the bulk of the need to copy.

Another aspect of this would be removing the GC-churn caused by protobuf itself, but I think that can be handled separately. For that we could take the protobuf stuff done for pulsar as @merlimat said. Even if it's half baked, it works very well.

sijie commented 6 years ago

@ivankelly

I think the first change for this would be to move the payload out of the protobuf itself.

proto3 supports UnsafeByteOperations for ByteString to use an existing bytebuffer without memory copying. we don't need to move payload out of the protobuf, we just need to leverage this feature.

ivankelly commented 6 years ago

Nice, I didn't know that existed. Would be nice to have the recycler stuff in 3 so we're not allocating a ByteString object each time also.

dlg99 commented 6 years ago

UnsafeByteOperations helps with memory allocations (I prototyped this at Salesforce's internal fork and it did help). It does not help with zero copy. Netty added splice support https://github.com/netty/netty/pull/3665 and Java's NIO has transferTo/transferFrom that allow to bypass data copying to user space. As I remember, Kafka uses these. IIRC zero copy is impossible with protobuf as it needs to bring data to user space for parsing. https://github.com/google/protobuf/issues/1896

Cap'n'proto or flatbuffers might work for this: https://capnproto.org/news/2014-06-17-capnproto-flatbuffers-sbe.html

sijie commented 6 years ago

UnsafeByteOperations helps with memory allocations (I prototyped this at Salesforce's internal fork and it did help).

@dlg99 do you mind pushing your prototype out?

sijie commented 6 years ago

fyi google/protobuf#3296

dlg99 commented 6 years ago

@sijie it is in internal repo and done on old fork. cannot really merge to community code / have to redo it from scratch.

sijie commented 6 years ago

@dlg99 okay. gotcha.

jvrao commented 6 years ago

Great discussion, but I am not sure where we are headed.

  1. What kind of improvements is possible with ProtoBufs? a. Move to V3 and use unsafe operations. As per @dlg99 this saves memory allocations but not buffer copy. b. As @dlg99 suggested we can implement splice, does anyone have a good grasp of what it can save? Do we approach near to zero-copy? c. Should we consider separating the payload as @ivankelly suggested?

  2. With the combination of a/b/c can we live with Probuff v3 or should we seriously consider FlatBuffs?

  3. Even with FlatBuffs it is not a slamdunk and it's going to be a huge change. Is it really worth?

We really need to improve this and I would love everyone to switch to V3 protocol. It's a small community, and having two protocols, neither will get enough testing/attention. This is clearly evident with the latest refcount bug(s) we ran into.

ivankelly commented 6 years ago

@jvrao Just a note on V2 vs V3, it seems we're discussing multiple disjoint things.

When I mentioned proto3 above, I meant protoc 3.0, i.e. version 3 of the compiler which can generate code from protobuf2 and protobuf3 format. I would expect that we would stay on protobuf2 format, for the core protocol, as protobuf3 has a lot of changes which we don't necessarily need. protobuf2 isn't deprecated in any way, and it will coexist with protobuf3 for the foreseeable future. They just target different usecases.

Now, we are already on protoc 3.0 for bookkeeper, but pulsar, uses a modified version of protoc 2.6, which reduces the number of allocations. I would like to upgrade this modified version of protoc to a 3.x version, so that we can use it in bookkeeper (we need 3.0 for grpc stuff).

This is separate from what we call V2 and V3 protocols within bookkeeper. It would be good to kill V2, but we cannot do that until we do a lot of other stuff to bring V3 up to scratch.

Ghatage commented 3 years ago

@karanmehta93 was asking about protobuf copies in this slack thread and in his PR https://github.com/apache/bookkeeper/pull/2441. Looking forward to his update on this.