Open franz1981 opened 1 year ago
@He-Pin I remember you commented on the gRPC benchmark... we are still working on enabling our load generator to support gRPC, hence we have no "official", supported benchmarks to verify this. This is mostly based on my knowledge of the Netty stack and the OpenJDK platform; do you have anything to test it with and verify whether it is worthwhile?
This change optimizes the use of Netty composite buffers when only a small payload has to be sent.
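To make the idea concrete, here is a minimal sketch of the "single merged buffer" approach. It is not the code in this PR: it uses plain `java.nio.ByteBuffer` instead of Netty or vertx buffers, the class and the `mergedFrame` helper are hypothetical names, and it assumes the standard gRPC message prefix of 5 bytes (1-byte compression flag + 4-byte big-endian length).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class MergedFrameSketch {
    // Assumed gRPC prefix: 1-byte compression flag + 4-byte big-endian length.
    static final int PREFIX_LENGTH = 5;

    // Hypothetical helper: copies prefix and payload into ONE contiguous heap buffer,
    // instead of wrapping two separate components in a composite.
    static ByteBuffer mergedFrame(boolean compressed, byte[] data) {
        // ByteBuffer is big-endian by default, which matches the length encoding.
        ByteBuffer frame = ByteBuffer.allocate(PREFIX_LENGTH + data.length);
        frame.put((byte) (compressed ? 1 : 0)); // compression flag
        frame.putInt(data.length);              // payload length
        frame.put(data);                        // the extra copy this approach pays for
        frame.flip();                           // make the frame readable from position 0
        return frame;
    }

    public static void main(String[] args) {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        ByteBuffer frame = mergedFrame(false, data);
        System.out.println(frame.remaining()); // prints 10 (5-byte prefix + 5-byte payload)
    }
}
```

The trade-off is the one discussed in this description: one extra allocation and copy up front, in exchange for a single contiguous source when the transport later copies to a direct buffer.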
The Netty NIO flow ends up copying the components of the composite buffer into a single native direct buffer at https://github.com/netty/netty/blob/fbb0207d5ecce39f3d63450dfd59bad5510b8e8b/transport/src/main/java/io/netty/channel/nio/AbstractNioChannel.java#L443.
For a composite buffer made of 2 components (the length and compression flag + the data), this means iterating over them in `CompositeByteBuf::getBytes(int index, ByteBuf dst, int dstIndex, int length)`, which calls back to `UnpooledHeapByteBuf::getBytes(int index, ByteBuf dst, int dstIndex, int length)` with the direct `dst` for each component, updating each one's offset, checking accessibility, etc.

In the case of a single merged buffer, we pay an additional allocation (and a copy of the data), but:
- the composite buffer allocates `+176 bytes`, while allocating a new (non-composite) buffer costs `data length + 5 + 48 bytes` and performs the additional data copy. E.g. for 128 bytes of data, the composite would still allocate +176 bytes, while this PR will allocate 128 + 53 = 181 bytes, which is similar (but we can immediately throw away the original data buffer, which is no longer needed!)
- `setBytes(AbstractByteBuf buf, long addr, int index, ByteBuf src, int srcIndex, int length)` on the direct pooled buffer, using a single vertx heap buffer which contains length + data.

The point about reachability of buffers is a stealthy but important consideration: