apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0

[Enhancement] Optimize Pulsar Java client zlib compression performance on Java 11+ by passing direct buffers #23586

Open lhotari opened 3 days ago

lhotari commented 3 days ago

Motivation

CompressionCodecZLib is one example with several opportunities for optimization: https://github.com/apache/pulsar/blob/82237d3684fe506bcb6426b3b23f413422e6e4fb/pulsar-common/src/main/java/org/apache/pulsar/common/compression/CompressionCodecZLib.java#L60-L85

Solution

Since Java 11, the java.util.zip.Deflater class has provided methods that accept ByteBuffer input and output.
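As a rough illustration of the Java 11+ ByteBuffer methods mentioned above, here is a minimal sketch that compresses one direct buffer into another and inflates it back. The class name `DirectBufferZlibSketch`, its method names, and the output size estimate are assumptions made for this example; none of it is taken from Pulsar's code.

```java
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustrative sketch only (hypothetical class, not Pulsar code). Requires Java 11+.
final class DirectBufferZlibSketch {

    /** Compresses the remaining bytes of {@code input} into a new direct buffer. */
    static ByteBuffer compress(ByteBuffer input) {
        int n = input.remaining();
        // Conservative size guess for this sketch; a real codec would size or grow the buffer properly.
        ByteBuffer output = ByteBuffer.allocateDirect(n + (n >> 3) + (n >> 6) + 64);
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input); // ByteBuffer overload, added in Java 11
            deflater.finish();
            while (!deflater.finished()) {
                if (!output.hasRemaining()) {
                    throw new IllegalStateException("output buffer too small");
                }
                deflater.deflate(output); // ByteBuffer overload, added in Java 11
            }
            output.flip();
            return output;
        } finally {
            deflater.end(); // release native zlib state
        }
    }

    /** Inflates {@code compressed} into a new direct buffer of the known uncompressed length. */
    static ByteBuffer decompress(ByteBuffer compressed, int uncompressedLength) {
        ByteBuffer output = ByteBuffer.allocateDirect(uncompressedLength);
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            while (output.hasRemaining() && !inflater.finished()) {
                int produced = inflater.inflate(output);
                if (produced == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported zlib stream");
                }
            }
            output.flip();
            return output;
        } catch (DataFormatException e) {
            throw new IllegalStateException("invalid zlib data", e);
        } finally {
            inflater.end();
        }
    }
}
```

With direct buffers on both sides, no intermediate byte[] is allocated in this code; whether that translates into a measurable gain in the client would need benchmarking.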

On Java 11+, the code could be optimized. Since the Pulsar Java client supports Java 8+, using the ByteBuffer methods would require reflection (unless a multi-release jar file is used with separate classes for Java 8 and Java 11). BookKeeper's Java9IntHash class has an example of reflection used in a different context.
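One possible shape for the reflection approach is sketched below: look up the Java 11 methods once with method handles and fall back to the byte[] methods when they are missing. The class `ReflectiveDeflaterSketch` and its method names are hypothetical, and this is not how BookKeeper's Java9IntHash is written; it only shows the general idea under those assumptions.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.nio.ByteBuffer;
import java.util.zip.Deflater;

// Illustrative sketch only (hypothetical class). Compiles against Java 8 because it never
// references the Java 11 overloads directly.
final class ReflectiveDeflaterSketch {
    private static final MethodHandle SET_INPUT;
    private static final MethodHandle DEFLATE;

    static {
        MethodHandle setInput;
        MethodHandle deflate;
        try {
            MethodHandles.Lookup lookup = MethodHandles.publicLookup();
            setInput = lookup.findVirtual(Deflater.class, "setInput",
                    MethodType.methodType(void.class, ByteBuffer.class));
            deflate = lookup.findVirtual(Deflater.class, "deflate",
                    MethodType.methodType(int.class, ByteBuffer.class));
        } catch (NoSuchMethodException | IllegalAccessException e) {
            // Running on Java 8..10: the ByteBuffer overloads do not exist.
            setInput = null;
            deflate = null;
        }
        SET_INPUT = setInput;
        DEFLATE = deflate;
    }

    static boolean byteBufferMethodsAvailable() {
        return SET_INPUT != null && DEFLATE != null;
    }

    static void setInput(Deflater deflater, ByteBuffer input) {
        if (SET_INPUT != null) {
            try {
                SET_INPUT.invokeExact(deflater, input);
            } catch (Throwable t) {
                throw new IllegalStateException(t);
            }
        } else {
            // Fallback copies the input; unlike the Java 11 method, it consumes the buffer immediately.
            byte[] copy = new byte[input.remaining()];
            input.get(copy);
            deflater.setInput(copy);
        }
    }

    static int deflate(Deflater deflater, ByteBuffer output) {
        if (DEFLATE != null) {
            try {
                return (int) DEFLATE.invokeExact(deflater, output);
            } catch (Throwable t) {
                throw new IllegalStateException(t);
            }
        }
        // Fallback goes through a temporary array and copies the result into the buffer.
        byte[] tmp = new byte[output.remaining()];
        int produced = deflater.deflate(tmp);
        output.put(tmp, 0, produced);
        return produced;
    }
}
```

A multi-release jar would avoid the method handles entirely, at the cost of maintaining two class variants in the build.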

Regarding performance on Java 11+, the first problem is that the current code uses a heap buffer for the compressed output. A direct buffer would be a better fit when using Deflater's ByteBuffer methods. For Netty ByteBuf input, zero copy is achievable in most cases through ByteBuf's nioBuffer method. Note that nioBuffer causes copies when the Netty ByteBuf input is a CompositeByteBuf, and Netty doesn't have a good way to achieve zero copy for CompositeByteBuf input. BookKeeper solves this for checksum calculation with the https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/util/ByteBufVisitor.java class, which visits each buffer part to avoid extra copies. A similar solution would be applicable to compression.
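To make the "visit all buffer parts" idea concrete for compression, the sketch below feeds several ByteBuffer parts to a single Deflater one after another, so the parts never have to be merged into one contiguous buffer. It uses plain java.nio buffers as a stand-in for the components of a CompositeByteBuf; the class `MultiPartDeflateSketch`, its method, and the caller-supplied output capacity are assumptions for this example, and it does not use BookKeeper's ByteBufVisitor. With Netty, the parts could presumably come from `ByteBuf.nioBuffers()`, but that wiring is not shown here.

```java
import java.nio.ByteBuffer;
import java.util.List;
import java.util.zip.Deflater;

// Illustrative sketch only (hypothetical class). Requires Java 11+ for the ByteBuffer overloads.
final class MultiPartDeflateSketch {

    /** Compresses all parts, in order, into one zlib stream held in a new direct buffer. */
    static ByteBuffer compressParts(List<ByteBuffer> parts, int outputCapacity) {
        ByteBuffer output = ByteBuffer.allocateDirect(outputCapacity);
        Deflater deflater = new Deflater();
        try {
            for (ByteBuffer part : parts) {
                deflater.setInput(part);
                // Consume this part completely before handing over the next one.
                while (!deflater.needsInput()) {
                    requireSpace(output);
                    deflater.deflate(output);
                }
            }
            // All input has been supplied; flush the remaining compressed data.
            deflater.finish();
            while (!deflater.finished()) {
                requireSpace(output);
                deflater.deflate(output);
            }
            output.flip();
            return output;
        } finally {
            deflater.end(); // release native zlib state
        }
    }

    private static void requireSpace(ByteBuffer output) {
        if (!output.hasRemaining()) {
            throw new IllegalStateException("output buffer too small");
        }
    }
}
```

The result is a single zlib stream, identical in content to compressing the concatenated parts, so the decompression side would not need to change.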

Alternatives

No response

Anything else?

No response

liangyepianzhou commented 3 days ago

> Since the Pulsar Java client is Java 8+, using the ByteBuffer methods would require the use of reflection (unless a multi-release jar file is used with separate classes for Java 8 and Java 11). There's a reflection example in different situation in BookKeeper's Java9IntHash class.

This is indeed a promising optimization direction, but I am worried that raising the client's JDK version would make it harder for users to upgrade.

> For Netty ByteBuf input, it's possible to achieve zero copy in most cases by using Netty ByteBuf's nioBuffer method. It's notable that using nioBuffer method will cause copies when the Netty ByteBuf input is a CompositeByteBuf. Netty doesn't have a good way for zero copy of CompositeByteBuf input. In BookKeeper, there's a solution for checksum calculation in the https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/util/ByteBufVisitor.java class, which can visit all buffer parts to avoid extra copies. A similar solution would be applicable to compression.

My question is whether the Pulsar client actually uses CompositeByteBuf when sending messages.