apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0

[Enhancement] Optimize Pulsar Java client zlib compression performance on Java 11+ by passing direct buffers #23586

Open lhotari opened 2 weeks ago

lhotari commented 2 weeks ago

Motivation

For example, CompressionCodecZLib has several opportunities for optimization: https://github.com/apache/pulsar/blob/82237d3684fe506bcb6426b3b23f413422e6e4fb/pulsar-common/src/main/java/org/apache/pulsar/common/compression/CompressionCodecZLib.java#L60-L85

Solution

The java.util.zip.Deflater class has provided methods that take ByteBuffer input and output since Java 11.
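To make the ByteBuffer methods concrete, here is a minimal sketch that compresses and decompresses with direct buffers on Java 11+. The class name DirectBufferZlibSketch, its helper methods, and the output sizing rule are assumptions made for illustration; none of this is taken from Pulsar's CompressionCodecZLib.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch class; requires Java 11+ for the ByteBuffer overloads.
final class DirectBufferZlibSketch {

    // Compresses the remaining bytes of `input` into a new direct buffer.
    static ByteBuffer compress(ByteBuffer input) {
        int length = input.remaining();
        // Rough upper bound chosen for this sketch, not a tuned value.
        ByteBuffer output = ByteBuffer.allocateDirect(length + length / 1000 + 64);
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input); // Java 11+: takes the buffer directly
            deflater.finish();
            while (!deflater.finished()) {
                int produced = deflater.deflate(output); // advances output.position()
                if (produced == 0 && !output.hasRemaining()) {
                    throw new IllegalStateException("output buffer too small");
                }
            }
        } finally {
            deflater.end(); // release native zlib state
        }
        output.flip();
        return output;
    }

    // Decompresses `compressed` into a new direct buffer of the known original size.
    static ByteBuffer decompress(ByteBuffer compressed, int originalLength) {
        ByteBuffer output = ByteBuffer.allocateDirect(originalLength);
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed); // Java 11+
            while (!inflater.finished() && output.hasRemaining()) {
                int produced = inflater.inflate(output);
                if (produced == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unsupported input; stop instead of spinning
                }
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt zlib data", e);
        } finally {
            inflater.end();
        }
        output.flip();
        return output;
    }

    // Convenience helper: compresses and decompresses a string via direct buffers.
    static String roundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        ByteBuffer input = ByteBuffer.allocateDirect(bytes.length);
        input.put(bytes);
        input.flip();
        ByteBuffer restored = decompress(compress(input), bytes.length);
        return StandardCharsets.UTF_8.decode(restored).toString();
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("pulsar pulsar pulsar pulsar"));
    }
}
```

In this sketch the caller never allocates an intermediate byte[]; whether that translates into a measurable gain for the Pulsar client would need benchmarking.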

On Java 11+, the code could be optimized to use these methods. Since the Pulsar Java client supports Java 8+, using the ByteBuffer methods would require reflection (unless a multi-release jar file is used, with separate classes for Java 8 and Java 11). BookKeeper's Java9IntHash class has an example of this kind of reflection, in a different context.
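One possible shape for the reflective approach is sketched below: look up the Java 11 ByteBuffer overloads once through method handles and fall back to the byte[] API when they are missing. The class DeflaterByteBufferSupport and its methods are hypothetical names for illustration; this is not how BookKeeper's Java9IntHash is written, only the same general idea applied to Deflater.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper; compiles on Java 8 and uses the Java 11 overloads when present.
final class DeflaterByteBufferSupport {
    private static final MethodHandle SET_INPUT; // Deflater.setInput(ByteBuffer)
    private static final MethodHandle DEFLATE;   // Deflater.deflate(ByteBuffer)

    static {
        MethodHandle setInput = null;
        MethodHandle deflate = null;
        try {
            MethodHandles.Lookup lookup = MethodHandles.publicLookup();
            setInput = lookup.findVirtual(Deflater.class, "setInput",
                    MethodType.methodType(void.class, ByteBuffer.class));
            deflate = lookup.findVirtual(Deflater.class, "deflate",
                    MethodType.methodType(int.class, ByteBuffer.class));
        } catch (NoSuchMethodException | IllegalAccessException e) {
            setInput = null; // running on Java 8..10: use the byte[] fallback
            deflate = null;
        }
        SET_INPUT = setInput;
        DEFLATE = deflate;
    }

    static boolean isAvailable() {
        return SET_INPUT != null;
    }

    // Compresses input into output and returns the number of bytes written.
    static int compress(ByteBuffer input, ByteBuffer output) {
        int start = output.position();
        Deflater deflater = new Deflater();
        try {
            if (SET_INPUT != null) {
                SET_INPUT.invokeExact(deflater, input);
                deflater.finish();
                while (!deflater.finished() && output.hasRemaining()) {
                    int produced = (int) DEFLATE.invokeExact(deflater, output);
                }
            } else {
                byte[] in = new byte[input.remaining()];
                input.get(in); // the copy that the Java 11 path avoids
                deflater.setInput(in);
                deflater.finish();
                byte[] chunk = new byte[4096];
                while (!deflater.finished() && output.hasRemaining()) {
                    int n = deflater.deflate(chunk, 0, Math.min(chunk.length, output.remaining()));
                    output.put(chunk, 0, n);
                }
            }
            if (!deflater.finished()) {
                throw new IllegalStateException("output buffer too small");
            }
            return output.position() - start;
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t); // invokeExact declares Throwable
        } finally {
            deflater.end();
        }
    }

    // Verification helper using only the Java 8 byte[] API.
    static byte[] inflate(ByteBuffer compressed, int originalLength) {
        byte[] in = new byte[compressed.remaining()];
        compressed.get(in);
        byte[] out = new byte[originalLength];
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(in);
            int off = 0;
            while (off < out.length && !inflater.finished()) {
                int n = inflater.inflate(out, off, out.length - off);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break;
                }
                off += n;
            }
            return out;
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt zlib data", e);
        } finally {
            inflater.end();
        }
    }
}
```

A multi-release jar would avoid the method-handle lookup entirely, at the cost of a more involved build.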

Regarding performance on Java 11+, the first problem is that the current code uses a heap buffer for the compressed output. A direct buffer would be a better fit for the ByteBuffer methods of Deflater. For Netty ByteBuf input, zero copy is achievable in most cases with ByteBuf's nioBuffer method. Note, however, that nioBuffer makes a copy when the input is a CompositeByteBuf, and Netty has no good way to do zero copy for CompositeByteBuf input. BookKeeper solves this for checksum calculation with the https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/util/ByteBufVisitor.java class, which visits all buffer parts to avoid extra copies. A similar solution would be applicable to compression.
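The idea of visiting buffer parts can be illustrated without Netty. In the sketch below, a List of ByteBuffer stands in for the components of a composite buffer (an assumption; BookKeeper's ByteBufVisitor works differently and operates on Netty ByteBuf), and each part is fed to one Deflater in turn so the parts are never merged into a single array. It requires Java 11+, and the class name SegmentedDeflateSketch is hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch; a List<ByteBuffer> stands in for composite buffer components.
final class SegmentedDeflateSketch {

    // Compresses all segments as one zlib stream into output; returns bytes written.
    static int compress(List<ByteBuffer> segments, ByteBuffer output) {
        int start = output.position();
        Deflater deflater = new Deflater();
        try {
            for (ByteBuffer segment : segments) {
                deflater.setInput(segment); // Java 11+: no merge of the segments
                while (!deflater.needsInput()) {
                    int produced = deflater.deflate(output);
                    if (produced == 0 && !output.hasRemaining()) {
                        throw new IllegalStateException("output buffer too small");
                    }
                }
            }
            deflater.finish(); // no more input: flush the remaining compressed data
            while (!deflater.finished()) {
                int produced = deflater.deflate(output);
                if (produced == 0 && !output.hasRemaining()) {
                    throw new IllegalStateException("output buffer too small");
                }
            }
            return output.position() - start;
        } finally {
            deflater.end();
        }
    }

    // Verification helper: inflates into a byte[] of the known original size.
    static byte[] decompress(ByteBuffer compressed, int originalLength) {
        byte[] out = new byte[originalLength];
        ByteBuffer target = ByteBuffer.wrap(out);
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            while (!inflater.finished() && target.hasRemaining()) {
                int produced = inflater.inflate(target);
                if (produced == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break;
                }
            }
            return out;
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt zlib data", e);
        } finally {
            inflater.end();
        }
    }
}
```

With Netty, the parts might be obtained from ByteBuf's nioBuffers method, though that is untested here and whether it avoids copies for every buffer type would need checking.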

liangyepianzhou commented 2 weeks ago

> Since the Pulsar Java client is Java 8+, using the ByteBuffer methods would require the use of reflection (unless a multi-release jar file is used with separate classes for Java 8 and Java 11). There's a reflection example in different situation in BookKeeper's Java9IntHash class.

This is indeed a worthwhile optimization direction, but I am worried that raising the client's JDK version would make it harder for users to upgrade.

> For Netty ByteBuf input, it's possible to achieve zero copy in most cases by using Netty ByteBuf's nioBuffer method. It's notable that using nioBuffer method will cause copies when the Netty ByteBuf input is a CompositeByteBuf. Netty doesn't have a good way for zero copy of CompositeByteBuf input. In BookKeeper, there's a solution for checksum calculation in the https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/util/ByteBufVisitor.java class, which can visit all buffer parts to avoid extra copies. A similar solution would be applicable to compression.

My concern is, does the Pulsar client really use CompositeByteBuf to send messages?