Aiven-Open / tiered-storage-for-apache-kafka

RemoteStorageManager for Apache Kafka® Tiered Storage
Apache License 2.0

fix(s3): reducing array copying on S3 output stream #549

Closed jeqo closed 1 month ago

jeqo commented 4 months ago

Use only ByteBuffer slices to pass bytes to S3 client operations and remove array copying.

Benchmarks of the current implementation show that Arrays.copyOfRange dominates memory allocation:

image

By switching to the ByteBuffer approach, this copying is removed:

image
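As a minimal sketch of the difference (not the code from this PR; the class and method names below are hypothetical, and the S3 client call itself is left out), the following JDK-only example contrasts copying a range of a buffer with wrapping the same range in a ByteBuffer view:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical illustration class; not part of the tiered-storage code base.
class SliceVsCopy {
    // Copying approach: allocates a new array of `length` bytes on every call.
    static byte[] copyPart(byte[] buffer, int offset, int length) {
        return Arrays.copyOfRange(buffer, offset, offset + length);
    }

    // View approach: wraps the same backing array, so no bytes are copied.
    static ByteBuffer viewPart(byte[] buffer, int offset, int length) {
        return ByteBuffer.wrap(buffer, offset, length).slice();
    }

    public static void main(String[] args) {
        byte[] buffer = {0, 1, 2, 3, 4, 5, 6, 7};
        byte[] copy = copyPart(buffer, 2, 4);
        ByteBuffer view = viewPart(buffer, 2, 4);

        // Mutate the source after taking the copy and the view.
        buffer[2] = 42;

        System.out.println(copy[0]);          // 2: the copy is independent
        System.out.println(view.get(0));      // 42: the view shares the array
        System.out.println(view.remaining()); // 4
    }
}
```

Because the view shares the backing array, the source buffer must not be overwritten until the consumer has finished reading it; that is the trade-off for avoiding the allocation.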

funky-eyes commented 4 months ago

However, in Java the close method of ByteArrayInputStream has no effect: the methods of this class can still be called after the stream has been closed without throwing an IOException. The data of a ByteArrayInputStream is held in memory, so unlike file or network streams there is no underlying resource to clean up, and there may not be an out-of-memory problem.
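A small sketch of that behavior, using only the JDK (the class and method names are hypothetical, for illustration only):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration class.
class ClosedByteArrayStream {
    // Closes the stream first, then reads from it anyway.
    static int readAfterClose(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        try {
            in.close(); // documented as having no effect
        } catch (IOException e) {
            // close() declares IOException, so it has to be handled here.
            throw new UncheckedIOException(e);
        }
        return in.read(); // still returns the first byte
    }

    public static void main(String[] args) {
        System.out.println(readAfterClose(new byte[] {7, 8, 9})); // prints 7
    }
}
```

The array stays reachable for as long as the stream object does, so closing the stream does not release the memory; only dropping the reference makes it eligible for garbage collection.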

jeqo commented 4 months ago

@funky-eyes good catch! I'm adding more changes to how bytes are passed to the request, using byte buffers only instead of array copying. PTAL

jeqo commented 1 month ago

For reference, similar improvements have been implemented in Kafka core: https://github.com/apache/kafka/pull/15589