aws / aws-sdk-java-v2

The official AWS SDK for Java - Version 2
Apache License 2.0
2.19k stars 845 forks source link

Support for zero-copy file upload/download with `java.nio` #3928

Open StephenFlavin opened 1 year ago

StephenFlavin commented 1 year ago

Describe the feature

Currently it seems that the code specifically deals in on heap ByteBuffers either converting direct to non-direct Buffers or writing to non-direct Buffers when there are more performant and memory efficient zero-copy routes.

Most of the heavy lifting on supporting these off heap/zero copy file transfers are done by the HTTP client and the OS with native support in Java via FileChannel#transferTo and FileChannel#transferFrom to socket channels and similar for SocketChannel#write for DirectByteBuffers.

The specific implementation of this is up to the HTTP client in use (e.g. https://netty.io/4.0/api/io/netty/channel/FileRegion.html) but the option is removed at too high a level for the client specific implementations.

I believe it was done this way to guard against concurrent modification during write ops which I've discussed on https://github.com/aws/aws-sdk-java-v2/pull/3925

Use Case

High frequency efficient read/write ops to S3

Proposed Solution

Don't do any conversion from direct to non-direct ByteBuffers, pass along the File/FileChannel object instead of converting to a ByteBuffer publisher and leave it up to the http client lib to deal with as required.

For Path/File/Filechannel AsyncRequestBody is unsuitable since it's an implementation of SdkPublisher<ByteBuffer>, I think in this case the AsyncRequestBody is just unnecessary and the plain Path/File/Filechannel should be passed on to the http client.

The Netty docs state

If your operating system (or JDK / JRE) does not support zero-copy file transfer, sending a file with FileRegion might fail or yield worse performance. For example, sending a large file doesn't work well in Windows.

In which case this should be possible to disable with a flag passed to the http client when building the S3Client

Other Information

No response

Acknowledgements

AWS Java SDK version used

macOs 13.2.1 (22D68)

JDK version used

openjdk 11.0.16.1 & openjdk 19.0.1

Operating System and version

macOs 13.2.1 (22D68)

debora-ito commented 1 year ago

Thank you @StephenFlavin, this makes sense. We'll review your related PR.

StephenFlavin commented 1 year ago

The related PR would be a precursor to supporting these for put/post with off heap ByteBuffers. As far as I can tell

if (!buffer.hasArray()) {
    buffer = ByteBuffer.wrap(BinaryUtils.copyBytesFrom(buffer));
}

could be removed and it would just work (tested with the netty client) though I didn't want to do it as part of this PR and I haven't had a chance to delve too deep into the netty implementation to make sure it's doing zero copy, I did see a significant performance bump vs inputstream.

The other piece there will be making use of Filechannel.transferTo for put/post but the changes required for that would be more significant since everything currently deals in ByteBuffers.

I've not yet looked at the changes required to the get requests.

StephenFlavin commented 1 year ago

with the pre-cursor PR merged, I've raised the first PR to enable zero-copy transfers via direct byte buffers with AsyncRequestBody comments/thoughts appreciated. I've kept this change small as a minimal effort to support zero-copy in the async S3 client.

sgammon commented 1 year ago

@StephenFlavin / @debora-ito arrived here looking for this feature in the sdk; if you need a user to build and test let me know (i may anyway)

StephenFlavin commented 10 months ago

Hi @sgammon, I've not had any time to work on this as I'm no longer doing such performance sensitive work with S3 though I would like to see the work continued, I may loop back to this in future but feel free to take on a chunk if you fancy it.