Kong / unirest-java

Unirest in Java: Simplified, lightweight HTTP client library.
http://kong.github.io/unirest-java/
MIT License

`request.asObjectAsync(RawResponse::getContent).get()` causes out of memory errors #372

Closed: soc closed this issue 2 years ago

soc commented 3 years ago

Downloading sufficiently large files with

request.asObjectAsync(RawResponse::getContent).get()

fails with an out-of-memory error: the get call appears to read the whole response into memory, even though only an InputStream is requested via RawResponse::getContent.

(An InputStream is desired because it seems to be the only way to support cancellation of running transfers.)
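
A minimal sketch of the cancellation pattern this enables; the cancelled flag, method name, and copy loop are illustrative, not Unirest API:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.concurrent.atomic.AtomicBoolean;

public class CancellableCopy {
    // Illustrative only: copies the response body in chunks and stops as soon
    // as an external `cancelled` flag is set. Closing the stream (via
    // try-with-resources) releases the underlying connection mid-transfer.
    static void copyWithCancellation(InputStream body, OutputStream out,
                                     AtomicBoolean cancelled) throws IOException {
        byte[] buffer = new byte[8192];
        try (InputStream in = body) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                if (cancelled.get()) {
                    return; // abort: the stream is closed on exit
                }
                out.write(buffer, 0, read);
            }
        }
    }
}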

To Reproduce: Use the code above to download a large file.

Expected behavior: Receiving an InputStream without Unirest loading the whole response into memory.

Environmental Data:

ryber commented 3 years ago

I think your problem is that the server you're getting the data from isn't streaming the data to you (typically done with an octet stream). It's probably giving you all the bytes, so the input stream you get has all the bytes. I don't think there is really anything I can do about that, as it comes from Apache.

In a normal system you get a true streaming InputStream. For example, I just downloaded this 1 GB file without issue from a unit test, and memory stayed stable the entire time:

Unirest.get("https://speed.hetzner.de/1GB.bin")
        .downloadMonitor((field, fileName, bytesWritten, totalBytes) -> {
            System.out.println(bytesWritten + " of " + totalBytes + " FreeMem:" + Runtime.getRuntime().freeMemory());
        })
        .asFile("test100.zip", StandardCopyOption.REPLACE_EXISTING);
soc commented 3 years ago

asFile works without issues¹; this is about asObjectAsync.

I can't use asFile as it gives me nothing to implement cancellation.

> server you're getting the data from isn't streaming the data to you (typically with an octet stream). It's probably giving you all the bytes so the input stream you get has all the bytes

I'm not sure I understand this. I don't think this distinction can even exist from a technical point of view; it's an HTTP connection running over TCP. There is no "giving you all the bytes" vs. streaming. Octet stream is a content type, not a transfer mode.

Even if the Content-Length were unknown (e.g. with chunked transfer encoding), this should not result in Unirest loading the whole file into memory.
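
A minimal illustration of the point, using the JDK's blocking HttpURLConnection; the URL is a placeholder. The body arrives incrementally either way, so whether bytes pile up in memory is the consumer's choice, not a property of the content type or of whether the length is known:

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class StreamingDemo {
    public static void main(String[] args) throws Exception {
        HttpURLConnection conn =
                (HttpURLConnection) new URL("https://example.com/large.bin").openConnection();
        try (InputStream in = conn.getInputStream()) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // process each chunk and discard it; memory stays flat
            }
        }
    }
}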


¹ except when the file already exists, then it runs out of memory, too, but that's another issue

ryber commented 3 years ago

The actual InputStream comes directly from Apache HttpAsyncClient; Unirest doesn't do anything with it other than (optionally) wrapping it in the monitor. I still get streaming responses from it like this, with no impact on memory:

Path path = new File("file.zip").toPath();
Unirest.get("https://speed.hetzner.de/100MB.bin")
        .asObjectAsync(raw -> {
            try {
                InputStream stream = new MonitoringInputStream(raw.getContent(),
                        (field, fileName, bytesWritten, totalBytes) -> {
                            System.out.println(bytesWritten + " of " + totalBytes + " FreeMem:" + Runtime.getRuntime().freeMemory());
                        }, path, raw);

                Files.copy(stream, path, StandardCopyOption.REPLACE_EXISTING);
                return new File("file.zip");
            } catch (IOException e) {
                e.printStackTrace();
                return null;
            }
        }).get();
soc commented 3 years ago

I built and ran the old code – here is the stacktrace:

java.util.concurrent.ExecutionException: org.apache.http.ConnectionClosedException: Connection closed unexpectedly
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) ~[?:?]
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) ~[?:?]
    at ----> request.asObjectAsync(RawResponse::getContent).get() <----
    ... 7 more
Caused by: org.apache.http.ConnectionClosedException: Connection closed unexpectedly
    at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.closed(HttpAsyncRequestExecutor.java:146) ~[httpcore-nio-4.4.13.jar:4.4.13]
    at org.apache.http.impl.nio.client.InternalIODispatch.onClosed(InternalIODispatch.java:71) ~[httpasyncclient-4.1.4.jar:4.1.4]
    at org.apache.http.impl.nio.client.InternalIODispatch.onClosed(InternalIODispatch.java:39) ~[httpasyncclient-4.1.4.jar:4.1.4]
    at org.apache.http.impl.nio.reactor.AbstractIODispatch.disconnected(AbstractIODispatch.java:100) ~[httpcore-nio-4.4.13.jar:4.4.13]
    at org.apache.http.impl.nio.reactor.BaseIOReactor.sessionClosed(BaseIOReactor.java:277) ~[httpcore-nio-4.4.13.jar:4.4.13]
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.processClosedSessions(AbstractIOReactor.java:449) ~[httpcore-nio-4.4.13.jar:4.4.13]
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.hardShutdown(AbstractIOReactor.java:590) ~[httpcore-nio-4.4.13.jar:4.4.13]
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:305) ~[httpcore-nio-4.4.13.jar:4.4.13]
    at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) ~[httpcore-nio-4.4.13.jar:4.4.13]
    at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591) ~[httpcore-nio-4.4.13.jar:4.4.13]
    ... 1 more
[ERROR] [2020.10.01-15:47:15] [pool-41-thread-1] client.InternalHttpAsyncClient - I/O reactor terminated abnormally
org.apache.http.nio.reactor.IOReactorException: I/O dispatch worker terminated abnormally
    at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:359) ~[httpcore-nio-4.4.13.jar:4.4.13]
    at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:221) ~[httpasyncclient-4.1.4.jar:4.1.4]
    at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64) [httpasyncclient-4.1.4.jar:4.1.4]
    at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:61) ~[?:?]
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:348) ~[?:?]
    at org.apache.http.nio.util.HeapByteBufferAllocator.allocate(HeapByteBufferAllocator.java:52) ~[httpcore-nio-4.4.13.jar:4.4.13]
    at org.apache.http.nio.util.ExpandableBuffer.expandCapacity(ExpandableBuffer.java:109) ~[httpcore-nio-4.4.13.jar:4.4.13]
    at org.apache.http.nio.util.ExpandableBuffer.expand(ExpandableBuffer.java:140) ~[httpcore-nio-4.4.13.jar:4.4.13]
    at org.apache.http.nio.util.SimpleInputBuffer.consumeContent(SimpleInputBuffer.java:69) ~[httpcore-nio-4.4.13.jar:4.4.13]
    at org.apache.http.nio.protocol.BasicAsyncResponseConsumer.onContentReceived(BasicAsyncResponseConsumer.java:85) ~[httpcore-nio-4.4.13.jar:4.4.13]
    at org.apache.http.nio.protocol.AbstractAsyncResponseConsumer.consumeContent(AbstractAsyncResponseConsumer.java:147) ~[httpcore-nio-4.4.13.jar:4.4.13]
    at org.apache.http.impl.nio.client.MainClientExec.consumeContent(MainClientExec.java:329) ~[httpasyncclient-4.1.4.jar:4.1.4]
    at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.consumeContent(DefaultClientExchangeHandlerImpl.java:157) ~[httpasyncclient-4.1.4.jar:4.1.4]
    at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:336) ~[httpcore-nio-4.4.13.jar:4.4.13]
    at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265) ~[httpcore-nio-4.4.13.jar:4.4.13]
    at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81) ~[httpasyncclient-4.1.4.jar:4.1.4]
    at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39) ~[httpasyncclient-4.1.4.jar:4.1.4]
    at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:121) ~[httpcore-nio-4.4.13.jar:4.4.13]
    at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162) ~[httpcore-nio-4.4.13.jar:4.4.13]
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337) ~[httpcore-nio-4.4.13.jar:4.4.13]
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315) ~[httpcore-nio-4.4.13.jar:4.4.13]
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276) ~[httpcore-nio-4.4.13.jar:4.4.13]
    at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) ~[httpcore-nio-4.4.13.jar:4.4.13]
    at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591) ~[httpcore-nio-4.4.13.jar:4.4.13]
    ... 1 more
soc commented 3 years ago

Same stack trace with:

      request
          .downloadMonitor((__1, __2, bytesWritten, bytesTotal) -> {
            this.bytesWritten = bytesWritten;
            this.bytesTotal = bytesTotal;
          })
          .thenConsumeAsync(response -> {
            try {
              Files.copy(response.getContent(), Path.of(path), StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException e) {
              throw new RuntimeException(e);
            }
          });

The files I'm testing with are slightly above 2GB in size.

ryber commented 3 years ago

So... the problem is that the ApacheAsyncClient uses BasicAsyncResponseConsumer as its default consumer, and it does indeed buffer the stream into memory in pretty large amounts. We would need to switch to something like the ZeroCopyConsumer.

Doing this is not as easy as just a swap, so it's going to take some time and testing.

ryber commented 3 years ago

So, just FYI, this is going to require quite a bit more work than I thought, and I don't really see a way to salvage thenConsumeAsync from already having read all the bytes. It's likely a large breaking change: the consuming needs to be represented in an entirely different way, with consumers providing something that can take chunked parts and build a response. We can't get a raw InputStream out of Apache like we can with the regular client.

This also changes significantly under HttpClient 5, and it might be best to build this feature into that branch. I'll dig a little more for a simple fix, but I'm not seeing it at the moment.

One solution for your problem might be to use Apache directly rather than Unirest, calling its `<T> Future<T> execute(HttpAsyncRequestProducer requestProducer, HttpAsyncResponseConsumer<T> responseConsumer, FutureCallback<T> callback)` method with a ZeroCopyConsumer for the file.
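
A sketch of that suggestion against Apache HttpAsyncClient 4.x, bypassing Unirest entirely; the URL and file name are placeholders. ZeroCopyConsumer writes incoming content straight to the file channel instead of buffering it on the heap:

import java.io.File;
import java.util.concurrent.Future;
import org.apache.http.HttpResponse;
import org.apache.http.HttpStatus;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.entity.ContentType;
import org.apache.http.impl.nio.client.CloseableHttpAsyncClient;
import org.apache.http.impl.nio.client.HttpAsyncClients;
import org.apache.http.nio.client.methods.HttpAsyncMethods;
import org.apache.http.nio.client.methods.ZeroCopyConsumer;

public class ZeroCopyDownload {
    public static void main(String[] args) throws Exception {
        try (CloseableHttpAsyncClient client = HttpAsyncClients.createDefault()) {
            client.start();
            ZeroCopyConsumer<File> consumer = new ZeroCopyConsumer<File>(new File("file.zip")) {
                @Override
                protected File process(HttpResponse response, File file, ContentType contentType)
                        throws Exception {
                    if (response.getStatusLine().getStatusCode() != HttpStatus.SC_OK) {
                        throw new ClientProtocolException("Download failed: " + response.getStatusLine());
                    }
                    return file;
                }
            };
            Future<File> future = client.execute(
                    HttpAsyncMethods.createGet("https://speed.hetzner.de/1GB.bin"),
                    consumer,
                    null); // cancellation: future.cancel(true) aborts the exchange
            File result = future.get();
            System.out.println("Downloaded to " + result.getAbsolutePath());
        }
    }
}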

ryber commented 3 years ago

Confirmed: due to the way the reactors work in the Apache async code, we will need to change the way async response consumption is done. This is going to impact a number of things and render some of the existing public methods unworkable in their current form. It also differs between HttpClient 4 and 5, and since that work is currently being done, I'm going to tag this issue for that change.

soc commented 3 years ago

Thanks for the heads up, @ryber!

ryber commented 2 years ago

Unirest 4, which uses the native Java client, doesn't seem to have this problem: the InputStream is an HttpResponseInputStream and seems to properly stream the response.

This does mean upgrading off Java 8. https://github.com/Kong/unirest-java/discussions/429
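
For reference, a minimal sketch of streaming a large body with the JDK's native client (java.net.http, Java 11+), which is what Unirest 4 builds on; this is not the Unirest 4 API itself, and the URL is a placeholder:

import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class NativeStreamingDownload {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("https://speed.hetzner.de/1GB.bin")).build();
        // BodyHandlers.ofInputStream() returns as soon as headers arrive;
        // the body is consumed lazily as the stream is read.
        HttpResponse<InputStream> response =
                client.send(request, HttpResponse.BodyHandlers.ofInputStream());
        try (InputStream in = response.body()) {
            Files.copy(in, Path.of("file.bin"), StandardCopyOption.REPLACE_EXISTING);
        }
    }
}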