aws / aws-sdk-java-v2

The official AWS SDK for Java - Version 2
Apache License 2.0
2.16k stars 833 forks source link

NettyNioAsyncHttpClient Channel Cache Memory Usage #2247

Open langrp opened 3 years ago

langrp commented 3 years ago

Describe the issue

We're using S3AsyncClient to process customer data from S3. As such we create shared instance of NettyNioAsyncHttpClient for s3 clients. The NettyNioAsyncHttpClient holds pool of netty channels for each bucket we process. This can grow significantly with number of customers. Could there be such cache per region?

Steps to Reproduce

Long running app periodically accessing customers S3 buckets. The memory grows with number of customers.

Your Environment

zoewangg commented 3 years ago

Hi @langrp thanks for reaching out. NettyNioAsyncHttpClient actually doesn't have the knowledge of S3 buckets. A channel corresponds to a TCP connection to the S3 service, and the same connection can be re-used by multiple requests, which might be associated with the same bucket or different buckets. By default, NettyNioAsyncHttpClient has 50 maximum connections, configured by NettyNioAsyncHttpClient.Builder#maxConcurrency, so with the default configuration, there are at most 50 channels in the pool.

The memory grows with number of customers.

Do you mean the objects are not getting GC'd and the memory continue to grow over time, eventually leading to out of memory errors? If so, could you do a heap dump? Could you share the sample code of how you create the S3Client and NettyNioAsyncHttpClient?

langrp commented 3 years ago

Hi, thank you for coming back so quickly. Apologies for not being more specific. I refer to the SdkChannelPoolMap populated with keys from here, which is based on request URI. In case of S3 the request host is prefixed by bucket name, therefore each bucket have its own ChannelPool (configured with max-concurrency you mentioned).

The client is created this way

    NettyNioAsyncHttpClient.Builder b = NettyNioAsyncHttpClient.builder();
      b.connectionTimeout(Duration.ofMillis(10000))
        .readTimeout(Duration.ofMillis(30000))
        .writeTimeout(Duration.ofMillis(30000));
        .maxConcurrency(20)
        .maxPendingConnectionAcquires(1000)
        .connectionAcquisitionTimeout(Duration.ofMillis(30000))
        .build();
Though the pool I refer has still over 890 entries as per below table extracted from my heap dump. Class Name Shallow Heap Retained Heap
software.amazon.awssdk.http.nio.netty.NettyNioAsyncHttpClient @ 0x3814503c8 24 779,011,936
- pools software.amazon.awssdk.http.nio.netty.internal.AwaitCloseChannelPoolMap @ 0x38357ddb8 56 779,011,888
- map java.util.concurrent.ConcurrentHashMap @ 0x38357ddf0 64 779,011,568
- table java.util.concurrent.ConcurrentHashMap$Node[2048] @ 0x381f551f0 - 896 entries 8,208 779,011,432
zoewangg commented 3 years ago

Thank you for the clarification! I see what you mean now. Currently, we only clean up channel pools from SdkChannelPoolMap when the client is closed. We'd need to investigate it to see the best way to address the issue. One possible solution is to have a separate thread running periodically to remove the channel pools that do not have open connections. Will discuss with the team.

langrp commented 3 years ago

If I may propose solution. I'd enhance the SdkHttpRequest to be region aware and use such url as a key into the map. That way we'd lower down the size of map, while keeping the performance. It would also allow us to share the same http client across service clients such as sns, s3, and not increasing the size of map. I think most of the service clients are aware of region they connect to. What do you and your team think about that?

zoewangg commented 3 years ago

Absolutely! We welcome suggestions and new ideas! 🙂 Could you elaborate on caching region? I'm not sure how that would lower down the size of the map because we'd still need to have the mapping between the host and connections.

We talked about another solution to just remove a channel pool from the channel pool map automatically when the last connection is closed.

langrp commented 3 years ago

My goal is to have one channel pool per region. So any S3 bucket request within the region would be able to use the connection from the channel pool. This would decrease the size of the map to number of regions our app needs to connect. Therefore instead of channel pool map size depends on number of buckets, we would have size dependent on number of regions.

From my understanding each service client is aware of region to which it communicates. So whenever I create S3AsyncClient for us-east-1 and list objects in bucket my-test it would grab the connection for that region.

zoewangg commented 3 years ago

It seems even with the region, we would still need a map inside SdkChannelPoolMap, something like Map<Region, Map<URI, SimpleChannelPoolAwareChannelPool>>. So the inner map size would still depend on the number of URIs - the number of buckets in your case.

RafalSierkiewicz commented 2 months ago

I think I have similar problem where the retained heap grows

image

and same for map node

image

Do you know about any solution which can be applied to workaround the issue ?