Azure / azure-cosmosdb-java

Java Async SDK for SQL API of Azure Cosmos DB
MIT License
54 stars 61 forks source link

Address memory leak in Direct TCP transport client #326

Closed David-Noble-at-work closed 4 years ago

David-Noble-at-work commented 4 years ago

Repro

mkdir azure-cosmos.results
git checkout master
profile=direct
mvn -P$profile -DargLine="-Dio.netty.leakDetectionLevel=paranoid -Dio.netty.leakDetection.targetRecords=255" verify | tee azure-cosmos.results/Verify-P$profile.output

Observe

A number of leaks detections that originate here:

Fix

Simplify reference counting implementation by changing the granularity of the instances of RntbdResponse, RntbdToken, and RntbdTokenStream. Specifically,

Test results

You will see in the before/after test results that this fix eliminates a memory leak in the Direct TCP stack. Performance numbers are provided for comparison with v4 master. There's no indication that performance regressed. There's some evidence in the comparison between v4 fix and v4 master that performance has improved some.

mvn clean install mkdir -p azure-cosmos.results

for profile in direct e2e emulator examples fast long non-emulator; do timeout 55m mvn -P$profile -DargLine="-Dio.netty.leakDetectionLevel=paranoid -Dio.netty.leakDetection.targetRecords=255" verify | tee azure-cosmos.results/Verify-P$profile.output done



- [Performance numbers](https://github.com/Azure/azure-cosmosdb-java/files/4364491/Performance-numbers.xlsx)

  **Client**
  - East US
  - Standard F16s_v2 (16 vcpus, 32 GiB memory)
  - Linux appserver-lin-3 5.0.0-1032-azure 34-Ubuntu SMP Mon Feb 10 19:37:25 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

  **Cosmos**
  - East US
  - Benchmark database autoscale 500,000 RUs
  - https://cosmos-sdk-core-3.documents.azure.com:443/

<img width="1361" alt="image" src="https://user-images.githubusercontent.com/3037615/77267076-ab9adc80-6c5e-11ea-863d-5fbd643376d3.png">

<img width="1336" alt="image" src="https://user-images.githubusercontent.com/3037615/77267140-e3098900-6c5e-11ea-948f-87d7a6a71dd0.png">
henrikp-visma commented 4 years ago

@kushagraThapar @moderakh Is there going to be a release for this fix?

We are having very frequent memory issues with the current latest release, 2.6.6, which presumably does not have this fix yet.

kushagraThapar commented 4 years ago

@kushagraThapar @moderakh Is there going to be a release for this fix?

We are having very frequent memory issues with the current latest release, 2.6.6, which presumably does not have this fix yet.

Sure, we can release this fix as v2.6.7, would that work for you guys ?

henrikp-visma commented 4 years ago

Yes, that would be perfect. Thank you!

kushagraThapar commented 4 years ago

@henrikp-visma Release has been done - v2.6.7

Karthickramk commented 4 years ago

We see high latency with 2.6.6 sdk. Upgrading to this version will help to reduce the latency?

Thanks, Karthick

kushagraThapar commented 4 years ago

@Karthickramk It should.