Azure / azure-cosmosdb-java

Java Async SDK for SQL API of Azure Cosmos DB
MIT License

Direct TCP: Add metrics, address back pressure issue, and eliminate some resource leaks #242

Closed. David-Noble-at-work closed this pull request 5 years ago.

David-Noble-at-work commented 5 years ago

In addition to resolving issue #228 and restoring stack traces to DocumentClientException, this PR:

  1. Addresses three specific resource leaks:

    • Inactive RntbdServiceEndpoint instances are now closed and do not accumulate over time
    • RntbdContext instances no longer hold on to pooled byte buffers
    • RntbdResponse.decode now releases the byte buffers associated with partially decoded responses.
  2. Adds RntbdTransportClient and RntbdServiceEndpoint metrics using Micrometer.

    • Metrics are encapsulated by the RntbdMetrics class, and each RntbdServiceEndpoint instantiates its own instance of this class. Console logging can be turned on or off using this Java property:
      ◦ cosmos.monitoring.consoleLogging.step: Reporting frequency in seconds. Use a value less than or equal to zero to explicitly disable console logging.
    • Customers can collect metrics in the Micrometer MeterRegistry of their choosing by way of the new AsyncDocumentClient.monitor method.
    • The benchmark app enables monitoring using Application Insights by way of these Java properties:
      ◦ cosmos.monitoring.azureMonitor.instrumentationKey: An Azure Application Insights instrumentation key
      ◦ cosmos.monitoring.azureMonitor.step: Reporting frequency in seconds
      ◦ cosmos.monitoring.azureMonitor.disabled: A value of true disables monitoring (useful during development/testing)
    • The benchmark app enables monitoring using Graphite by way of these Java properties:
      ◦ cosmos.monitoring.graphite.serviceAddress: Graphite pickle (not plaintext) endpoint address (e.g., cosmos-sdk.eastus.cloudapp.azure.com:2004)
      ◦ cosmos.monitoring.graphite.step: Reporting frequency in seconds
      ◦ cosmos.monitoring.graphite.disabled: A value of true disables monitoring (useful during development/testing)
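The first leak fix (inactive RntbdServiceEndpoint instances are now closed rather than accumulating) can be illustrated with a minimal, stdlib-only sketch of idle-endpoint eviction. It assumes endpoints are cached by address and stamped with a last-used time. `Endpoint`, `EndpointProvider`, `evictIdle`, and the idle timeout are hypothetical names chosen for illustration, not the SDK's classes or API.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical stand-in for an endpoint such as RntbdServiceEndpoint (not the SDK type).
final class Endpoint implements AutoCloseable {
    final String address;
    volatile long lastUsedNanos;
    volatile boolean closed;

    Endpoint(String address, long nowNanos) {
        this.address = address;
        this.lastUsedNanos = nowNanos;
    }

    @Override
    public void close() {
        closed = true; // a real endpoint would release its channels and pooled resources here
    }
}

// Hypothetical provider: creates endpoints on demand and closes the idle ones.
final class EndpointProvider {
    private final Map<String, Endpoint> endpoints = new ConcurrentHashMap<>();
    private final LongSupplier clock; // injectable clock so eviction can be tested
    private final long idleNanos;

    EndpointProvider(Duration idleTimeout, LongSupplier clock) {
        this.idleNanos = idleTimeout.toNanos();
        this.clock = clock;
    }

    Endpoint get(String address) {
        long now = clock.getAsLong();
        Endpoint endpoint = endpoints.computeIfAbsent(address, a -> new Endpoint(a, now));
        endpoint.lastUsedNanos = now;
        return endpoint;
    }

    // Removes and closes every endpoint idle for longer than the timeout; returns the count.
    int evictIdle() {
        long now = clock.getAsLong();
        int evicted = 0;
        for (Map.Entry<String, Endpoint> entry : endpoints.entrySet()) {
            Endpoint endpoint = entry.getValue();
            if (now - endpoint.lastUsedNanos > idleNanos
                    && endpoints.remove(entry.getKey(), endpoint)) {
                endpoint.close();
                evicted++;
            }
        }
        return evicted;
    }

    int size() {
        return endpoints.size();
    }
}
```

A background task would call evictIdle periodically. The sketch ignores a request that races with eviction (it could obtain an endpoint that is closed moments later); a production version has to handle that case.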
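The RntbdResponse.decode fix follows a general reference-counting rule: if decoding fails partway through, every buffer already allocated for the response must be released before the error propagates. Netty's ByteBuf uses this kind of reference counting; to stay stdlib-only, the sketch below substitutes a hypothetical `RefCountedBuffer` and a hypothetical `PartialDecoder.decodeFrames` helper, so it shows the pattern rather than the SDK's actual decoder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a pooled, reference-counted buffer such as Netty's ByteBuf.
final class RefCountedBuffer {
    private int references = 1;
    final byte[] bytes;

    RefCountedBuffer(byte[] bytes) {
        this.bytes = bytes;
    }

    int refCnt() {
        return references;
    }

    void release() {
        if (references <= 0) {
            throw new IllegalStateException("buffer already released");
        }
        references--; // a real pool would reclaim the memory when this reaches zero
    }
}

final class PartialDecoder {
    // Hypothetical decoder: allocates one buffer per frame. If any frame is bad,
    // it releases the buffers allocated so far instead of leaking them.
    static List<RefCountedBuffer> decodeFrames(List<byte[]> frames,
                                               Function<byte[], RefCountedBuffer> allocate) {
        List<RefCountedBuffer> decoded = new ArrayList<>();
        try {
            for (byte[] frame : frames) {
                if (frame.length == 0) {
                    throw new IllegalArgumentException("truncated frame");
                }
                decoded.add(allocate.apply(frame));
            }
            return decoded; // on success, ownership passes to the caller
        } catch (RuntimeException error) {
            for (RefCountedBuffer buffer : decoded) {
                buffer.release();
            }
            throw error;
        }
    }
}
```

The allocator is passed in so a caller (or a test) can observe every buffer the decoder creates.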
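The per-endpoint metrics design (each RntbdServiceEndpoint owns its own RntbdMetrics instance, and customers attach a registry of their choosing) can be sketched without Micrometer. `EndpointMetrics` and `MetricsSink` below are hypothetical stand-ins: `LongAdder` counters play the role of Micrometer meters, and `MetricsSink` plays the role of the customer's registry. The actual signature of AsyncDocumentClient.monitor is not reproduced here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical per-endpoint metrics: one instance per endpoint, like RntbdMetrics per RntbdServiceEndpoint.
final class EndpointMetrics {
    final String endpointTag;
    private final LongAdder requests = new LongAdder();
    private final LongAdder failures = new LongAdder();
    private final LongAdder totalLatencyNanos = new LongAdder();

    EndpointMetrics(String endpointTag) {
        this.endpointTag = endpointTag;
    }

    // LongAdder keeps recording cheap when many I/O threads update the same endpoint.
    void record(long latencyNanos, boolean succeeded) {
        requests.increment();
        totalLatencyNanos.add(latencyNanos);
        if (!succeeded) {
            failures.increment();
        }
    }

    long requests() {
        return requests.sum();
    }

    long failures() {
        return failures.sum();
    }

    double meanLatencyMillis() {
        long n = requests.sum();
        return n == 0 ? 0.0 : (double) totalLatencyNanos.sum() / n / TimeUnit.MILLISECONDS.toNanos(1);
    }
}

// Hypothetical shared sink standing in for a customer-chosen metrics registry.
final class MetricsSink {
    private final Map<String, EndpointMetrics> byEndpoint = new ConcurrentHashMap<>();

    EndpointMetrics register(String endpointTag) {
        return byEndpoint.computeIfAbsent(endpointTag, EndpointMetrics::new);
    }

    // Called when an endpoint closes so its metrics do not outlive it.
    void unregister(String endpointTag) {
        byEndpoint.remove(endpointTag);
    }

    long totalRequests() {
        return byEndpoint.values().stream().mapToLong(EndpointMetrics::requests).sum();
    }
}
```

Unregistering on close ties the metrics lifecycle to the endpoint lifecycle, which matters once idle endpoints are closed rather than kept forever.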
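The cosmos.monitoring.consoleLogging.step property is described as a reporting frequency in seconds, where a value less than or equal to zero explicitly disables console logging. A minimal sketch of that rule follows. `ConsoleLoggingConfig` and `consoleLoggingStep` are hypothetical names, and treating an unset or non-numeric value as disabled is an assumption, because the PR does not state a default.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.Properties;

final class ConsoleLoggingConfig {
    static final String STEP_PROPERTY = "cosmos.monitoring.consoleLogging.step";

    // Returns the reporting interval, or empty when console logging is disabled.
    // Assumption: a missing or unparsable value is treated as disabled.
    static Optional<Duration> consoleLoggingStep(Properties properties) {
        String value = properties.getProperty(STEP_PROPERTY);
        if (value == null) {
            return Optional.empty();
        }
        long seconds;
        try {
            seconds = Long.parseLong(value.trim());
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
        // Per the PR description: a value <= 0 explicitly disables console logging.
        return seconds > 0 ? Optional.of(Duration.ofSeconds(seconds)) : Optional.empty();
    }
}
```

In an application this would typically be called as `ConsoleLoggingConfig.consoleLoggingStep(System.getProperties())` after launching the JVM with, for example, `-Dcosmos.monitoring.consoleLogging.step=10`.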
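The benchmark's Graphite settings combine an endpoint address, a step, and a disabled flag; the Application Insights settings have the same shape, with an instrumentation key in place of the address. The sketch below reads the Graphite properties and splits serviceAddress into host and port. Only the property names and the example address come from the PR. `GraphiteSettings` and `fromProperties` are hypothetical, and both the 10-second default step and the rejection of an address without a port are assumptions.

```java
import java.net.InetSocketAddress;
import java.time.Duration;
import java.util.Optional;
import java.util.Properties;

// Hypothetical holder for the benchmark's Graphite reporting settings.
final class GraphiteSettings {
    final InetSocketAddress serviceAddress;
    final Duration step;

    private GraphiteSettings(InetSocketAddress serviceAddress, Duration step) {
        this.serviceAddress = serviceAddress;
        this.step = step;
    }

    // Returns empty when monitoring is disabled or no service address is configured.
    static Optional<GraphiteSettings> fromProperties(Properties properties) {
        if (Boolean.parseBoolean(properties.getProperty("cosmos.monitoring.graphite.disabled"))) {
            return Optional.empty();
        }
        String address = properties.getProperty("cosmos.monitoring.graphite.serviceAddress");
        if (address == null || address.trim().isEmpty()) {
            return Optional.empty();
        }
        // The PR requires a Graphite pickle (not plaintext) endpoint; this sketch only parses the address.
        int colon = address.lastIndexOf(':');
        if (colon <= 0 || colon == address.length() - 1) {
            throw new IllegalArgumentException("expected host:port, got " + address);
        }
        String host = address.substring(0, colon);
        int port = Integer.parseInt(address.substring(colon + 1));
        // createUnresolved defers the DNS lookup until a reporter actually connects.
        InetSocketAddress socketAddress = InetSocketAddress.createUnresolved(host, port);
        // Assumption: default to a 10-second step when none is given.
        long seconds = Long.parseLong(properties.getProperty("cosmos.monitoring.graphite.step", "10"));
        return Optional.of(new GraphiteSettings(socketAddress, Duration.ofSeconds(seconds)));
    }
}
```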

E2E test results

e2e.log

Read/write latency performance numbers (complete performance test results attached)