Azure / azure-cosmosdb-java

Java Async SDK for SQL API of Azure Cosmos DB
MIT License

Direct TCP: Use transport request id instead of activity id to correlate requests with responses #137

Closed. David-Noble-at-work closed this 5 years ago

David-Noble-at-work commented 5 years ago

We now use the transport request id header, rather than the activity id, to track pending requests and match each response to its request. This PR resolves issue #130.
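As a rough illustration of the idea (not the SDK's actual code), pending requests can be kept in a map keyed by a per-channel transport request id, and each incoming response looked up by that id. The class name `PendingRequests`, its methods, and the use of `String` payloads are all hypothetical and chosen only for this sketch.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: correlates responses with requests by transport request id. */
public class PendingRequests {
    // Monotonically increasing id, unique per instance (assumed: one instance per channel).
    private final AtomicLong nextTransportRequestId = new AtomicLong();
    private final ConcurrentHashMap<Long, CompletableFuture<String>> pending =
        new ConcurrentHashMap<>();

    /** Registers a request and returns the id to send in its transport request id header. */
    public long register(CompletableFuture<String> responseFuture) {
        long id = nextTransportRequestId.incrementAndGet();
        pending.put(id, responseFuture);
        return id;
    }

    /** Completes the request with this id; returns false if no such request is pending. */
    public boolean complete(long transportRequestId, String responsePayload) {
        CompletableFuture<String> future = pending.remove(transportRequestId);
        if (future == null) {
            return false; // unknown or already-completed id
        }
        future.complete(responsePayload);
        return true;
    }

    /** Number of requests still waiting for a response. */
    public int size() {
        return pending.size();
    }

    public static void main(String[] args) {
        PendingRequests requests = new PendingRequests();
        CompletableFuture<String> first = new CompletableFuture<>();
        CompletableFuture<String> second = new CompletableFuture<>();
        long firstId = requests.register(first);
        long secondId = requests.register(second);

        // Responses may arrive out of order; the id picks the right future.
        requests.complete(secondId, "response for second");
        requests.complete(firstId, "response for first");
        System.out.println(first.join());
        System.out.println(second.join());
    }
}
```

Because the id is generated by the client for each request it sends, two requests on the same channel never share a key in this sketch, even if they carry the same activity id.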

Also revised:

Performance Benchmarks

Read latency is improved. Write latency is about the same. There is anecdotal evidence that batch write times have improved due to the revised channel pool management strategy.

Read Latency

Direct TCP

2019-05-26 18:40:29,690       [main] INFO  com.microsoft.azure.cosmosdb.benchmark.AsyncReadBenchmark - [1000000] operations performed in [159] seconds.
5/26/19 6:40:29 PM =============================================================

-- Meters ----------------------------------------------------------------------
#Successful Operations
             count = 1000000
         mean rate = 6262.27 events/second
     1-minute rate = 6198.53 events/second
     5-minute rate = 5572.52 events/second
    15-minute rate = 5285.01 events/second
#Unsuccessful Operations
             count = 0
         mean rate = 0.00 events/second
     1-minute rate = 0.00 events/second
     5-minute rate = 0.00 events/second
    15-minute rate = 0.00 events/second

-- Timers ----------------------------------------------------------------------
Latency
             count = 1000000
         mean rate = 6262.31 calls/second
     1-minute rate = 6198.69 calls/second
     5-minute rate = 5573.85 calls/second
    15-minute rate = 5286.87 calls/second
               min = 1.26 milliseconds
               max = 4.53 milliseconds
              mean = 1.54 milliseconds
            stddev = 0.15 milliseconds
            median = 1.52 milliseconds
              75% <= 1.60 milliseconds
              95% <= 1.78 milliseconds
              98% <= 1.85 milliseconds
              99% <= 1.91 milliseconds
            99.9% <= 2.55 milliseconds

Direct HTTPS

2019-05-26 18:35:49,081       [main] INFO  com.microsoft.azure.cosmosdb.benchmark.AsyncReadBenchmark - [1000000] operations performed in [175] seconds.
5/26/19 6:35:49 PM =============================================================

-- Meters ----------------------------------------------------------------------
#Successful Operations
             count = 1000000
         mean rate = 5684.83 events/second
     1-minute rate = 5655.35 events/second
     5-minute rate = 5052.53 events/second
    15-minute rate = 4746.05 events/second
#Unsuccessful Operations
             count = 0
         mean rate = 0.00 events/second
     1-minute rate = 0.00 events/second
     5-minute rate = 0.00 events/second
    15-minute rate = 0.00 events/second

-- Timers ----------------------------------------------------------------------
Latency
             count = 1000000
         mean rate = 5684.88 calls/second
     1-minute rate = 5655.57 calls/second
     5-minute rate = 5054.23 calls/second
    15-minute rate = 4748.54 calls/second
               min = 1.40 milliseconds
               max = 9.47 milliseconds
              mean = 1.71 milliseconds
            stddev = 0.39 milliseconds
            median = 1.66 milliseconds
              75% <= 1.75 milliseconds
              95% <= 1.95 milliseconds
              98% <= 2.09 milliseconds
              99% <= 2.75 milliseconds
            99.9% <= 9.47 milliseconds

Write Latency

Direct TCP

2019-05-26 17:23:18,382       [main] INFO  com.microsoft.azure.cosmosdb.benchmark.AsyncWriteBenchmark - [1000000] operations performed in [533] seconds.
5/26/19 5:23:18 PM =============================================================

-- Meters ----------------------------------------------------------------------
#Successful Operations
             count = 1000000
         mean rate = 1872.67 events/second
     1-minute rate = 1870.55 events/second
     5-minute rate = 1825.98 events/second
    15-minute rate = 1721.27 events/second
#Unsuccessful Operations
             count = 0
         mean rate = 0.00 events/second
     1-minute rate = 0.00 events/second
     5-minute rate = 0.00 events/second
    15-minute rate = 0.00 events/second

-- Timers ----------------------------------------------------------------------
Latency
             count = 1000000
         mean rate = 1872.68 calls/second
     1-minute rate = 1870.56 calls/second
     5-minute rate = 1826.09 calls/second
    15-minute rate = 1721.61 calls/second
               min = 3.93 milliseconds
               max = 16.32 milliseconds
              mean = 5.33 milliseconds
            stddev = 1.11 milliseconds
            median = 5.11 milliseconds
              75% <= 5.50 milliseconds
              95% <= 7.57 milliseconds
              98% <= 8.55 milliseconds
              99% <= 9.29 milliseconds
            99.9% <= 16.15 milliseconds

Direct HTTPS

2019-05-26 17:46:35,901       [main] INFO  com.microsoft.azure.cosmosdb.benchmark.AsyncWriteBenchmark - [1000000] operations performed in [529] seconds.
5/26/19 5:46:35 PM =============================================================

-- Meters ----------------------------------------------------------------------
#Successful Operations
             count = 1000000
         mean rate = 1890.27 events/second
     1-minute rate = 1895.80 events/second
     5-minute rate = 1837.31 events/second
    15-minute rate = 1716.07 events/second
#Unsuccessful Operations
             count = 0
         mean rate = 0.00 events/second
     1-minute rate = 0.00 events/second
     5-minute rate = 0.00 events/second
    15-minute rate = 0.00 events/second

-- Timers ----------------------------------------------------------------------
Latency
             count = 1000000
         mean rate = 1890.27 calls/second
     1-minute rate = 1895.77 calls/second
     5-minute rate = 1837.38 calls/second
    15-minute rate = 1716.41 calls/second
               min = 4.07 milliseconds
               max = 11.02 milliseconds
              mean = 5.24 milliseconds
            stddev = 0.70 milliseconds
            median = 5.14 milliseconds
              75% <= 5.49 milliseconds
              95% <= 6.10 milliseconds
              98% <= 7.22 milliseconds
              99% <= 8.48 milliseconds
            99.9% <= 11.02 milliseconds
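
To put the runs above side by side, the relative difference between Direct TCP and Direct HTTPS can be computed from the reported mean latencies. This is only a small arithmetic sketch: the class and method names are made up for illustration, the inputs are copied from the console output above, and each figure comes from a single run.

```java
/** Hypothetical helper for comparing the latency figures reported above. */
public class LatencyComparison {
    /**
     * Percent by which candidate is lower than baseline.
     * A negative result means the candidate is higher (slower).
     */
    public static double percentLower(double baselineMillis, double candidateMillis) {
        return (baselineMillis - candidateMillis) / baselineMillis * 100.0;
    }

    public static void main(String[] args) {
        // Mean latencies in milliseconds, taken from the benchmark output above.
        double readHttps = 1.71, readTcp = 1.54;
        double writeHttps = 5.24, writeTcp = 5.33;

        System.out.printf("Read mean:  TCP is %.1f%% lower than HTTPS%n",
            percentLower(readHttps, readTcp));
        System.out.printf("Write mean: TCP is %.1f%% lower than HTTPS%n",
            percentLower(writeHttps, writeTcp));
    }
}
```

On these numbers the read mean is roughly 10% lower over TCP, while the write means differ by under 2%, which is consistent with the summary at the top of this description.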


moderakh commented 5 years ago

I am not sure whether the 99th percentile that Dropwizard Metrics prints to the console covers the whole duration of the run or only the last reporting period (1s or 10s). @David-Noble-at-work, could you please check so that we get accurate numbers for TCP?
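
For context, and worth verifying against the Metrics version in use: a Dropwizard `Timer` created with the default constructor is backed by an `ExponentiallyDecayingReservoir`, which the Metrics documentation describes as biased toward roughly the last five minutes of measurements. If that applies here, the printed percentiles are neither strictly whole-run nor last-period values. One way to get whole-run figures is to keep every latency sample and compute the percentile directly. The sketch below uses the nearest-rank method; the class name `WholeRunPercentile` is hypothetical and not part of the benchmark.

```java
import java.util.Arrays;

/** Hypothetical sketch: nearest-rank percentile over every recorded sample. */
public class WholeRunPercentile {
    /**
     * Returns the smallest sample such that at least p percent of all
     * samples are less than or equal to it (nearest-rank definition).
     */
    public static double percentile(double[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples recorded");
        }
        if (p <= 0.0 || p > 100.0) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        double[] sorted = samples.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Made-up latencies in milliseconds, for illustration only.
        double[] latencies = {1.52, 1.48, 1.60, 1.55, 2.40, 1.50, 1.49, 1.58, 1.51, 1.53};
        System.out.println("p50 = " + percentile(latencies, 50));
        System.out.println("p99 = " + percentile(latencies, 99));
    }
}
```

Storing one `double` per operation costs about 8 MB for a run of 1,000,000 operations, so keeping all samples is feasible at this benchmark's scale.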