apache / trafficserver

Apache Traffic Server™ is a fast, scalable and extensible HTTP/1.1 and HTTP/2 compliant caching proxy server.
https://trafficserver.apache.org/
Apache License 2.0

HTTP/3 benchmark results #11446

Open bryancall opened 2 weeks ago

bryancall commented 2 weeks ago
**http2load**
finished in 65.00s, 25725.23 req/s, 29.04MB/s
requests: 1543514 total, 1543514 started, 1543514 done, 1543514 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 1543477 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 1.70GB (1827287672) total, 289.53MB (303596678) headers (space savings 35.34%), 1.88GB (2023121920) data
UDP datagram: 2186563 sent, 2028845 received
                     min         max         mean         sd        +/- sd
time for request:     1.73ms       2.21s     29.36ms     19.76ms    78.47%
time for connect:        0us         0us         0us         0us     0.00%
time to 1st byte:        0us         0us         0us         0us     0.00%
req/s           :     165.41      350.58      257.25       29.79    64.00%

**dstat**
You did not select any stats, using -cdngy by default.
----total-usage---- -dsk/total- ---net/lo-- -net/total- ---paging-- ---system--
usr sys idl wai stl| read  writ| recv  send: recv  send|  in   out | int   csw
 77   9  13   0   0|   0  4111k| 532B  532B:8215k  112M|   0     0 |3735k  640k
 89   7   3   0   0|   0   456k| 512B  512B:7116k   85M|   0     0 |3437k  610k
 92   6   2   0   0|   0   140k| 511B  511B:5983k   60M|   0     0 |2680k  495k
 94   5   1   0   0|   0  1639B| 533B  533B:5033k   44M|   0     0 |2351k  314k
 95   4   1   0   0|   0  6552k| 511B  511B:4398k   34M|   0     0 |2149k  146k
 96   4   1   0   0|   0    32M| 512B  512B:3940k   29M|   0     0 |2012k  152k
 96   4   0   0   0|   0  1639B| 512B  512B:3651k   25M|   0     0 |1916k  159k
 96   4   0   0   0|   0     0 | 511B  511B:3396k   22M|   0     0 |1840k  167k
 96   3   0   0   0|   0    46k| 511B  511B:3180k   20M|   0     0 |1754k  168k
 97   3   0   0   0|   0  2458B| 512B  512B:3030k   19M|   0     0 |1711k  171k
 97   3   0   0   0|   0  6580k| 512B  512B:2892k   18M|   0     0 |1699k  175k
 97   3   0   0   0|   0    17M| 512B  512B:2729k   17M|   0     0 |1639k  176k
 97   3   0   0   0|   0   784k| 511B  511B:2635k   16M|   0     0 |1608k  170k
**perf stat**
perf: 'stat-p' is not a perf-command. See 'perf --help'.
**perf report**
# Total Lost Samples: 0
#
# Samples: 1M of event 'cycles:P'
# Event count (approx.): 8502898076061
#
#   Overhead  Shared Object         Symbol                                              IPC   [IPC Coverage]
# ..........  ....................  ..................................................  ....................
#
      40.04%  traffic_server        [.] QUICStream::id() const                          -      -
      27.27%  traffic_server        [.] QUICStreamManager::find_stream(unsigned long)   -      -
      16.18%  traffic_server        [.] freelist_new(_InkFreeList*)                     -      -
       2.15%  traffic_server        [.] freelist_free(_InkFreeList*, void*)             -      -
       1.03%  traffic_server        [.] IOBufferBlock::clear()                          -      -
       0.33%  traffic_server        [.] thread_freeup(FreelistAllocator&, ProxyAllocat  -      -
       0.29%  traffic_server        [.] ink_freelist_free(_InkFreeList*, void*)         -      -
       0.29%  traffic_server        [.] (anonymous namespace)::build_iovec_block_chain  -      -
       0.26%  libc.so.6             [.] __memmove_avx_unaligned_erms                    -      -
       0.23%  libc.so.6             [.] _int_malloc                                     -      -
       0.22%  traffic_server        [.] ink_freelist_new(_InkFreeList*)                 -      -
       0.21%  traffic_server        [.] IOBufferData::free()                            -      -
       0.15%  libquiche.so          [.] <alloc::string::String as core::fmt::Write>::w  -      -
       0.14%  [vdso]                [.] __vdso_clock_gettime                            -      -
       0.14%  libquiche.so          [.] core::fmt::write                                -      -
       0.14%  [kernel.kallsyms]     [k] perf_adjust_freq_unthr_context                  -      -
       0.13%  libc.so.6             [.] _int_free                                       -      -
       0.13%  traffic_server        [.] QPACK::_encode_header(MIMEField const&, unsign  -      -
       0.13%  [kernel.kallsyms]     [k] native_sched_clock                              -      -
       0.13%  libc.so.6             [.] malloc_consolidate                              -      -
       0.12%  [kernel.kallsyms]     [k] __memcpy                                        -      -
       0.11%  [kernel.kallsyms]     [k] osq_lock                                        -      -
       0.10%  traffic_server        [.] QPACK::encode(unsigned long, HTTPHdr&, MIOBuff  -      -
bryancall commented 2 weeks ago

For comparison, these are the results for HTTP/2:

**http2load**
finished in 65.00s, 781381.18 req/s, 783.06MB/s
requests: 46882871 total, 46882871 started, 46882871 done, 46882871 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 46882871 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 45.88GB (49265980482) total, 424.87MB (445505292) headers (space savings 96.31%), 48.46GB (52033899520) data
                     min         max         mean         sd        +/- sd
time for request:      110us     52.30ms      1.13ms       654us    89.32%
time for connect:        0us         0us         0us         0us     0.00%
time to 1st byte:        0us         0us         0us         0us     0.00%
req/s           :    5869.32     8612.41     7813.76      713.22    78.00%

**dstat**
You did not select any stats, using -cdngy by default.
----total-usage---- -dsk/total- ---net/lo-- -net/total- ---paging-- ---system--
usr sys idl wai stl| read  writ| recv  send: recv  send|  in   out | int   csw
 77  18   5   0   0|   0    17M|5973B 5973B:  32M  760M|   0     0 |9306k  192k
 82  18   0   0   0|   0  1305k| 545B  545B:  34M  808M|   0     0 |9652k  194k
 82  18   0   0   0|   0   300k| 512B  512B:  33M  804M|   0     0 |  10M  195k
 82  18   0   0   0|   0  1638B| 512B  512B:  33M  804M|   0     0 |  10M  196k
 82  18   0   0   0|   0    68M| 511B  511B:  33M  802M|   0     0 |9808k  197k
 82  18   0   0   0|   0   496M| 512B  512B:  33M  801M|   0     0 |9556k  195k
 82  18   0   0   0|   0  3277B| 512B  512B:  34M  808M|   0     0 |9403k  197k
 82  18   0   0   0|   0  1638B| 512B  512B:  33M  801M|   0     0 |9699k  195k
 82  18   0   0   0|   0  1638B| 512B  512B:  33M  793M|   0     0 |9757k  194k
 82  18   0   0   0|   0  3277B|1194B 1194B:  33M  791M|   0     0 |9641k  195k
 82  18   0   0   0|   0   267k|1340B 1340B:  33M  788M|   0     0 |9722k  193k
 83  17   0   0   0|   0   664M| 512B  512B:  33M  785M|   0     0 |9787k  192k
 83  17   0   0   0|   0   824k| 512B  512B:  33M  791M|   0     0 |9589k  191k
**perf stat**
perf: 'stat-p' is not a perf-command. See 'perf --help'.
**perf report**
# Total Lost Samples: 0
#
# Samples: 2M of event 'cycles:P'
# Event count (approx.): 7767083254126
#
#   Overhead  Shared Object         Symbol                                              IPC   [IPC Coverage]
# ..........  ....................  ..................................................  ....................
#
       3.09%  [vdso]                [.] __vdso_clock_gettime                            -      -
       1.56%  traffic_server        [.] mime_hdr_field_find(MIMEHdrImpl*, char const*,  -      -
       1.49%  traffic_server        [.] hdrtoken_tokenize(char const*, int, char const  -      -
       1.39%  traffic_server        [.] mime_parser_parse(MIMEParser*, HdrHeap*, MIMEH  -      -
       1.34%  traffic_server        [.] VersionConverter::convert(HTTPHdr&, int, int)   -      -
       1.16%  traffic_server        [.] freelist_new(_InkFreeList*)                     -      -
       0.98%  libc.so.6             [.] __memmove_avx_unaligned_erms                    -      -
       0.95%  traffic_server        [.] XpackDynamicTable::lookup(char const*, unsigne  -      -
       0.88%  libcrypto.so          [.] _aesni_ctr32_ghash_6x                           -      -
       0.83%  libc.so.6             [.] __memcmp_avx2_movbe                             -      -
       0.72%  traffic_server        [.] HpackIndexingTable::lookup(HpackHeaderField co  -      -
       0.69%  libc.so.6             [.] __memchr_avx2                                   -      -
       0.65%  traffic_server        [.] HdrHeap::duplicate_str(char const*, int)        -      -
       0.64%  libc.so.6             [.] toupper                                         -      -
       0.62%  traffic_server        [.] mime_hdr_field_attach(MIMEHdrImpl*, MIMEField*  -      -
       0.60%  traffic_server        [.] huffman_decode(char*, unsigned char const*, un  -      -
       0.60%  traffic_server        [.] MIMEScanner::get(swoc::_1_5_12::TextView&, swo  -      -
       0.60%  traffic_server        [.] HdrHeap::allocate_str(int)                      -      -
bryancall commented 2 weeks ago

Changes for benchmarking and debugging the issue:

diff --git a/src/iocore/net/QUICPacketHandler.cc b/src/iocore/net/QUICPacketHandler.cc
index d336fcd57..017926cb2 100644
--- a/src/iocore/net/QUICPacketHandler.cc
+++ b/src/iocore/net/QUICPacketHandler.cc
@@ -230,7 +230,7 @@ QUICPacketHandlerIn::_recv_packet(int event, UDPPacket *udp_packet)

   EThread *eth = nullptr;
   if (vc == nullptr) {
-    if (!quiche_version_is_supported(version)) {
+    if (0 && !quiche_version_is_supported(version)) {
       Ptr<IOBufferBlock> udp_payload(new_IOBufferBlock());
       udp_payload->alloc(iobuffer_size_to_index(DEFAULT_MAX_DATAGRAM_SIZE, BUFFER_SIZE_INDEX_2K));
       QUICPHDebug(QUICConnectionId(scid, scid_len), QUICConnectionId(dcid, dcid_len), "Unsupported version: 0x%x", version);
diff --git a/src/iocore/net/quic/QUICStreamManager.cc b/src/iocore/net/quic/QUICStreamManager.cc
index eb241c23a..ff97164fb 100644
--- a/src/iocore/net/quic/QUICStreamManager.cc
+++ b/src/iocore/net/quic/QUICStreamManager.cc
@@ -81,11 +81,19 @@ QUICStreamManager::stream_count() const
 QUICStream *
 QUICStreamManager::find_stream(QUICStreamId stream_id)
 {
+  int32_t count = 0;
   for (QUICStream *s = this->stream_list.head; s; s = s->link.next) {
+    ++count;
     if (s->id() == stream_id) {
+      if (count > 1000) {
+        Debug("bcall", "Found stream %p in stream_list. count: %d", s, count);
+      }
       return s;
     }
   }
+  if (count > 5000) {
+    Debug("bcall", "Stream %ld not found in stream_list. count: %d", stream_id, count);
+  }
   return nullptr;
 }
brbzull0 commented 2 weeks ago

I was just talking with Masakazu, I'll start looking into this now.

brbzull0 commented 1 week ago

It seems we aren't deleting any streams at all.

This is probably because we do not get the right answer from this:

https://github.com/apache/trafficserver/blob/9f135bfbee7341ebc3b53965cc4fd7abb16fab7d/src/iocore/net/quic/QUICStreamVCAdapter.cc#L338-L341

I'll dig deeper.

https://github.com/apache/trafficserver/pull/11196
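
To illustrate why never deleting streams would make `find_stream()` and `QUICStream::id()` dominate the profile, here is a minimal sketch of the effect. It is a toy model, not the ATS code: `ToyStreamList` and `simulate` are hypothetical names, and the model assumes one lookup per stream right after it is opened. If finished streams stay in the list, each lookup walks every earlier stream, so the total work grows roughly quadratically with the number of requests on a connection.

```cpp
#include <cstdint>
#include <list>

// Hypothetical stand-in for a per-connection stream list; not the ATS classes.
struct ToyStreamList {
  std::list<uint64_t> ids;   // stream IDs in creation order
  uint64_t comparisons = 0;  // number of ID comparisons made by find()

  void add(uint64_t id) { ids.push_back(id); }

  // Linear scan from the head, the same shape as the loop in find_stream().
  bool find(uint64_t id) {
    for (uint64_t s : ids) {
      ++comparisons;
      if (s == id) {
        return true;
      }
    }
    return false;
  }

  void remove(uint64_t id) { ids.remove(id); }
};

// Open n streams one after another, look each one up once, and optionally
// remove it when its request completes. Returns the total comparison count.
uint64_t simulate(uint64_t n, bool remove_on_close) {
  ToyStreamList list;
  for (uint64_t i = 0; i < n; ++i) {
    uint64_t id = i * 4;  // client-initiated bidirectional stream IDs step by 4
    list.add(id);
    list.find(id);
    if (remove_on_close) {
      list.remove(id);
    }
  }
  return list.comparisons;
}
```

With 1000 streams, `simulate(1000, false)` performs 500500 comparisons, while `simulate(1000, true)` performs 1000. Under this model, the `count > 1000` debug threshold in the diff above would be crossed after about a thousand requests on a single connection if nothing is ever removed.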