alibaba / tengine

A distribution of Nginx with some advanced features
https://tengine.taobao.org
BSD 2-Clause "Simplified" License

tengine+xquic+tongsuo+QAT: no performance improvement #1914

Open foxriver1025 opened 8 months ago

foxriver1025 commented 8 months ago

Ⅰ. Issue Description

When testing QUIC performance with tengine+xquic+tongsuo+QAT, there is no improvement compared with running without QAT hardware acceleration.

Ⅱ. Describe what happened

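I set up an HTTP3 server environment with tengine+xquic+tongsuo and used QAT hardware to accelerate the encryption and decryption algorithms. However, testing shows that using the QAT hardware does not improve HTTPS performance.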

Ⅲ. Describe what you expected to happen

Intel reports a 4.6x performance improvement using NGINX-QUIC (async mode) + BoringSSL + QAT. Reference: file:///home/zhoubin/Downloads/IETF%20QUIC%20Acceleration%20using%20Intel%C2%AE%20QuickAssist%20Technology%20-Intel%20QAT%20(1).pdf

I expected tengine+xquic+tongsuo+QAT to achieve a similar improvement of roughly 4x.

Ⅳ. How to reproduce it (as minimally and precisely as possible)

Ⅴ. Anything else we need to know?

I did a rough analysis of the call stack during the handshake. The cause is probably that tengine+xquic does not support async_mode.

Thread 1 "tengine" hit Breakpoint 1, qat_engine_ecdh_compute_key (out=out@entry=0x7fff4fcf0b18, outlen=outlen@entry=0x7fff4fcf0b20, pub_key=0x55a8bd93d870, ecdh=0x55a8bd92ff30) at qat_hw_ec.c:795
795 {
(gdb) bt

#0 qat_engine_ecdh_compute_key (out=out@entry=0x7fff4fcf0b18, outlen=outlen@entry=0x7fff4fcf0b20, pub_key=0x55a8bd93d870, ecdh=0x55a8bd92ff30) at qat_hw_ec.c:795

#1 0x00007fcdb725a43f in QAT_ECDH_compute_key (out=out@entry=0x55a8bd92ba10, outlen=<optimized out>, outlen@entry=32, pub_key=<optimized out>, eckey=eckey@entry=0x55a8bd92ff30, KDF=KDF@entry=0x0) at qat_prov_ecdh.c:582

#2 0x00007fcdb725a964 in qat_keyexch_ecdh_plain_derive (outlen=<optimized out>, psecretlen=0x7fff4fcf0cb0, secret=0x55a8bd92ba10 "\233\020\031\347\255U", vpecdhctx=0x55a8bd95af20) at qat_prov_ecdh.c:666

#3 qat_keyexch_ecdh_derive (vpecdhctx=0x55a8bd95af20, secret=0x55a8bd92ba10 "\233\020\031\347\255U", psecretlen=0x7fff4fcf0cb0, outlen=<optimized out>) at qat_prov_ecdh.c:763

#4 0x00007fcdb79aa2e2 in ssl_derive (s=s@entry=0x55a8bd964d60, privkey=<optimized out>, pubkey=pubkey@entry=0x55a8bd8f6d60, gensecret=gensecret@entry=1) at ssl/s3_lib.c:4125

#5 0x00007fcdb79eae7d in tls_construct_stoc_key_share (s=0x55a8bd964d60, pkt=0x7fff4fcf0e30, context=<optimized out>, x=<optimized out>, chainidx=<optimized out>) at ssl/statem/extensions_srvr.c:1779

#6 0x00007fcdb79e0eac in tls_construct_extensions (s=s@entry=0x55a8bd964d60, pkt=pkt@entry=0x7fff4fcf0e30, context=512, x=x@entry=0x0, chainidx=chainidx@entry=0) at ssl/statem/extensions.c:881

#7 0x00007fcdb79facb2 in tls_construct_server_hello (s=0x55a8bd964d60, pkt=0x7fff4fcf0e30) at ssl/statem/statem_srvr.c:2560

#8 0x00007fcdb79ec49d in write_state_machine (s=0x55a8bd964d60) at ssl/statem/statem.c:890

#9 state_machine (s=0x55a8bd964d60, server=1) at ssl/statem/statem.c:482

#10 0x00007fcdb79b8a08 in SSL_do_handshake (s=0x55a8bd964d60) at ssl/ssl_lib.c:4075

#11 0x00007fcdb7b955e1 in xqc_ssl_do_handshake (ssl=0x55a8bd964d60) at /home/zhoubin/code/quic/xquic/src/tls/babassl/xqc_ssl_if.c:119

#12 0x00007fcdb7b921af in xqc_tls_do_handshake (tls=0x55a8bd913480) at /home/zhoubin/code/quic/xquic/src/tls/xqc_tls.c:394

#13 0x00007fcdb7b928da in xqc_tls_process_crypto_data (tls=0x55a8bd913480, level=XQC_ENC_LEV_INIT, crypto_data=0x55a8bd8e6f60 "\001", data_len=559) at /home/zhoubin/code/quic/xquic/src/tls/xqc_tls.c:568

#14 0x00007fcdb7b630d0 in xqc_read_crypto_stream (stream=0x55a8bd95c2c0) at /home/zhoubin/code/quic/xquic/src/transport/xqc_stream.c:729

#15 0x00007fcdb7b76cd2 in xqc_process_crypto_frame (conn=0x55a8bd913680, packet_in=0x7fff4fcf1190) at /home/zhoubin/code/quic/xquic/src/transport/xqc_frame.c:559

#16 0x00007fcdb7b756b2 in xqc_process_frames (conn=0x55a8bd913680, packet_in=0x7fff4fcf1190) at /home/zhoubin/code/quic/xquic/src/transport/xqc_frame.c:224

#17 0x00007fcdb7b7497c in xqc_packet_decrypt_single (c=0x55a8bd913680, packet_in=0x7fff4fcf1190) at /home/zhoubin/code/quic/xquic/src/transport/xqc_packet.c:183

#18 0x00007fcdb7b74b35 in xqc_packet_process_single (c=0x55a8bd913680, packet_in=0x7fff4fcf1190) at /home/zhoubin/code/quic/xquic/src/transport/xqc_packet.c:226

#19 0x00007fcdb7b4efbf in xqc_conn_process_packet (c=0x55a8bd913680, packet_in_buf=0x55a8bcecdbc8 <packet+232> <incomplete sequence \303>, packet_in_size=1216, recv_time=1706106787763823) at /home/zhoubin/code/quic/xquic/src/transport/xqc_conn.c:3528

#20 0x00007fcdb7b4112f in xqc_engine_packet_process (engine=0x55a8bd8b5590, packet_in_buf=0x55a8bcecdbc8 <packet+232> <incomplete sequence \303>, packet_in_size=1216, local_addr=0x55a8bcecdb54 <packet+116>, local_addrlen=16, peer_addr=0x55a8bcecdae0 <packet>, peer_addrlen=16, recv_time=1706106787763823, user_data=0x7fcdb48aa010) at /home/zhoubin/code/quic/xquic/src/transport/xqc_engine.c:1227

#21 0x000055a8bce4995e in ngx_xquic_dispatcher_process_packet (c=c@entry=0x7fcdb48aa010, packet=packet@entry=0x55a8bcecdae0 <packet>) at modules/ngx_http_xquic_module/ngx_xquic_recv.c:403

#22 0x000055a8bce49ad5 in ngx_xquic_event_recv (ev=0x55a8bd885570) at modules/ngx_http_xquic_module/ngx_xquic_recv.c:312

#23 0x000055a8bcdceca3 in ngx_epoll_process_events (cycle=<optimized out>, timer=<optimized out>, flags=<optimized out>) at src/event/modules/ngx_epoll_module.c:972

#24 0x000055a8bcdc2c1e in ngx_process_events_and_timers (cycle=cycle@entry=0x55a8bd85eb90) at src/event/ngx_event.c:284

#25 0x000055a8bcdcdf44 in ngx_single_process_cycle (cycle=cycle@entry=0x55a8bd85eb90) at src/os/unix/ngx_process_cycle.c:336

#26 0x000055a8bcda0dd1 in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:416

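The above is the captured call stack. Reading it together with the code, every call in the chain is synchronous: SSL_do_handshake does not return until QAT has finished the encryption/decryption operation. Since tengine has only one thread, it cannot handle other connection requests while the hardware is working, so performance does not improve.

When testing HTTP2 with tengine+tongsuo+qat, performance does improve substantially. This is because tengine supports the ssl_async on configuration: before calling SSL_do_handshake, it sets the SSL mode to asynchronous, which puts OpenSSL's async_job (coroutine) mechanism to full use. SSL_do_handshake then returns before the hardware has finished the encryption/decryption, so tengine can immediately handle the next connection request, and performance improves significantly.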
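The blocking argument above can be illustrated with a toy timing model. This is only a sketch under stated assumptions: the function names, the per-handshake CPU cost, and the offload latency are hypothetical values chosen for illustration, not measurements from tengine, xquic, or QAT. The model also assumes the accelerator can serve all in-flight requests in parallel without queueing.

```python
def sync_total_us(n, cpu_us, offload_us):
    """Single thread that blocks on the accelerator for every handshake."""
    clock = 0
    for _ in range(n):
        clock += cpu_us      # worker thread prepares the request
        clock += offload_us  # worker thread waits for the hardware result
    return clock


def async_total_us(n, cpu_us, offload_us):
    """Single thread that submits a request and moves on to the next one.

    Assumption: the accelerator handles in-flight requests concurrently,
    so only the last request's latency adds to the total time.
    """
    clock = 0
    last_done = 0
    for _ in range(n):
        clock += cpu_us                                 # worker thread is busy
        last_done = max(last_done, clock + offload_us)  # hardware finishes later
    return max(clock, last_done)


if __name__ == "__main__":
    # Hypothetical numbers: 1000 handshakes, 20 us CPU work, 80 us offload each.
    n, cpu_us, offload_us = 1000, 20, 80
    blocking = sync_total_us(n, cpu_us, offload_us)
    non_blocking = async_total_us(n, cpu_us, offload_us)
    print(f"blocking: {blocking} us, non-blocking: {non_blocking} us")
```

In this model the blocking variant spends most of its time waiting on the hardware, while the non-blocking variant is bounded by the CPU work plus a single offload latency. That matches the observation that offloading only pays off when the handshake call can return before the hardware finishes.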

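I very much hope tengine+xquic can support Tongsuo's asynchronous SSL, so that hardware such as QAT accelerator cards can deliver a large performance improvement.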

Ⅵ. Environment: