alibaba / tengine

A distribution of Nginx with some advanced features
https://tengine.taobao.org
BSD 2-Clause "Simplified" License
12.75k stars 2.52k forks source link

dtls集成,nginx发生崩溃 #1951

Open fitch2015 opened 1 month ago

fitch2015 commented 1 month ago

1、tengine3.1.0版本(历史版本尝试,同结果)

2、编译: ./configure \ --with-stream --with-stream_ssl_module \ --with-http_ssl_module \ --with-stream_sni

3、配置: stream { server { listen 1443 reuseport ssl udp; ssl_protocols dtlsv1.2; ssl_certificate .com.pem; ssl_certificate_key .com.key; return "ok"; } }

4、崩溃日志 2024/08/04 00:42:03 [alert] 1099847#0: worker process 1099848 exited on signal 11 (core dumped) 2024/08/04 00:42:03 [error] 1099909#0: *2 unexcepted message of dtls session while SSL handshaking, udp client: 127.0.0.1, server: 0.0.0.0:1443

[137994.251479] nginx[1099848]: segfault at 7fff6769fff8 ip 00007f40f0dc0a88 sp 00007fff676a0000 error 6 in libcrypto.so.3.0.7[7f40f0cad000+25c000] likely on CPU 5 (core 2, socket 1) [137994.251491] Code: e7 ac 17 00 e9 b9 46 ff ff 66 0f 1f 84 00 00 00 00 00 c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa 48 83 ec 08 03 d1 ee ff 85 c0 0f 94 c0 48 83 c4 08 0f b6 c0 c3 66 0f 1f 44

lianglli commented 1 month ago

使用debug编译选项,重现后,输出core文件的stack trace。

fitch2015 commented 1 month ago

补充(尝试不同版本openssl都失败,系统centos7、rocklinux9都试过。每次崩溃点不一样,以下截取部分供参考): For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /usr/local/nginx/sbin/nginx...

warning: Can't open file /dev/zero (deleted) during file-backed mapping note processing [New LWP 228977] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Core was generated by `nginx: worker'. Program terminated with signal SIGSEGV, Segmentation fault.

0 0x00007f2d2b56a8b6 in ossl_err_get_state_int () from /lib64/libcrypto.so.3

Missing separate debuginfos, use: dnf debuginfo-install libcap-2.48-9.el9_2.x86_64 libffi-3.4.2-8.el9.x86_64 libselinux-3.6-1.el9.x86_64 libxcrypt-4.4.18-3.el9.x86_64 openssl-libs-3.0.7-27.el9.0.2.x86_64 p11-kit-0.25.3-2.el9.x86_64 sssd-client-2.9.4-6.el9_4.1.x86_64 systemd-libs-252-32.el9_4.6.x86_64 zlib-1.2.11-40.el9.x86_64 (gdb) (gdb) bt

0 0x00007f2d2b56a8b6 in ossl_err_get_state_int () from /lib64/libcrypto.so.3

1 0x00007f2d2b56abd4 in get_error_values () from /lib64/libcrypto.so.3

2 0x00000000004450b5 in ngx_ssl_clear_error (log=0x203ddd0) at src/event/ngx_event_openssl.c:4216

3 0x00000000004455d2 in ngx_ssl_recv (c=0x204cd08, buf=0x20bbfe3 "\026\376", <incomplete sequence \375>, size=16717)

at src/event/ngx_event_openssl.c:2669

4 0x0000000000442ba2 in ngx_dtls_read (b=0x20456f0, out=0x20bbfe3 "\026\376", <incomplete sequence \375>, outl=16717)

at src/event/ngx_event_openssl.c:6891

5 0x00007f2d2b4cd44b in bread_conv () from /lib64/libcrypto.so.3

6 0x00007f2d2b4d0445 in bio_read_intern () from /lib64/libcrypto.so.3

7 0x00007f2d2b4d05c7 in BIO_read () from /lib64/libcrypto.so.3

8 0x00007f2d2ba5ae9c in ssl3_read_n.part () from /lib64/libssl.so.3

9 0x00007f2d2ba5b872 in dtls1_read_bytes () from /lib64/libssl.so.3

10 0x00007f2d2ba367fc in ssl3_read_internal () from /lib64/libssl.so.3

11 0x00007f2d2ba3d137 in SSL_read () from /lib64/libssl.so.3

12 0x00000000004455ec in ngx_ssl_recv (c=0x204cd08, buf=0x20bbfe3 "\026\376", <incomplete sequence \375>, size=16717)

at src/event/ngx_event_openssl.c:2678

13 0x0000000000442ba2 in ngx_dtls_read (b=0x20456f0, out=0x20bbfe3 "\026\376", <incomplete sequence \375>, outl=16717)

at src/event/ngx_event_openssl.c:6891

14 0x00007f2d2b4cd44b in bread_conv () from /lib64/libcrypto.so.3

15 0x00007f2d2b4d0445 in bio_read_intern () from /lib64/libcrypto.so.3

16 0x00007f2d2b4d05c7 in BIO_read () from /lib64/libcrypto.so.3

17 0x00007f2d2ba5ae9c in ssl3_read_n.part () from /lib64/libssl.so.3

18 0x00007f2d2ba5b872 in dtls1_read_bytes () from /lib64/libssl.so.3

19 0x00007f2d2ba367fc in ssl3_read_internal () from /lib64/libssl.so.3

20 0x00007f2d2ba3d137 in SSL_read () from /lib64/libssl.so.3

21 0x00000000004455ec in ngx_ssl_recv (c=0x204cd08, buf=0x20bbfe3 "\026\376", <incomplete sequence \375>, size=16717)

at src/event/ngx_event_openssl.c:2678

22 0x0000000000442ba2 in ngx_dtls_read (b=0x20456f0, out=0x20bbfe3 "\026\376", <incomplete sequence \375>, outl=16717)

at src/event/ngx_event_openssl.c:6891

23 0x00007f2d2b4cd44b in bread_conv () from /lib64/libcrypto.so.3

24 0x00007f2d2b4d0445 in bio_read_intern () from /lib64/libcrypto.so.3

25 0x00007f2d2b4d05c7 in BIO_read () from /lib64/libcrypto.so.3

26 0x00007f2d2ba5ae9c in ssl3_read_n.part () from /lib64/libssl.so.3

27 0x00007f2d2ba5b872 in dtls1_read_bytes () from /lib64/libssl.so.3

--Type for more, q to quit, c to continue without paging--

lianglli commented 1 month ago

error log设置 debug 级别,ssl建链失败的具体错误信息。

fitch2015 commented 1 month ago

配置: stream { server { listen 1443 reuseport ssl udp; ssl_protocols dtlsv1.2; ssl_certificate rsa..crt.pem; ssl_certificate_key rsa..key.pem; proxy_pass 192.168.101.101:5555; } }

补充debug日志: 2024/08/05 17:07:21 [info] 252613#0: 5 udp client 192.168.101.15:58830 connected to 0.0.0.0:1443 2024/08/05 17:07:21 [info] 252613#0: 5 udp proxy 192.168.101.25:42306 connected to 192.168.101.101:5555 2024/08/05 17:07:21 [notice] 237366#0: signal 17 (SIGCHLD) received from 252613 2024/08/05 17:07:21 [alert] 237366#0: worker process 252613 exited on signal 11 (core dumped) 2024/08/05 17:07:21 [notice] 237366#0: start worker process 252715

coredump(每次不同崩溃点): [New LWP 252715] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Core was generated by `nginx: worker'. Program terminated with signal SIGSEGV, Segmentation fault.

0 0x00007ff4f11c0a88 in CRYPTO_THREAD_run_once () from /lib64/libcrypto.so.3

Missing separate debuginfos, use: dnf debuginfo-install libcap-2.48-9.el9_2.x86_64 libffi-3.4.2-8.el9.x86_64 libselinux-3.6-1.el9.x86_64 libxcrypt-4.4.18-3.el9.x86_64 openssl-libs-3.0.7-27.el9.0.2.x86_64 p11-kit-0.25.3-2.el9.x86_64 sssd-client-2.9.4-6.el9_4.1.x86_64 systemd-libs-252-32.el9_4.6.x86_64 zlib-1.2.11-40.el9.x86_64 (gdb) bt

0 0x00007ff4f11c0a88 in CRYPTO_THREAD_run_once () from /lib64/libcrypto.so.3

1 0x00007ff4f11b9e1a in OPENSSL_init_crypto () from /lib64/libcrypto.so.3

2 0x00007ff4f116a8d5 in ossl_err_get_state_int () from /lib64/libcrypto.so.3

3 0x00007ff4f116abd4 in get_error_values () from /lib64/libcrypto.so.3

4 0x00000000004450b5 in ngx_ssl_clear_error (log=0x225aaa0) at src/event/ngx_event_openssl.c:4216

5 0x00000000004455d2 in ngx_ssl_recv (c=0x228df58, buf=0x21ecfe3 "\026\376", <incomplete sequence \375>, size=16717)

at src/event/ngx_event_openssl.c:2669

6 0x0000000000442ba2 in ngx_dtls_read (b=0x2258ba0, out=0x21ecfe3 "\026\376", <incomplete sequence \375>, outl=16717)

at src/event/ngx_event_openssl.c:6891

7 0x00007ff4f10cd44b in bread_conv () from /lib64/libcrypto.so.3

8 0x00007ff4f10d0445 in bio_read_intern () from /lib64/libcrypto.so.3

9 0x00007ff4f10d05c7 in BIO_read () from /lib64/libcrypto.so.3

10 0x00007ff4f1655e9c in ssl3_read_n.part () from /lib64/libssl.so.3

11 0x00007ff4f1656872 in dtls1_read_bytes () from /lib64/libssl.so.3

12 0x00007ff4f16317fc in ssl3_read_internal () from /lib64/libssl.so.3

13 0x00007ff4f1638137 in SSL_read () from /lib64/libssl.so.3

14 0x00000000004455ec in ngx_ssl_recv (c=0x228df58, buf=0x21ecfe3 "\026\376", <incomplete sequence \375>, size=16717)

at src/event/ngx_event_openssl.c:2678

15 0x0000000000442ba2 in ngx_dtls_read (b=0x2258ba0, out=0x21ecfe3 "\026\376", <incomplete sequence \375>, outl=16717)

at src/event/ngx_event_openssl.c:6891

16 0x00007ff4f10cd44b in bread_conv () from /lib64/libcrypto.so.3

17 0x00007ff4f10d0445 in bio_read_intern () from /lib64/libcrypto.so.3

18 0x00007ff4f10d05c7 in BIO_read () from /lib64/libcrypto.so.3

19 0x00007ff4f1655e9c in ssl3_read_n.part () from /lib64/libssl.so.3

20 0x00007ff4f1656872 in dtls1_read_bytes () from /lib64/libssl.so.3

21 0x00007ff4f16317fc in ssl3_read_internal () from /lib64/libssl.so.3

22 0x00007ff4f1638137 in SSL_read () from /lib64/libssl.so.3

23 0x00000000004455ec in ngx_ssl_recv (c=0x228df58, buf=0x21ecfe3 "\026\376", <incomplete sequence \375>, size=16717)

at src/event/ngx_event_openssl.c:2678

24 0x0000000000442ba2 in ngx_dtls_read (b=0x2258ba0, out=0x21ecfe3 "\026\376", <incomplete sequence \375>, outl=16717)

at src/event/ngx_event_openssl.c:6891

25 0x00007ff4f10cd44b in bread_conv () from /lib64/libcrypto.so.3

26 0x00007ff4f10d0445 in bio_read_intern () from /lib64/libcrypto.so.3

27 0x00007ff4f10d05c7 in BIO_read () from /lib64/libcrypto.so.3

--Type for more, q to quit, c to continue without paging--debug

lianglli commented 1 month ago

从stack看ngx_dtls_read收包时有异常,打开 error log的debug 日志级别,看下具体的报错信息。

fitch2015 commented 1 month ago

备注:尝试多个环境、多个tengine版本(包括最初支持dtls的2.3.3版本)&多个openssl版本 均失败告终。dtls功能应该是一直不能使用,期待该bug解决和新版本。 补充(打开 error log的debug 日志级别): 2024/08/06 10:22:18 [debug] 711772#0: epoll: fd:7 ev:0001 d:0000000001F82FB8 2024/08/06 10:22:18 [debug] 711772#0: recvmsg on 0.0.0.0:1443, ready: 0 2024/08/06 10:22:18 [debug] 711772#0: posix_memalign: 0000000001F78200:256 @16 2024/08/06 10:22:18 [debug] 711772#0: posix_memalign: 0000000001F169D0:256 @16 2024/08/06 10:22:18 [debug] 711772#0: malloc: 0000000001F16AE0:294 2024/08/06 10:22:18 [debug] 711772#0: 3 recvmsg: 192.168.101.15:41329 fd:7 n:294 2024/08/06 10:22:18 [debug] 711772#0: add cleanup: 0000000001F782E0 2024/08/06 10:22:18 [debug] 711772#0: posix_memalign: 0000000001F16C10:256 @16 2024/08/06 10:22:18 [info] 711772#0: 3 udp client 192.168.101.15:41329 connected to 0.0.0.0:1443 2024/08/06 10:22:18 [debug] 711772#0: 3 posix_memalign: 0000000001F16D20:256 @16 2024/08/06 10:22:18 [debug] 711772#0: 3 generic phase: 0 2024/08/06 10:22:18 [debug] 711772#0: 3 generic phase: 1 2024/08/06 10:22:18 [debug] 711772#0: 3 generic phase: 2 2024/08/06 10:22:18 [debug] 711772#0: 3 generic phase: 3 2024/08/06 10:22:18 [debug] 711772#0: 3 posix_memalign: 0000000001F16E30:256 @16 2024/08/06 10:22:18 [debug] 711772#0: 3 posix_memalign: 0000000001F0EB50:256 @16 2024/08/06 10:22:18 [debug] 711772#0: 3 sendto: fd:7 48 of 48 to "192.168.101.15" 2024/08/06 10:22:18 [debug] 711772#0: 3 SSL_do_handshake: -1 2024/08/06 10:22:18 [debug] 711772#0: 3 SSL_get_error: 2 2024/08/06 10:22:18 [debug] 711772#0: 3 event timer add: 7: 60000:88575578 2024/08/06 10:22:18 [debug] 711772#0: timer delta: 55650 2024/08/06 10:22:18 [debug] 711772#0: worker cycle 2024/08/06 10:22:18 [debug] 711772#0: epoll timer: 60000 2024/08/06 10:22:18 [debug] 711772#0: epoll: fd:7 ev:0001 d:0000000001F82FB8 2024/08/06 10:22:18 [debug] 711772#0: recvmsg on 0.0.0.0:1443, ready: 0 2024/08/06 10:22:18 [debug] 711772#0: 3 recvmsg: fd:7 n:314 2024/08/06 10:22:18 [debug] 711772#0: 3 SSL handshake handler: 0 2024/08/06 10:22:18 [debug] 711772#0: 3 sendto: fd:7 256 of 256 to "192.168.101.15" 2024/08/06 10:22:18 [debug] 711772#0: 3 sendto: fd:7 256 of 256 to "192.168.101.15" 2024/08/06 10:22:18 [debug] 711772#0: 3 sendto: fd:7 256 of 256 to "192.168.101.15" 2024/08/06 10:22:18 [debug] 711772#0: 3 sendto: fd:7 256 of 256 to "192.168.101.15" 2024/08/06 10:22:18 [debug] 711772#0: 3 sendto: fd:7 256 of 256 to "192.168.101.15" 2024/08/06 10:22:18 [debug] 711772#0: 3 sendto: fd:7 256 of 256 to "192.168.101.15" 2024/08/06 10:22:18 [debug] 711772#0: 3 sendto: fd:7 103 of 103 to "192.168.101.15" 2024/08/06 10:22:18 [debug] 711772#0: 3 SSL_do_handshake: -1 2024/08/06 10:22:18 [debug] 711772#0: 3 SSL_get_error: 2 2024/08/06 10:22:18 [debug] 711772#0: 3 event timer add: 7: 999:88516660 2024/08/06 10:22:18 [debug] 711772#0: timer delta: 83 2024/08/06 10:22:18 [debug] 711772#0: worker cycle 2024/08/06 10:22:18 [debug] 711772#0: epoll timer: 999 2024/08/06 10:22:18 [debug] 711772#0: epoll: fd:7 ev:0001 d:0000000001F82FB8 2024/08/06 10:22:18 [debug] 711772#0: recvmsg on 0.0.0.0:1443, ready: 0 2024/08/06 10:22:18 [debug] 711772#0: 3 recvmsg: fd:7 n:91 2024/08/06 10:22:18 [debug] 711772#0: 3 SSL handshake handler: 0 2024/08/06 10:22:18 [debug] 711772#0: 3 SSL_do_handshake: -1 2024/08/06 10:22:18 [debug] 711772#0: 3 SSL_get_error: 2 2024/08/06 10:22:18 [debug] 711772#0: 3 event timer: 7, old: 88516660, new: 88516665 2024/08/06 10:22:18 [debug] 711772#0: timer delta: 12 2024/08/06 10:22:18 [debug] 711772#0: worker cycle 2024/08/06 10:22:18 [debug] 711772#0: epoll timer: 987 2024/08/06 10:22:18 [debug] 711772#0: epoll: fd:7 ev:0001 d:0000000001F82FB8 2024/08/06 10:22:18 [debug] 711772#0: recvmsg on 0.0.0.0:1443, ready: 0 2024/08/06 10:22:18 [debug] 711772#0: 3 recvmsg: fd:7 n:14 2024/08/06 10:22:18 [debug] 711772#0: 3 SSL handshake handler: 0 2024/08/06 10:22:18 [debug] 711772#0: 3 SSL_do_handshake: -1 2024/08/06 10:22:18 [debug] 711772#0: 3 SSL_get_error: 2 2024/08/06 10:22:18 [debug] 711772#0: 3 event timer: 7, old: 88516660, new: 88516666 2024/08/06 10:22:18 [debug] 711772#0: timer delta: 1 2024/08/06 10:22:18 [debug] 711772#0: worker cycle 2024/08/06 10:22:18 [debug] 711772#0: epoll timer: 986 2024/08/06 10:22:18 [debug] 711772#0: epoll: fd:7 ev:0001 d:0000000001F82FB8 2024/08/06 10:22:18 [debug] 711772#0: recvmsg on 0.0.0.0:1443, ready: 0 2024/08/06 10:22:18 [debug] 711772#0: 3 recvmsg: fd:7 n:61 2024/08/06 10:22:18 [debug] 711772#0: 3 SSL handshake handler: 0 2024/08/06 10:22:18 [debug] 711772#0: shmtx lock 2024/08/06 10:22:18 [debug] 711772#0: shmtx unlock 2024/08/06 10:22:18 [debug] 711772#0: 3 ssl ticket encrypt, key: "1c80d9ca7fd062cd731994a381cfe676" (new session) 2024/08/06 10:22:18 [debug] 711772#0: 3 sendto: fd:7 280 of 280 to "192.168.101.15" 2024/08/06 10:22:18 [debug] 711772#0: 3 sendto: fd:7 51 of 51 to "192.168.101.15" 2024/08/06 10:22:18 [debug] 711772#0: 3 SSL_do_handshake: 1 2024/08/06 10:22:18 [debug] 711772#0: 3 event timer del: 7: 88516660 2024/08/06 10:22:18 [debug] 711772#0: 3 SSL: DTLSv1.2, cipher: "ECDHE-RSA-AES256-GCM-SHA384 TLSv1.2 Kx=ECDH Au=RSA Enc=AESGCM(256) Mac=AEAD" 2024/08/06 10:22:18 [debug] 711772#0: 3 event timer del: 7: 88575578 2024/08/06 10:22:18 [debug] 711772#0: 3 generic phase: 3 2024/08/06 10:22:18 [debug] 711772#0: 3 proxy connection handler 2024/08/06 10:22:18 [debug] 711772#0: 3 malloc: 0000000001F77710:432 2024/08/06 10:22:18 [debug] 711772#0: 3 malloc: 0000000002004B10:16384 2024/08/06 10:22:18 [debug] 711772#0: 3 post event 0000000001FBD000 2024/08/06 10:22:18 [debug] 711772#0: 3 get rr peer, try: 1 2024/08/06 10:22:18 [debug] 711772#0: 3 dgram socket 3 2024/08/06 10:22:18 [debug] 711772#0: 3 epoll add connection: fd:3 ev:80002005 2024/08/06 10:22:18 [debug] 711772#0: 3 connect to 192.168.101.101:5555, fd:3 #4 2024/08/06 10:22:18 [debug] 711772#0: 3 connected 2024/08/06 10:22:18 [debug] 711772#0: 3 proxy connect: 0 2024/08/06 10:22:18 [info] 711772#0: 3 udp proxy 192.168.101.25:54603 connected to 192.168.101.101:5555 2024/08/06 10:22:18 [debug] 711772#0: 3 malloc: 0000000002008B20:16384 2024/08/06 10:22:18 [debug] 711772#0: 3 stream proxy add preread buffer: 0 2024/08/06 10:22:18 [debug] 711772#0: 3 write new buf t:0 f:0 0000000001F16AE0, pos 0000000001F16C06, size: 0 file: 0, size: 0 2024/08/06 10:22:18 [debug] 711772#0: 3 stream write filter: l:0 f:1 s:0 2024/08/06 10:22:18 [debug] 711772#0: 3 sendmsg: 0 of 0 2024/08/06 10:22:18 [debug] 711772#0: 3 stream write filter 0000000000000000 2024/08/06 10:22:18 [notice] 711324#0: signal 17 (SIGCHLD) received from 711772 2024/08/06 10:22:18 [alert] 711324#0: worker process 711772 exited on signal 11 (core dumped) 2024/08/06 10:22:18 [debug] 711324#0: shmtx forced unlock 2024/08/06 10:22:18 [debug] 711324#0: shmtx forced unlock 2024/08/06 10:22:18 [debug] 711324#0: wake up, sigio 0 2024/08/06 10:22:18 [debug] 711324#0: reap children 2024/08/06 10:22:18 [debug] 711324#0: child: 0 711772 e:0 t:1 d:0 r:1 j:0 2024/08/06 10:22:18 [debug] 711324#0: channel 3:8 2024/08/06 10:22:18 [notice] 711324#0: start worker process 712092 2024/08/06 10:22:18 [debug] 711324#0: sigsuspend 2024/08/06 10:22:18 [debug] 712092#0: add cleanup: 0000000001F39F58 2024/08/06 10:22:18 [debug] 712092#0: malloc: 0000000001F80DB0:8 2024/08/06 10:22:18 [debug] 712092#0: notify eventfd: 10 2024/08/06 10:22:18 [debug] 712092#0: testing the EPOLLRDHUP flag: success 2024/08/06 10:22:18 [debug] 712092#0: malloc: 0000000001F0AFE0:6144 2024/08/06 10:22:18 [debug] 712092#0: malloc: 0000000001F82ED0:237568 2024/08/06 10:22:18 [debug] 712092#0: malloc: 0000000001FBCEE0:98304 2024/08/06 10:22:18 [debug] 712092#0: malloc: 0000000001FD4EF0:98304 2024/08/06 10:22:18 [debug] 712092#0: epoll add event: fd:6 op:1 ev:00002001 2024/08/06 10:22:18 [debug] 712092#0: epoll add event: fd:7 op:1 ev:00002001 2024/08/06 10:22:18 [debug] 712092#0: epoll add event: fd:8 op:1 ev:00002001 2024/08/06 10:22:18 [debug] 712092#0: setproctitle: "nginx: worker process" 2024/08/06 10:22:18 [debug] 712092#0: worker cycle 2024/08/06 10:22:18 [debug] 712092#0: epoll timer: -1