alibaba / tengine

A distribution of Nginx with some advanced features
https://tengine.taobao.org
BSD 2-Clause "Simplified" License

kernel: dh895xcc 0000:60:00.0: Process nginx exit with orphan rings #1930

Closed: lastpepole closed this issue 4 weeks ago

lastpepole commented 2 months ago

Ⅰ. Issue Description

QAT_Engine-1.5.0, QAT driver: QAT.L.4.23.0-00001, OpenSSL 1.1.1w

./sbin/nginx -c ./conf/nginx.conf

pstack 19641

Thread 2 (Thread 0x7f37d49b5700 (LWP 19642)):
#0 0x00007f37d7ee34ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f37d7ededcb in _L_lock_883 () from /lib64/libpthread.so.0
#2 0x00007f37d7edec98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f37d68806cf in qat_timer_poll_func (ih=) at qat_hw_polling.c:152
#4 0x00007f37d7edcdd5 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f37d6dd8ead in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f37d8503740 (LWP 19641)):
#0 0x00007f37d7ee34ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f37d7ee0a42 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#2 0x00007f37d687f893 in qat_hw_init (e=e@entry=0x16fdc20) at qat_hw_init.c:770
#3 0x00007f37d687c6c0 in qat_engine_init (e=e@entry=0x16fdc20) at e_qat.c:603
#4 0x00007f37d687d250 in engine_init_child_at_fork_handler () at qat_fork.c:108
#5 0x00007f37d6da007e in fork () from /lib64/libc.so.6
#6 0x000000000043918e in ngx_daemon (log=0x1713d38) at src/os/unix/ngx_daemon.c:17
#7 0x0000000000413563 in main (argc=, argv=) at src/core/nginx.c:378
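For anyone triaging a similar hang, the pstack output above can be summarized mechanically. The following is only a minimal sketch: `summarize_pstack` and `lock_waiters` are hypothetical helper names, not part of Tengine or QAT_Engine, and the regular expressions assume the frame layout shown in this report (an optional `#`, a frame number, an address, then `in <function> (`). It only lists which threads have a lock-wait function as their innermost frame; it does not identify who holds the lock.

```python
import re

# Assumed line shapes, taken from the pstack output in this report.
THREAD_RE = re.compile(
    r"Thread\s+(\d+)\s+\(Thread\s+0x[0-9a-fA-F]+\s+\(LWP\s+(\d+)\)\)"
)
FRAME_RE = re.compile(r"^\s*#?(\d+)\s+0x[0-9a-fA-F]+\s+in\s+(\S+)\s*\(")


def summarize_pstack(text):
    """Return {thread_number: [function names, innermost frame first]}."""
    threads = {}
    current = None
    for line in text.splitlines():
        m = THREAD_RE.search(line)
        if m:
            current = int(m.group(1))
            threads[current] = []
            continue
        m = FRAME_RE.match(line)
        if m and current is not None:
            threads[current].append(m.group(2))
    return threads


def lock_waiters(threads, markers=("__lll_lock_wait",)):
    """Return the thread numbers whose innermost frame is a lock-wait function."""
    return sorted(
        t for t, frames in threads.items() if frames and frames[0] in markers
    )


# Shortened copy of the trace from this issue, used as sample input.
SAMPLE = """\
Thread 2 (Thread 0x7f37d49b5700 (LWP 19642)):
#0 0x00007f37d7ee34ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f37d7ededcb in _L_lock_883 () from /lib64/libpthread.so.0
#2 0x00007f37d7edec98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f37d68806cf in qat_timer_poll_func (ih=) at qat_hw_polling.c:152
Thread 1 (Thread 0x7f37d8503740 (LWP 19641)):
#0 0x00007f37d7ee34ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f37d7ee0a42 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#2 0x00007f37d687f893 in qat_hw_init (e=e@entry=0x16fdc20) at qat_hw_init.c:770
"""

threads = summarize_pstack(SAMPLE)
print(lock_waiters(threads))  # threads currently blocked in a lock wait
for number, frames in sorted(threads.items()):
    print(number, " <- ".join(frames))
```

On the sample, both threads are reported as lock waiters, which matches the reading that the process hangs rather than crashes.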

lianglli commented 2 months ago

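This stack is incomplete. Which signal did the process exit on?

Also, does "exit" here specifically mean a core dump?

If it did dump core, please provide the complete stack trace and a debug-level error log.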

lastpepole commented 2 months ago

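> This stack is incomplete. Which signal did the process exit on?

> Also, does "exit" here specifically mean a core dump?

> If it did dump core, please provide the complete stack trace and a debug-level error log.

There is no core dump. The process is stuck in __lll_lock_wait, and /var/log/messages shows the "nginx exit with orphan rings" message.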

lastpepole commented 2 months ago

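@lianglli The problem above reproduces reliably: tengine hangs as soon as it is started. Could you please take a look, or try to reproduce it, to find out what is going wrong?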

lianglli commented 2 months ago

For reference: Ice Lake SSL/TLS acceleration in practice https://openanolis.cn/sig/crypto/doc/390714951012679780

lastpepole commented 2 months ago

> For reference: Ice Lake SSL/TLS acceleration in practice https://openanolis.cn/sig/crypto/doc/390714951012679780

@lianglli The hang looks like a problem in the QAT engine code. Please take a look at https://github.com/alibaba/tengine/issues/1932 as well.