dagmoller opened this issue 3 years ago
Hi,
The only way to see why it crashes is to get the core dump files
First create a directory for the core files:
mkdir -p /tmp/cores
Then add this to your main nginx.conf:
worker_rlimit_core 500M;
working_directory /tmp/cores/;
Then run gdb to obtain a backtrace as usual, e.g.
gdb /path/to/nginx /tmp/nginx.core.PID
backtrace full
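Depending on how nginx is started, core dumps may also need to be enabled at the OS level. A rough sketch only; the exact steps (and the systemd detail) are assumptions about a typical Linux setup:
ulimit -c unlimited                      # allow unlimited core size for the launching shell
sysctl -w kernel.core_pattern=core.%p    # have the kernel write cores with a predictable name
# For a packaged nginx managed by systemd, LimitCORE=infinity in the unit's
# [Service] section may also be needed.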
It will help if you can share your current nginx configuration (only the relevant parts) and some load statistics.
Thanks g.
Hi,
I don't know if this helps. The load is very low. I have a lot of reverse proxies with NTLM enabled on this instance.
Core was generated by `nginx: worker process '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 ngx_http_upstream_ntlm_close_handler (ev=<optimized out>) at ../nginx-ntlm-module/ngx_http_upstream_ntlm_module.c:435
435 ngx_queue_insert_head(&conf->free, &item->queue);
Missing separate debuginfos, use: yum debuginfo-install nginx-1.20.1-1.el8.ngx.x86_64
(gdb) backtrace full
#0 ngx_http_upstream_ntlm_close_handler (ev=<optimized out>) at ../nginx-ntlm-module/ngx_http_upstream_ntlm_module.c:435
conf = 0x5556a19cedd0
item = 0x5556a19ceec0
n = <optimized out>
buf = ""
c = 0x5556a19cf910
#1 0x000055569ec373e0 in ngx_event_process_posted ()
No symbol table info available.
#2 0x000055569ec3edf9 in ngx_worker_process_cycle ()
No symbol table info available.
#3 0x000055569ec3d41f in ngx_spawn_process ()
No symbol table info available.
#4 0x000055569ec3fbc7 in ngx_master_process_cycle ()
No symbol table info available.
#5 0x000055569ec15c78 in main ()
No symbol table info available.
Hi,
Thanks. This definitely helps. I will try to reproduce it and then fix it.
g.
I think your code contains a logical error on the following lines:
static void ngx_http_upstream_client_conn_cleanup(void *data) {
    ngx_http_upstream_ntlm_cache_t *item = data;

    ngx_log_debug2(
        NGX_LOG_DEBUG_HTTP, ngx_cycle->log, 0,
        "ntlm client connection closed %p, droping peer connection %p",
        item->client_connection, item->peer_connection);

    // Check if the item was removed previously from the queue (backend drop)
    if (item->peer_connection != NULL) {
        item->peer_connection->read->timedout = 1;
        ngx_post_event(item->peer_connection->read, &ngx_posted_events);
        ngx_queue_remove(&item->queue);
        ngx_queue_insert_head(&item->conf->free, &item->queue);
    }
}
If the connection has already been deleted, calling ngx_log_debug2 here results in a segfault, because the memory holding item may already have been destroyed.
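To make the failure mode concrete, here is a minimal standalone C sketch of the suspected lifetime problem. This is not nginx code; the struct and function names are hypothetical stand-ins:
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for ngx_http_upstream_ntlm_cache_t (illustrative only). */
typedef struct {
    void *peer_connection;
} cache_item_t;

/* Stand-in for ngx_pool_cleanup_t: a handler plus the data pointer
 * captured at registration time. */
typedef struct {
    void (*handler)(void *data);
    void *data;
} cleanup_t;

static void conn_cleanup(void *data) {
    cache_item_t *item = data;
    /* If the item was freed (or recycled for another connection) after the
     * cleanup was registered, this dereference is a use-after-free. */
    printf("dropping peer connection %p\n", item->peer_connection);
}

int main(void) {
    cache_item_t *item = malloc(sizeof(*item));
    item->peer_connection = NULL;

    cleanup_t cln = { conn_cleanup, item };  /* cleanup registered with a raw pointer */

    free(item);             /* the item goes away early ...                       */
    cln.handler(cln.data);  /* ... but the cleanup still fires: undefined behaviour */
    return 0;
}
In the real module, ngx_http_upstream_client_conn_cleanup is registered as a pool cleanup with the cache item as its data, so the same hazard applies if the item can be freed or reused before the client connection's pool is destroyed.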
@dbalaev Thanks for pointing this out. Do you have a way to reproduce the segfault? That would help me a lot. Thanks
Build: CentOS 7 + nginx 1.23.1-1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `nginx: worker process '.
Program terminated with signal 11, Segmentation fault.
1278        ngx_http_set_log_request(c->log, r);
(gdb) bt
    name=name@entry=0x56228ad3dc27 "worker process", respawn=respawn@entry=-3) at src/os/unix/ngx_process.c:199
(gdb) f 0
1278        ngx_http_set_log_request(c->log, r);
Same segfault on Ubuntu 22 with nginx/1.22.0. I can reproduce the problem: it occurs every 1-2 minutes on my server, which has 10-20 people working through it.
https://gist.github.com/klp18/a9eb7118e951528e3c9da65851a18d45
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `nginx: worker process '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000564b3c391d44 in ngx_http_upstream_handler (ev=0x564b3de72ad0) at src/http/ngx_http_upstream.c:1282
1282        ngx_http_set_log_request(c->log, r);
(gdb) bt
#0  0x0000564b3c391d44 in ngx_http_upstream_handler (ev=0x564b3de72ad0) at src/http/ngx_http_upstream.c:1282
#1  0x0000564b3c358a2e in ngx_event_process_posted (cycle=cycle@entry=0x564b3dc93490, posted=0x564b3c46cd10 <ngx_posted_events>) at src/event/ngx_event_posted.c:34
#2  0x0000564b3c3584e6 in ngx_process_events_and_timers (cycle=cycle@entry=0x564b3dc93490) at src/event/ngx_event.c:263
#3  0x0000564b3c362141 in ngx_worker_process_cycle (cycle=0x564b3dc93490, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:721
#4  0x0000564b3c36057d in ngx_spawn_process (cycle=cycle@entry=0x564b3dc93490, proc=0x564b3c36202a <ngx_worker_process_cycle>, data=0x0, name=0x564b3c416612 "worker process", respawn=respawn@entry=0) at src/os/unix/ngx_process.c:199
#5  0x0000564b3c362dba in ngx_reap_children (cycle=0x564b3dc93490) at src/os/unix/ngx_process_cycle.c:598
#6  ngx_master_process_cycle (cycle=cycle@entry=0x564b3dc93490) at src/os/unix/ngx_process_cycle.c:174
#7  0x0000564b3c33680d in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:383
(gdb) exit
Thanks @klp18 for reporting this. Can you share your nginx configuration (replace any sensitive information)? Thank you, g.
@gabihodoroaga thanks for this module! Did you manage to reproduce the segfaults? It's also happening on my instance (nginx/1.18.0) and shutting down worker processes every few minutes (often multiple times per minute).
Let me know if you need further information or configuration files for comparison.
@klp18 would you mind sharing a binary core dump file? I'm trying to reproduce the issue in the lab with no luck yet. The backtrace is useful but I'd like to dig further into the investigation.
Thanks @klp18! I will have a look
@klp18 Can you provide me with the output of the "nginx -V" command, please?
nginx version: nginx/1.22.0
built by gcc 11.2.0 (Ubuntu 11.2.0-19ubuntu1)
built with OpenSSL 3.0.2 15 Mar 2022
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --modules-path=/usr/lib/nginx/modules --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-compat --with-debug --with-pcre-jit --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_v2_module --with-http_dav_module --with-http_slice_module --with-threads --with-http_addition_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_sub_module --with-mail --with-stream --with-stream_ssl_module --with-mail_ssl_module --add-dynamic-module=../nginx-ntlm-module
@klp18 I may have found a fix but can't reproduce the issue locally. Would you mind trying the following patch to see if it fixes the crash?
diff --git a/ngx_http_upstream_ntlm_module.c b/ngx_http_upstream_ntlm_module.c
index 8d1179d..90ed158 100644
--- a/ngx_http_upstream_ntlm_module.c
+++ b/ngx_http_upstream_ntlm_module.c
@@ -380,6 +380,8 @@ static void ngx_http_upstream_client_conn_cleanup(void *data) {
if (item->peer_connection != NULL) {
item->peer_connection->read->timedout = 1;
+ // If connection is already closed, log will be empty and will generate a segfault
+ item->peer_connection->log = ngx_cycle->log;
ngx_post_event(item->peer_connection->read,&ngx_posted_events);
ngx_queue_remove(&item->queue);
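To try it out, the patch can be applied in the module directory and the dynamic module rebuilt against your nginx sources. A sketch only, assuming the diff is saved as ntlm-fix.patch and the nginx 1.22.0 sources sit next to the module checkout:
cd nginx-ntlm-module
git apply ntlm-fix.patch          # or: patch -p1 < ntlm-fix.patch
cd ../nginx-1.22.0
./configure --with-compat --add-dynamic-module=../nginx-ntlm-module
make modules
# copy the rebuilt module (objs/ngx_http_upstream_ntlm_module.so; the exact
# object name may differ) over the installed one and reload nginx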
@tmagnien The same crash, with both the same and new backtraces. New dumps:
#0 0x000055c199a40bb7 in ngx_http_request_handler (ev=0x55c19a453430) at src/http/ngx_http_request.c:2394
2394 ngx_http_set_log_request(c->log, r);
(gdb) bt
#0 0x000055c199a40bb7 in ngx_http_request_handler (ev=0x55c19a453430) at src/http/ngx_http_request.c:2394
#1 0x000055c199a1aa2e in ngx_event_process_posted (cycle=cycle@entry=0x55c19a273490, posted=0x55c199b2ed10 <ngx_posted_events>) at src/event/ngx_event_posted.c:34
#2 0x000055c199a1a4e6 in ngx_process_events_and_timers (cycle=cycle@entry=0x55c19a273490) at src/event/ngx_event.c:263
#3 0x000055c199a24141 in ngx_worker_process_cycle (cycle=0x55c19a273490, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:721
#4 0x000055c199a2257d in ngx_spawn_process (cycle=cycle@entry=0x55c19a273490, proc=proc@entry=0x55c199a2402a <ngx_worker_process_cycle>, data=data@entry=0x3,
name=name@entry=0x55c199ad861a "worker process", respawn=respawn@entry=-3) at src/os/unix/ngx_process.c:199
#5 0x000055c199a23859 in ngx_start_worker_processes (cycle=cycle@entry=0x55c19a273490, n=4, type=type@entry=-3) at src/os/unix/ngx_process_cycle.c:344
#6 0x000055c199a24a6e in ngx_master_process_cycle (cycle=cycle@entry=0x55c19a273490) at src/os/unix/ngx_process_cycle.c:130
#7 0x000055c1999f880d in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:383
#0 0x00007f8bd70ee990 in ngx_http_upstream_ntlm_close_handler (ev=<optimized out>) at ../nginx-ntlm-module/ngx_http_upstream_ntlm_module.c:437
437 ngx_queue_insert_head(&conf->free, &item->queue);
(gdb) bt
#0 0x00007f8bd70ee990 in ngx_http_upstream_ntlm_close_handler (ev=<optimized out>) at ../nginx-ntlm-module/ngx_http_upstream_ntlm_module.c:437
#1 0x000055c199a1aa2e in ngx_event_process_posted (cycle=cycle@entry=0x55c19a273490, posted=0x55c199b2ed10 <ngx_posted_events>) at src/event/ngx_event_posted.c:34
#2 0x000055c199a1a4e6 in ngx_process_events_and_timers (cycle=cycle@entry=0x55c19a273490) at src/event/ngx_event.c:263
#3 0x000055c199a24141 in ngx_worker_process_cycle (cycle=0x55c19a273490, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:721
#4 0x000055c199a2257d in ngx_spawn_process (cycle=cycle@entry=0x55c19a273490, proc=0x55c199a2402a <ngx_worker_process_cycle>, data=0x2,
name=0x55c199ad861a "worker process", respawn=respawn@entry=2) at src/os/unix/ngx_process.c:199
#5 0x000055c199a24dba in ngx_reap_children (cycle=0x55c19a273490) at src/os/unix/ngx_process_cycle.c:598
#6 ngx_master_process_cycle (cycle=cycle@entry=0x55c19a273490) at src/os/unix/ngx_process_cycle.c:174
#7 0x000055c1999f880d in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:383
#0 0x000055c199a53d44 in ngx_http_upstream_handler (ev=0x55c19a452b30) at src/http/ngx_http_upstream.c:1282
1282 ngx_http_set_log_request(c->log, r);
(gdb) bt
#0 0x000055c199a53d44 in ngx_http_upstream_handler (ev=0x55c19a452b30) at src/http/ngx_http_upstream.c:1282
#1 0x000055c199a1aa2e in ngx_event_process_posted (cycle=cycle@entry=0x55c19a273490, posted=0x55c199b2ed10 <ngx_posted_events>) at src/event/ngx_event_posted.c:34
#2 0x000055c199a1a4e6 in ngx_process_events_and_timers (cycle=cycle@entry=0x55c19a273490) at src/event/ngx_event.c:263
#3 0x000055c199a24141 in ngx_worker_process_cycle (cycle=0x55c19a273490, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:721
#4 0x000055c199a2257d in ngx_spawn_process (cycle=cycle@entry=0x55c19a273490, proc=0x55c199a2402a <ngx_worker_process_cycle>, data=0x1,
name=0x55c199ad861a "worker process", respawn=respawn@entry=1) at src/os/unix/ngx_process.c:199
#5 0x000055c199a24dba in ngx_reap_children (cycle=0x55c19a273490) at src/os/unix/ngx_process_cycle.c:598
#6 ngx_master_process_cycle (cycle=cycle@entry=0x55c19a273490) at src/os/unix/ngx_process_cycle.c:174
#7 0x000055c1999f880d in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:383
@klp18 Sorry to bother you again, but I can't reproduce it, so would you mind replacing the previous patch with this one:
diff --git a/ngx_http_upstream_ntlm_module.c b/ngx_http_upstream_ntlm_module.c
index 8d1179d..6066980 100644
--- a/ngx_http_upstream_ntlm_module.c
+++ b/ngx_http_upstream_ntlm_module.c
@@ -379,8 +379,9 @@ static void ngx_http_upstream_client_conn_cleanup(void *data) {
// Check if the item was removed previously from the queue (backend drop)
if (item->peer_connection != NULL) {
- item->peer_connection->read->timedout = 1;
- ngx_post_event(item->peer_connection->read,&ngx_posted_events);
+ ngx_http_upstream_ntlm_close(item->peer_connection);
+ /*item->peer_connection->read->timedout = 1;
+ ngx_post_event(item->peer_connection->read,&ngx_posted_events);*/
ngx_queue_remove(&item->queue);
ngx_queue_insert_head(&item->conf->free, &item->queue);
If the segfault still occurs, it would also help me to have the nginx debug logs. Thanks a lot!
@tmagnien New dumps, new backtrace:
#0 0x000055f6ecf3f300 in ngx_destroy_pool (pool=0x55f6eee5ed70) at src/core/ngx_palloc.c:69
69 ngx_log_debug1(NGX_LOG_DEBUG_ALLOC, pool->log, 0, "free: %p", l->alloc);
(gdb) bt
#0 0x000055f6ecf3f300 in ngx_destroy_pool (pool=0x55f6eee5ed70) at src/core/ngx_palloc.c:69
#1 0x00007f2fe95ec767 in ngx_http_upstream_ntlm_close (c=0x55f6eede0720) at ../nginx-ntlm-module/ngx_http_upstream_ntlm_module.c:456
#2 0x00007f2fe95ec7b9 in ngx_http_upstream_client_conn_cleanup (data=0x55f6eecc7670) at ../nginx-ntlm-module/ngx_http_upstream_ntlm_module.c:382
#3 0x000055f6ecf3f2ac in ngx_destroy_pool (pool=pool@entry=0x55f6eeea3610) at src/core/ngx_palloc.c:57
#4 0x000055f6ecf84942 in ngx_http_close_connection (c=c@entry=0x55f6eede0db0) at src/http/ngx_http_request.c:3787
#5 0x000055f6ecf859a2 in ngx_http_close_request (r=r@entry=0x55f6eeef4080, rc=rc@entry=0) at src/http/ngx_http_request.c:3646
#6 0x000055f6ecf85c8a in ngx_http_terminate_handler (r=0x55f6eeef4080) at src/http/ngx_http_request.c:2723
#7 0x000055f6ecf85529 in ngx_http_run_posted_requests (c=c@entry=0x55f6eede0db0) at src/http/ngx_http_request.c:2450
#8 0x000055f6ecf85bf2 in ngx_http_request_handler (ev=0x55f6eee0ce90) at src/http/ngx_http_request.c:2418
#9 0x000055f6ecf6b5d4 in ngx_epoll_process_events (cycle=0x55f6eec2d490, timer=<optimized out>, flags=<optimized out>)
at src/event/modules/ngx_epoll_module.c:901
#10 0x000055f6ecf5f4a1 in ngx_process_events_and_timers (cycle=cycle@entry=0x55f6eec2d490) at src/event/ngx_event.c:248
#11 0x000055f6ecf69141 in ngx_worker_process_cycle (cycle=0x55f6eec2d490, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:721
#12 0x000055f6ecf6757d in ngx_spawn_process (cycle=cycle@entry=0x55f6eec2d490, proc=proc@entry=0x55f6ecf6902a <ngx_worker_process_cycle>,
data=data@entry=0x0, name=name@entry=0x55f6ed01d61a "worker process", respawn=respawn@entry=-3) at src/os/unix/ngx_process.c:199
#13 0x000055f6ecf68859 in ngx_start_worker_processes (cycle=cycle@entry=0x55f6eec2d490, n=4, type=type@entry=-3) at src/os/unix/ngx_process_cycle.c:344
#14 0x000055f6ecf69a6e in ngx_master_process_cycle (cycle=cycle@entry=0x55f6eec2d490) at src/os/unix/ngx_process_cycle.c:130
#15 0x000055f6ecf3d80d in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:383
Hi again @klp18. Can you try the following patch, please:
diff --git a/ngx_http_upstream_ntlm_module.c b/ngx_http_upstream_ntlm_module.c
index 8d1179d..88ff463 100644
--- a/ngx_http_upstream_ntlm_module.c
+++ b/ngx_http_upstream_ntlm_module.c
@@ -378,9 +378,7 @@ static void ngx_http_upstream_client_conn_cleanup(void *data) {
// Check if the item was removed previously from the queue (backend drop)
if (item->peer_connection != NULL) {
-
- item->peer_connection->read->timedout = 1;
- ngx_post_event(item->peer_connection->read,&ngx_posted_events);
+ ngx_http_upstream_ntlm_close(item->peer_connection);
ngx_queue_remove(&item->queue);
ngx_queue_insert_head(&item->conf->free, &item->queue);
@@ -451,6 +449,11 @@ static void ngx_http_upstream_ntlm_close(ngx_connection_t *c) {
#endif
+ // If connection is already closed, don't do anything (pool is already gone)
+ if (c->fd == (ngx_socket_t) -1) {
+ return;
+ }
+
if (c->pool) {
ngx_destroy_pool(c->pool);
c->pool = NULL;
And again, sorry for those multiple tries, but I still don't have any way to reproduce the issue, so I'm working blind.
I have no problem with doing such tests, so ask away. I wish I had enough time right now to try to figure out the problem myself.
Here's another thing I noticed. The previous dumps were accompanied by the following entries in the nginx log file:
2022/11/15 14:10:11 [error] 129235#129235: *11975 upstream timed out (110: Connection timed out) while reading upstream, client: 192.168.4.141, server:
2022/11/15 14:11:15 [error] 129237#129237: *11997 upstream timed out (110: Connection timed out) while reading upstream, client: 192.168.4.88, server:
and the new ones with the following:
2022/11/15 18:22:31 [alert] 134292#134292: *439 connection already closed while reading upstream, client: 192.168.4.193, server:
2022/11/15 18:23:07 [alert] 134292#134292: *437 connection already closed while reading upstream, client: 192.168.4.18, server:
New backtrace:
[New LWP 134509]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `nginx: worker process '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000564b6e5e053d in ngx_http_run_posted_requests (c=c@entry=0x564b6ec79a00) at src/http/ngx_http_request.c:2435
2435 pr = r->main->posted_requests;
(gdb) bt
#0 0x0000564b6e5e053d in ngx_http_run_posted_requests (c=c@entry=0x564b6ec79a00) at src/http/ngx_http_request.c:2435
#1 0x0000564b6e5f3d7d in ngx_http_upstream_handler (ev=0x564b6eca6650) at src/http/ngx_http_upstream.c:1299
#2 0x0000564b6e5c65d4 in ngx_epoll_process_events (cycle=0x564b6eac7490, timer=<optimized out>, flags=<optimized out>) at src/event/modules/ngx_epoll_module.c:901
#3 0x0000564b6e5ba4a1 in ngx_process_events_and_timers (cycle=cycle@entry=0x564b6eac7490) at src/event/ngx_event.c:248
#4 0x0000564b6e5c4141 in ngx_worker_process_cycle (cycle=0x564b6eac7490, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:721
#5 0x0000564b6e5c257d in ngx_spawn_process (cycle=cycle@entry=0x564b6eac7490, proc=0x564b6e5c402a <ngx_worker_process_cycle>, data=0x0,
name=0x564b6e67861a "worker process", respawn=respawn@entry=0) at src/os/unix/ngx_process.c:199
#6 0x0000564b6e5c4dba in ngx_reap_children (cycle=0x564b6eac7490) at src/os/unix/ngx_process_cycle.c:598
#7 ngx_master_process_cycle (cycle=cycle@entry=0x564b6eac7490) at src/os/unix/ngx_process_cycle.c:174
#8 0x0000564b6e59880d in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:383
and another one:
[New LWP 134292]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `nginx: worker process '.
Program terminated with signal SIGABRT, Aborted.
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140549354153792) at ./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140549354153792) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=140549354153792) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=140549354153792, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x00007fd432585476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00007fd43256b7f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x00007fd4325cc6f6 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7fd43271eb8c "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#6 0x00007fd4325e3d7c in malloc_printerr (str=str@entry=0x7fd432721710 "free(): double free detected in tcache 2") at ./malloc/malloc.c:5664
#7 0x00007fd4325e612b in _int_free (av=0x7fd43275cc80 <main_arena>, p=0x564b6ecddfc0, have_lock=0) at ./malloc/malloc.c:4473
#8 0x00007fd4325e84d3 in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3391
#9 0x0000564b6e59a38d in ngx_destroy_pool (pool=0x564b6edb3190) at src/core/ngx_palloc.c:85
#10 0x0000564b6e5f37fd in ngx_http_upstream_finalize_request (r=r@entry=0x564b6ed6e490, u=u@entry=0x564b6ed93600, rc=rc@entry=499)
at src/http/ngx_http_upstream.c:4513
#11 0x0000564b6e5f3f65 in ngx_http_upstream_check_broken_connection (r=0x564b6ed6e490, ev=0x564b6eca6890) at src/http/ngx_http_upstream.c:1433
#12 0x0000564b6e5f41c8 in ngx_http_upstream_rd_check_broken_connection (r=<optimized out>) at src/http/ngx_http_upstream.c:1306
#13 0x0000564b6e5e0c59 in ngx_http_request_handler (ev=0x564b6eca6890) at src/http/ngx_http_request.c:2415
#14 0x0000564b6e5c65d4 in ngx_epoll_process_events (cycle=0x564b6eac7490, timer=<optimized out>, flags=<optimized out>) at src/event/modules/ngx_epoll_module.c:901
#15 0x0000564b6e5ba4a1 in ngx_process_events_and_timers (cycle=cycle@entry=0x564b6eac7490) at src/event/ngx_event.c:248
#16 0x0000564b6e5c4141 in ngx_worker_process_cycle (cycle=0x564b6eac7490, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:721
#17 0x0000564b6e5c257d in ngx_spawn_process (cycle=cycle@entry=0x564b6eac7490, proc=proc@entry=0x564b6e5c402a <ngx_worker_process_cycle>, data=data@entry=0x2,
name=name@entry=0x564b6e67861a "worker process", respawn=respawn@entry=-3) at src/os/unix/ngx_process.c:199
#18 0x0000564b6e5c3859 in ngx_start_worker_processes (cycle=cycle@entry=0x564b6eac7490, n=4, type=type@entry=-3) at src/os/unix/ngx_process_cycle.c:344
#19 0x0000564b6e5c4a6e in ngx_master_process_cycle (cycle=cycle@entry=0x564b6eac7490) at src/os/unix/ngx_process_cycle.c:130
#20 0x0000564b6e59880d in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:383
Thanks a lot @klp18. Do you think you could provide a full nginx debug log? It would help me understand the sequence of events.
And in the meantime, please try this patch:
diff --git a/ngx_http_upstream_ntlm_module.c b/ngx_http_upstream_ntlm_module.c
index 8d1179d..9de776f 100644
--- a/ngx_http_upstream_ntlm_module.c
+++ b/ngx_http_upstream_ntlm_module.c
@@ -378,9 +378,7 @@ static void ngx_http_upstream_client_conn_cleanup(void *data) {
// Check if the item was removed previously from the queue (backend drop)
if (item->peer_connection != NULL) {
-
- item->peer_connection->read->timedout = 1;
- ngx_post_event(item->peer_connection->read,&ngx_posted_events);
+ ngx_add_timer(item->peer_connection->read, 1);
ngx_queue_remove(&item->queue);
ngx_queue_insert_head(&item->conf->free, &item->queue);
@@ -451,6 +449,11 @@ static void ngx_http_upstream_ntlm_close(ngx_connection_t *c) {
#endif
+ // If connection is already closed, don't do anything (pool is already gone)
+ if (c->fd == (ngx_socket_t) -1) {
+ return;
+ }
+
if (c->pool) {
ngx_destroy_pool(c->pool);
c->pool = NULL;
New dumps with debug logs, with the new patch applied.
Thanks a lot @klp18 , it was helpful. Can you try the following patch:
diff --git a/ngx_http_upstream_ntlm_module.c b/ngx_http_upstream_ntlm_module.c
index 8d1179d..6b392de 100644
--- a/ngx_http_upstream_ntlm_module.c
+++ b/ngx_http_upstream_ntlm_module.c
@@ -268,6 +268,7 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
ngx_http_upstream_t *u;
ngx_pool_cleanup_t *cln;
ngx_http_upstream_ntlm_cache_t *cleanup_item = NULL;
+ unsigned invalid = 0;
/* cache valid connections */
@@ -277,26 +278,32 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
if (state & NGX_PEER_FAILED || c == NULL || c->read->eof ||
c->read->error || c->read->timedout || c->write->error ||
c->write->timedout) {
+ invalid = 1;
goto invalid;
}
if (!u->keepalive) {
+ invalid = 1;
goto invalid;
}
if (!u->request_body_sent) {
+ invalid = 1;
goto invalid;
}
if (ngx_terminate || ngx_exiting) {
+ invalid = 1;
goto invalid;
}
if (ngx_handle_read_event(c->read, 0) != NGX_OK) {
+ invalid = 1;
goto invalid;
}
if (hndp->ntlm_init == 0 && hndp->cached == 0) {
+ invalid = 1;
goto invalid;
}
@@ -365,6 +372,17 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
}
invalid:
+ if (invalid) {
+ // remove the client connection drop down handler if present
+ for (cln = hndp->client_connection->pool->cleanup; cln; cln = cln->next) {
+ if (cln->handler == ngx_http_upstream_client_conn_cleanup) {
+ ngx_log_debug0(NGX_LOG_DEBUG_HTTP, pc->log, 0,
+ "ntlm free peer removing cleanup");
+ cln->handler = NULL;
+ }
+ }
+ }
+
hndp->original_free_peer(pc, hndp->data, state);
}
Crashes are much less frequent now. New dumps and debug logs:
@klp18 I think I spotted the root cause; can you please try this patch:
diff --git a/ngx_http_upstream_ntlm_module.c b/ngx_http_upstream_ntlm_module.c
index 8d1179d..981dcb0 100644
--- a/ngx_http_upstream_ntlm_module.c
+++ b/ngx_http_upstream_ntlm_module.c
@@ -268,6 +268,7 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
ngx_http_upstream_t *u;
ngx_pool_cleanup_t *cln;
ngx_http_upstream_ntlm_cache_t *cleanup_item = NULL;
+ unsigned invalid = 0;
/* cache valid connections */
@@ -277,26 +278,32 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
if (state & NGX_PEER_FAILED || c == NULL || c->read->eof ||
c->read->error || c->read->timedout || c->write->error ||
c->write->timedout) {
+ invalid = 1;
goto invalid;
}
if (!u->keepalive) {
+ invalid = 1;
goto invalid;
}
if (!u->request_body_sent) {
+ invalid = 1;
goto invalid;
}
if (ngx_terminate || ngx_exiting) {
+ invalid = 1;
goto invalid;
}
if (ngx_handle_read_event(c->read, 0) != NGX_OK) {
+ invalid = 1;
goto invalid;
}
if (hndp->ntlm_init == 0 && hndp->cached == 0) {
+ invalid = 1;
goto invalid;
}
@@ -327,6 +334,8 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
for (cln = item->client_connection->pool->cleanup; cln; cln = cln->next) {
if (cln->handler == ngx_http_upstream_client_conn_cleanup) {
cleanup_item = cln->data;
+ // reassign cleanup data in case it has moved
+ cln->data = item;
break;
}
}
@@ -365,6 +374,17 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
}
invalid:
+ if (invalid) {
+ // remove the client connection drop down handler if present
+ for (cln = hndp->client_connection->pool->cleanup; cln; cln = cln->next) {
+ if (cln->handler == ngx_http_upstream_client_conn_cleanup) {
+ ngx_log_debug0(NGX_LOG_DEBUG_HTTP, pc->log, 0,
+ "ntlm free peer removing cleanup");
+ cln->handler = NULL;
+ }
+ }
+ }
+
hndp->original_free_peer(pc, hndp->data, state);
}
@tmagnien Not a single crash in three hours. I'll post an update tomorrow.
@klp18 Great news! If you confirm tomorrow I'll create a PR with the patch.
@tmagnien Four crashes happened overnight. One of them (180266) contains ngx_http_upstream_ntlm_close_handler in the backtrace. I turned off the debug logs yesterday; I can turn them back on and wait for a new crash.
@klp18 Thanks for the core dumps, and yes, if you can enable debug logging it will help a lot.
@klp18 Updated patch to handle the last (I hope) edge case:
diff --git a/ngx_http_upstream_ntlm_module.c b/ngx_http_upstream_ntlm_module.c
index 8d1179d..d68c2b3 100644
--- a/ngx_http_upstream_ntlm_module.c
+++ b/ngx_http_upstream_ntlm_module.c
@@ -268,6 +268,7 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
ngx_http_upstream_t *u;
ngx_pool_cleanup_t *cln;
ngx_http_upstream_ntlm_cache_t *cleanup_item = NULL;
+ unsigned invalid = 0;
/* cache valid connections */
@@ -277,26 +278,32 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
if (state & NGX_PEER_FAILED || c == NULL || c->read->eof ||
c->read->error || c->read->timedout || c->write->error ||
c->write->timedout) {
+ invalid = 1;
goto invalid;
}
if (!u->keepalive) {
+ invalid = 1;
goto invalid;
}
if (!u->request_body_sent) {
+ invalid = 1;
goto invalid;
}
if (ngx_terminate || ngx_exiting) {
+ invalid = 1;
goto invalid;
}
if (ngx_handle_read_event(c->read, 0) != NGX_OK) {
+ invalid = 1;
goto invalid;
}
if (hndp->ntlm_init == 0 && hndp->cached == 0) {
+ invalid = 1;
goto invalid;
}
@@ -306,6 +313,14 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
item = ngx_queue_data(q, ngx_http_upstream_ntlm_cache_t, queue);
ngx_http_upstream_ntlm_close(item->peer_connection);
+ // remove the client connection drop down handler if present
+ for (cln = item->client_connection->pool->cleanup; cln; cln = cln->next) {
+ if (cln->handler == ngx_http_upstream_client_conn_cleanup) {
+ ngx_log_debug0(NGX_LOG_DEBUG_HTTP, pc->log, 0,
+ "ntlm free peer removing cleanup");
+ cln->handler = NULL;
+ }
+ }
item->peer_connection = NULL;
} else {
q = ngx_queue_head(&hndp->conf->free);
@@ -327,6 +342,8 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
for (cln = item->client_connection->pool->cleanup; cln; cln = cln->next) {
if (cln->handler == ngx_http_upstream_client_conn_cleanup) {
cleanup_item = cln->data;
+ // reassign cleanup data in case it has moved
+ cln->data = item;
break;
}
}
@@ -365,6 +382,17 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
}
invalid:
+ if (invalid) {
+ // remove the client connection drop down handler if present
+ for (cln = hndp->client_connection->pool->cleanup; cln; cln = cln->next) {
+ if (cln->handler == ngx_http_upstream_client_conn_cleanup) {
+ ngx_log_debug0(NGX_LOG_DEBUG_HTTP, pc->log, 0,
+ "ntlm free peer removing cleanup");
+ cln->handler = NULL;
+ }
+ }
+ }
+
hndp->original_free_peer(pc, hndp->data, state);
}
There was one crash in the last 16 hours though. Now I will restart nginx with debug logs and try to catch it again.
#0 0x0000558ee69a7d44 in ngx_http_upstream_handler (ev=0x558ee8ad1250) at src/http/ngx_http_upstream.c:1282
1282 ngx_http_set_log_request(c->log, r);
(gdb) bt
#0 0x0000558ee69a7d44 in ngx_http_upstream_handler (ev=0x558ee8ad1250) at src/http/ngx_http_upstream.c:1282
#1 0x0000558ee696ea2e in ngx_event_process_posted (cycle=cycle@entry=0x558ee88f1490, posted=0x558ee6a82d10 <ngx_posted_events>)
at src/event/ngx_event_posted.c:34
#2 0x0000558ee696e4e6 in ngx_process_events_and_timers (cycle=cycle@entry=0x558ee88f1490) at src/event/ngx_event.c:263
#3 0x0000558ee6978141 in ngx_worker_process_cycle (cycle=0x558ee88f1490, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:721
#4 0x0000558ee697657d in ngx_spawn_process (cycle=cycle@entry=0x558ee88f1490, proc=proc@entry=0x558ee697802a <ngx_worker_process_cycle>,
data=data@entry=0x0, name=name@entry=0x558ee6a2c61a "worker process", respawn=respawn@entry=-3) at src/os/unix/ngx_process.c:199
#5 0x0000558ee6977859 in ngx_start_worker_processes (cycle=cycle@entry=0x558ee88f1490, n=4, type=type@entry=-3) at src/os/unix/ngx_process_cycle.c:344
#6 0x0000558ee6978a6e in ngx_master_process_cycle (cycle=cycle@entry=0x558ee88f1490) at src/os/unix/ngx_process_cycle.c:130
#7 0x0000558ee694c80d in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:383
@tmagnien Dumps with debug logs:
@klp18 Thanks! Just one question: from the logs it looks like there's an internal proxy in your setup, something like client -> nginx -> nginx -> server, is that right?
@tmagnien Good question. Internal clients only work through one nginx with the ntlm module. External clients go through an additional "proxy", TMG 2010 (which I plan to get rid of if I can get nginx to work stably). Thus the requests look like client -> router (NAT1) -> tmg2010 (NAT2) -> nginx -> server.
Publishing through TMG 2010 has its own specific features. For example, when it proxies an http/https request, it can preserve the client's original IP address at the TCP level (using built-in NAT, I guess).
So it turns out that I don't have an extra proxy in front of nginx, but rather a double NAT.
Hi @klp18. I finally have some time to work on this again. Could you run the same tests with keepalive disabled? I mean "keepalive_timeout 0;" in the nginx config file. It would help me confirm that this is where the issue resides. Thanks a lot!
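For reference, a minimal sketch of the kind of configuration involved, adapted from the module's README; hostnames, ports and paths are placeholders, not your actual setup:
upstream exchange_backend {
    server 192.168.1.10:443;
    ntlm;                          # nginx-ntlm-module directive
}

server {
    listen 443 ssl;                # ssl_certificate directives omitted
    keepalive_timeout 0;           # the test: disable client-side keepalive

    location / {
        proxy_pass https://exchange_backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
Since NTLM authenticates the connection itself rather than each request, both the client-side and upstream-side connections normally need to stay alive, so disabling keepalive is expected to change the behaviour noticeably.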
Hello @tmagnien, when I set keepalive_timeout to zero, authentication stops working: Outlook constantly asks for the password, and I see an endless stream of 401 status codes in the nginx log. And since crashes are quite rare (a few times a day), I can't leave nginx running in this state long enough to catch one; it's unlikely to crash at all if clients can't authenticate.
Ok no problem, thanks for trying.
Hi,
I'm having some segfaults here. nginx version: nginx/1.20.1. Please help me get some debug info for you.