gabihodoroaga / nginx-ntlm-module

An nginx module that allows proxying requests with NTLM authentication.
MIT License

Some segfaults #5

Open dagmoller opened 3 years ago

dagmoller commented 3 years ago

Hi,

I'm having some segfaults here. nginx version: nginx/1.20.1. Please help me gather some debug info for you.

gabihodoroaga commented 3 years ago

Hi,

The only way to see why it crashes is to get the core dump files:

https://docs.nginx.com/nginx/admin-guide/monitoring/debugging/#enabling-core-dumps-in-nginx-configuration

First create the directory for the dumps:

mkdir -p /tmp/cores

Then add this to your main nginx.conf:

worker_rlimit_core  500M;
working_directory   /tmp/cores/;

Then run gdb to obtain a backtrace as usual, e.g.

gdb /path/to/nginx /tmp/nginx.core.PID
backtrace full

It will help if you can share your current nginx configuration (only the relevant parts) and some load statistics.

Thanks g.

dagmoller commented 3 years ago

Hi,

I don't know if this helps. The load is very low, and I have a lot of reverse proxies with NTLM enabled on this instance.

Core was generated by `nginx: worker process                   '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  ngx_http_upstream_ntlm_close_handler (ev=<optimized out>) at ../nginx-ntlm-module/ngx_http_upstream_ntlm_module.c:435
435         ngx_queue_insert_head(&conf->free, &item->queue);
Missing separate debuginfos, use: yum debuginfo-install nginx-1.20.1-1.el8.ngx.x86_64
(gdb) backtrace full
#0  ngx_http_upstream_ntlm_close_handler (ev=<optimized out>) at ../nginx-ntlm-module/ngx_http_upstream_ntlm_module.c:435
        conf = 0x5556a19cedd0
        item = 0x5556a19ceec0
        n = <optimized out>
        buf = ""
        c = 0x5556a19cf910
#1  0x000055569ec373e0 in ngx_event_process_posted ()
No symbol table info available.
#2  0x000055569ec3edf9 in ngx_worker_process_cycle ()
No symbol table info available.
#3  0x000055569ec3d41f in ngx_spawn_process ()
No symbol table info available.
#4  0x000055569ec3fbc7 in ngx_master_process_cycle ()
No symbol table info available.
#5  0x000055569ec15c78 in main ()
No symbol table info available.
gabihodoroaga commented 3 years ago

Hi,

Thanks. This definitely helps. I will try to reproduce it and then fix it.

g.

dbalaev commented 2 years ago

I think your code contains a logical error on the following lines:

static void ngx_http_upstream_client_conn_cleanup(void *data) {
    ngx_http_upstream_ntlm_cache_t *item = data;

    ngx_log_debug2(
        NGX_LOG_DEBUG_HTTP, ngx_cycle->log, 0,
        "ntlm client connection closed %p, droping peer connection %p",
        item->client_connection, item->peer_connection);

    // Check if the item was removed previously from the queue (backend drop)
    if (item->peer_connection != NULL) {

        item->peer_connection->read->timedout = 1;
        ngx_post_event(item->peer_connection->read, &ngx_posted_events);

        ngx_queue_remove(&item->queue);
        ngx_queue_insert_head(&item->conf->free, &item->queue);
    }
}

If the connection has already been deleted, calling ngx_log_debug2 results in a segfault, because the item structure might already have been destroyed.
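To make the hazard concrete, here is a minimal standalone C sketch (not nginx code; the types and names are invented for illustration) of a cleanup callback that reads its saved pointer after the object has been freed. Built with AddressSanitizer, it reports the same class of error:

/* cc -g -fsanitize=address demo.c && ./a.out */
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    void *peer_connection;          /* stands in for item->peer_connection */
} cache_item_t;

typedef struct {
    void (*handler)(void *data);    /* stands in for ngx_pool_cleanup_t */
    void *data;
} pool_cleanup_t;

static void conn_cleanup(void *data) {
    cache_item_t *item = data;
    /* Even reading item for a debug log is a use-after-free if the item
     * was destroyed when the backend connection dropped first. */
    printf("dropping peer connection %p\n", item->peer_connection);
}

int main(void) {
    cache_item_t  *item = malloc(sizeof(*item));
    pool_cleanup_t cln  = { conn_cleanup, item };

    item->peer_connection = NULL;
    free(item);                     /* backend side tears the item down first */
    cln.handler(cln.data);          /* ASan flags a heap-use-after-free here */
    return 0;
}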

gabihodoroaga commented 2 years ago

@dbalaev Thanks for pointing this out. Do you have a way to reproduce the segfault? That would help me a lot. Thanks

dbalaev commented 2 years ago

Build: CentOS 7 + nginx 1.23.1-1

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `nginx: worker process                   '.
Program terminated with signal 11, Segmentation fault.
#0  0x000056228acbeac6 in ngx_http_upstream_handler (ev=0x56228cc13460) at src/http/ngx_http_upstream.c:1278
1278        ngx_http_set_log_request(c->log, r);
(gdb) bt
#0  0x000056228acbeac6 in ngx_http_upstream_handler (ev=0x56228cc13460) at src/http/ngx_http_upstream.c:1278
#1  0x000056228ac8bb80 in ngx_event_process_posted (cycle=cycle@entry=0x56228c9ccb50, posted=0x56228afa3930 <ngx_posted_events>) at src/event/ngx_event_posted.c:34
#2  0x000056228ac8b72c in ngx_process_events_and_timers (cycle=cycle@entry=0x56228c9ccb50) at src/event/ngx_event.c:263
#3  0x000056228ac93461 in ngx_worker_process_cycle (cycle=cycle@entry=0x56228c9ccb50, data=data@entry=0x0) at src/os/unix/ngx_process_cycle.c:721
#4  0x000056228ac91ccb in ngx_spawn_process (cycle=cycle@entry=0x56228c9ccb50, proc=proc@entry=0x56228ac933e0 <ngx_worker_process_cycle>, data=data@entry=0x0,
    name=name@entry=0x56228ad3dc27 "worker process", respawn=respawn@entry=-3) at src/os/unix/ngx_process.c:199
#5  0x000056228ac93850 in ngx_start_worker_processes (cycle=cycle@entry=0x56228c9ccb50, n=1, type=type@entry=-3) at src/os/unix/ngx_process_cycle.c:344
#6  0x000056228ac940b3 in ngx_master_process_cycle (cycle=cycle@entry=0x56228c9ccb50) at src/os/unix/ngx_process_cycle.c:130
#7  0x000056228ac6a85d in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:383
(gdb) f 0
#0  0x000056228acbeac6 in ngx_http_upstream_handler (ev=0x56228cc13460) at src/http/ngx_http_upstream.c:1278
1278        ngx_http_set_log_request(c->log, r);

klp18 commented 2 years ago

Same segfault on Ubuntu 22 with nginx/1.22.0. I can reproduce the problem: it occurs every 1-2 minutes on my server, which has 10-20 people working through it.

https://gist.github.com/klp18/a9eb7118e951528e3c9da65851a18d45

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `nginx: worker process                           '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000564b3c391d44 in ngx_http_upstream_handler (ev=0x564b3de72ad0) at src/http/ngx_http_upstream.c:1282
1282        ngx_http_set_log_request(c->log, r);
(gdb) bt
#0  0x0000564b3c391d44 in ngx_http_upstream_handler (ev=0x564b3de72ad0) at src/http/ngx_http_upstream.c:1282
#1  0x0000564b3c358a2e in ngx_event_process_posted (cycle=cycle@entry=0x564b3dc93490, posted=0x564b3c46cd10 <ngx_posted_events>)
    at src/event/ngx_event_posted.c:34
#2  0x0000564b3c3584e6 in ngx_process_events_and_timers (cycle=cycle@entry=0x564b3dc93490) at src/event/ngx_event.c:263
#3  0x0000564b3c362141 in ngx_worker_process_cycle (cycle=0x564b3dc93490, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:721
#4  0x0000564b3c36057d in ngx_spawn_process (cycle=cycle@entry=0x564b3dc93490, proc=0x564b3c36202a <ngx_worker_process_cycle>, data=0x0,
    name=0x564b3c416612 "worker process", respawn=respawn@entry=0) at src/os/unix/ngx_process.c:199
#5  0x0000564b3c362dba in ngx_reap_children (cycle=0x564b3dc93490) at src/os/unix/ngx_process_cycle.c:598
#6  ngx_master_process_cycle (cycle=cycle@entry=0x564b3dc93490) at src/os/unix/ngx_process_cycle.c:174
#7  0x0000564b3c33680d in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:383
(gdb) exit

gabihodoroaga commented 2 years ago

Thanks @klp18 for reporting this. Can you share your nginx configuration (replace any sensitive information)? Thank you, g.

klp18 commented 2 years ago

https://gist.github.com/klp18/7d16e6be4d97cd3db1ffa6521c582522

Thanks.

justarandomsysadmin commented 2 years ago

@gabihodoroaga thanks for this module! Did you manage to reproduce the segfaults? It's also happening on my instance (nginx/1.18.0), shutting down worker processes every few minutes (often multiple times per minute).

Let me know if you need further information or configuration files for comparison.

tmagnien commented 2 years ago

@klp18 would you mind sharing a binary core dump file? I'm trying to reproduce the issue in the lab with no luck yet. The backtrace is useful, but I'd like to go further into the investigation.

klp18 commented 2 years ago

@tmagnien Here are a few dumps.

tmagnien commented 2 years ago

Thanks @klp18! I will have a look

tmagnien commented 2 years ago

@klp18 Can you provide me with the output of the "nginx -V" command, please?

klp18 commented 2 years ago
nginx version: nginx/1.22.0
built by gcc 11.2.0 (Ubuntu 11.2.0-19ubuntu1)
built with OpenSSL 3.0.2 15 Mar 2022
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --modules-path=/usr/lib/nginx/modules --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-compat --with-debug --with-pcre-jit --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_v2_module --with-http_dav_module --with-http_slice_module --with-threads --with-http_addition_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_sub_module --with-mail --with-stream --with-stream_ssl_module --with-mail_ssl_module --add-dynamic-module=../nginx-ntlm-module
tmagnien commented 2 years ago

@klp18 I may have found a fix, but I can't reproduce the issue locally. Would you mind trying the following patch and seeing if it fixes the crash?

diff --git a/ngx_http_upstream_ntlm_module.c b/ngx_http_upstream_ntlm_module.c
index 8d1179d..90ed158 100644
--- a/ngx_http_upstream_ntlm_module.c
+++ b/ngx_http_upstream_ntlm_module.c
@@ -380,6 +380,8 @@ static void ngx_http_upstream_client_conn_cleanup(void *data) {
     if (item->peer_connection != NULL) {

         item->peer_connection->read->timedout = 1;
+       // If connection is already closed, log will be empty and will generate a segfault
+       item->peer_connection->log = ngx_cycle->log;
         ngx_post_event(item->peer_connection->read,&ngx_posted_events);

         ngx_queue_remove(&item->queue);
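For context, the added comment describes a general pattern: a handler that runs after a connection is closed must not trust pointers the close path may have cleared. A standalone C sketch (not nginx code; all names invented) of both the crash and the repointing idea used in the patch:

#include <stdio.h>

typedef struct { const char *name; } log_t;
typedef struct { log_t *log; } conn_t;

static void read_handler(conn_t *c) {
    /* Dereferences c->log; segfaults if the close path set it to NULL. */
    printf("event on connection, logging to %s\n", c->log->name);
}

int main(void) {
    log_t  cycle_log = { "cycle-log" };
    conn_t c = { NULL };        /* log already cleared by an earlier close */

    /* The patch's idea: repoint the log at a pointer that is always valid
     * (ngx_cycle->log in the module) before the event is processed. */
    c.log = &cycle_log;
    read_handler(&c);
    return 0;
}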
klp18 commented 2 years ago

@tmagnien The same crash, with the same and some new backtraces: new dumps

#0  0x000055c199a40bb7 in ngx_http_request_handler (ev=0x55c19a453430) at src/http/ngx_http_request.c:2394
2394        ngx_http_set_log_request(c->log, r);
(gdb) bt
#0  0x000055c199a40bb7 in ngx_http_request_handler (ev=0x55c19a453430) at src/http/ngx_http_request.c:2394
#1  0x000055c199a1aa2e in ngx_event_process_posted (cycle=cycle@entry=0x55c19a273490, posted=0x55c199b2ed10 <ngx_posted_events>) at src/event/ngx_event_posted.c:34
#2  0x000055c199a1a4e6 in ngx_process_events_and_timers (cycle=cycle@entry=0x55c19a273490) at src/event/ngx_event.c:263
#3  0x000055c199a24141 in ngx_worker_process_cycle (cycle=0x55c19a273490, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:721
#4  0x000055c199a2257d in ngx_spawn_process (cycle=cycle@entry=0x55c19a273490, proc=proc@entry=0x55c199a2402a <ngx_worker_process_cycle>, data=data@entry=0x3,
    name=name@entry=0x55c199ad861a "worker process", respawn=respawn@entry=-3) at src/os/unix/ngx_process.c:199
#5  0x000055c199a23859 in ngx_start_worker_processes (cycle=cycle@entry=0x55c19a273490, n=4, type=type@entry=-3) at src/os/unix/ngx_process_cycle.c:344
#6  0x000055c199a24a6e in ngx_master_process_cycle (cycle=cycle@entry=0x55c19a273490) at src/os/unix/ngx_process_cycle.c:130
#7  0x000055c1999f880d in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:383
#0  0x00007f8bd70ee990 in ngx_http_upstream_ntlm_close_handler (ev=<optimized out>) at ../nginx-ntlm-module/ngx_http_upstream_ntlm_module.c:437
437         ngx_queue_insert_head(&conf->free, &item->queue);
(gdb) bt
#0  0x00007f8bd70ee990 in ngx_http_upstream_ntlm_close_handler (ev=<optimized out>) at ../nginx-ntlm-module/ngx_http_upstream_ntlm_module.c:437
#1  0x000055c199a1aa2e in ngx_event_process_posted (cycle=cycle@entry=0x55c19a273490, posted=0x55c199b2ed10 <ngx_posted_events>) at src/event/ngx_event_posted.c:34
#2  0x000055c199a1a4e6 in ngx_process_events_and_timers (cycle=cycle@entry=0x55c19a273490) at src/event/ngx_event.c:263
#3  0x000055c199a24141 in ngx_worker_process_cycle (cycle=0x55c19a273490, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:721
#4  0x000055c199a2257d in ngx_spawn_process (cycle=cycle@entry=0x55c19a273490, proc=0x55c199a2402a <ngx_worker_process_cycle>, data=0x2,
    name=0x55c199ad861a "worker process", respawn=respawn@entry=2) at src/os/unix/ngx_process.c:199
#5  0x000055c199a24dba in ngx_reap_children (cycle=0x55c19a273490) at src/os/unix/ngx_process_cycle.c:598
#6  ngx_master_process_cycle (cycle=cycle@entry=0x55c19a273490) at src/os/unix/ngx_process_cycle.c:174
#7  0x000055c1999f880d in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:383
#0  0x000055c199a53d44 in ngx_http_upstream_handler (ev=0x55c19a452b30) at src/http/ngx_http_upstream.c:1282
1282        ngx_http_set_log_request(c->log, r);
(gdb) bt
#0  0x000055c199a53d44 in ngx_http_upstream_handler (ev=0x55c19a452b30) at src/http/ngx_http_upstream.c:1282
#1  0x000055c199a1aa2e in ngx_event_process_posted (cycle=cycle@entry=0x55c19a273490, posted=0x55c199b2ed10 <ngx_posted_events>) at src/event/ngx_event_posted.c:34
#2  0x000055c199a1a4e6 in ngx_process_events_and_timers (cycle=cycle@entry=0x55c19a273490) at src/event/ngx_event.c:263
#3  0x000055c199a24141 in ngx_worker_process_cycle (cycle=0x55c19a273490, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:721
#4  0x000055c199a2257d in ngx_spawn_process (cycle=cycle@entry=0x55c19a273490, proc=0x55c199a2402a <ngx_worker_process_cycle>, data=0x1,
    name=0x55c199ad861a "worker process", respawn=respawn@entry=1) at src/os/unix/ngx_process.c:199
#5  0x000055c199a24dba in ngx_reap_children (cycle=0x55c19a273490) at src/os/unix/ngx_process_cycle.c:598
#6  ngx_master_process_cycle (cycle=cycle@entry=0x55c19a273490) at src/os/unix/ngx_process_cycle.c:174
#7  0x000055c1999f880d in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:383
tmagnien commented 1 year ago

@klp18 Sorry to bother you again, but I can't reproduce it, so would you mind replacing the previous patch with this one:

diff --git a/ngx_http_upstream_ntlm_module.c b/ngx_http_upstream_ntlm_module.c
index 8d1179d..6066980 100644
--- a/ngx_http_upstream_ntlm_module.c
+++ b/ngx_http_upstream_ntlm_module.c
@@ -379,8 +379,9 @@ static void ngx_http_upstream_client_conn_cleanup(void *data) {
     // Check if the item was removed previously from the queue (backend drop)
     if (item->peer_connection != NULL) {

-        item->peer_connection->read->timedout = 1;
-        ngx_post_event(item->peer_connection->read,&ngx_posted_events);
+        ngx_http_upstream_ntlm_close(item->peer_connection);
+        /*item->peer_connection->read->timedout = 1;
+        ngx_post_event(item->peer_connection->read,&ngx_posted_events);*/

         ngx_queue_remove(&item->queue);
         ngx_queue_insert_head(&item->conf->free, &item->queue);

If segfault still occurs, it would help me to also have nginx debug logs. Thanks a lot!

klp18 commented 1 year ago

@tmagnien new dumps, new backtrace:

#0  0x000055f6ecf3f300 in ngx_destroy_pool (pool=0x55f6eee5ed70) at src/core/ngx_palloc.c:69
69              ngx_log_debug1(NGX_LOG_DEBUG_ALLOC, pool->log, 0, "free: %p", l->alloc);
(gdb) bt
#0  0x000055f6ecf3f300 in ngx_destroy_pool (pool=0x55f6eee5ed70) at src/core/ngx_palloc.c:69
#1  0x00007f2fe95ec767 in ngx_http_upstream_ntlm_close (c=0x55f6eede0720) at ../nginx-ntlm-module/ngx_http_upstream_ntlm_module.c:456
#2  0x00007f2fe95ec7b9 in ngx_http_upstream_client_conn_cleanup (data=0x55f6eecc7670) at ../nginx-ntlm-module/ngx_http_upstream_ntlm_module.c:382
#3  0x000055f6ecf3f2ac in ngx_destroy_pool (pool=pool@entry=0x55f6eeea3610) at src/core/ngx_palloc.c:57
#4  0x000055f6ecf84942 in ngx_http_close_connection (c=c@entry=0x55f6eede0db0) at src/http/ngx_http_request.c:3787
#5  0x000055f6ecf859a2 in ngx_http_close_request (r=r@entry=0x55f6eeef4080, rc=rc@entry=0) at src/http/ngx_http_request.c:3646
#6  0x000055f6ecf85c8a in ngx_http_terminate_handler (r=0x55f6eeef4080) at src/http/ngx_http_request.c:2723
#7  0x000055f6ecf85529 in ngx_http_run_posted_requests (c=c@entry=0x55f6eede0db0) at src/http/ngx_http_request.c:2450
#8  0x000055f6ecf85bf2 in ngx_http_request_handler (ev=0x55f6eee0ce90) at src/http/ngx_http_request.c:2418
#9  0x000055f6ecf6b5d4 in ngx_epoll_process_events (cycle=0x55f6eec2d490, timer=<optimized out>, flags=<optimized out>)
    at src/event/modules/ngx_epoll_module.c:901
#10 0x000055f6ecf5f4a1 in ngx_process_events_and_timers (cycle=cycle@entry=0x55f6eec2d490) at src/event/ngx_event.c:248
#11 0x000055f6ecf69141 in ngx_worker_process_cycle (cycle=0x55f6eec2d490, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:721
#12 0x000055f6ecf6757d in ngx_spawn_process (cycle=cycle@entry=0x55f6eec2d490, proc=proc@entry=0x55f6ecf6902a <ngx_worker_process_cycle>,
    data=data@entry=0x0, name=name@entry=0x55f6ed01d61a "worker process", respawn=respawn@entry=-3) at src/os/unix/ngx_process.c:199
#13 0x000055f6ecf68859 in ngx_start_worker_processes (cycle=cycle@entry=0x55f6eec2d490, n=4, type=type@entry=-3) at src/os/unix/ngx_process_cycle.c:344
#14 0x000055f6ecf69a6e in ngx_master_process_cycle (cycle=cycle@entry=0x55f6eec2d490) at src/os/unix/ngx_process_cycle.c:130
#15 0x000055f6ecf3d80d in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:383
tmagnien commented 1 year ago

Hi again @klp18. Can you try the following patch, please:

diff --git a/ngx_http_upstream_ntlm_module.c b/ngx_http_upstream_ntlm_module.c
index 8d1179d..88ff463 100644
--- a/ngx_http_upstream_ntlm_module.c
+++ b/ngx_http_upstream_ntlm_module.c
@@ -378,9 +378,7 @@ static void ngx_http_upstream_client_conn_cleanup(void *data) {

     // Check if the item was removed previously from the queue (backend drop)
     if (item->peer_connection != NULL) {
-
-        item->peer_connection->read->timedout = 1;
-        ngx_post_event(item->peer_connection->read,&ngx_posted_events);
+       ngx_http_upstream_ntlm_close(item->peer_connection);

         ngx_queue_remove(&item->queue);
         ngx_queue_insert_head(&item->conf->free, &item->queue);
@@ -451,6 +449,11 @@ static void ngx_http_upstream_ntlm_close(ngx_connection_t *c) {

 #endif

+    // If connection is already closed, don't do anything (pool is already gone)
+    if (c->fd == (ngx_socket_t) -1) {
+        return;
+    }
+
     if (c->pool) {
         ngx_destroy_pool(c->pool);
         c->pool = NULL;

And again, sorry for those multiple tries, but I still don't have any way to reproduce the issue, so I'm working blind.
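As an aside, the fd sentinel in the second hunk is the classic way to make a close routine idempotent. A standalone C sketch (not nginx code; names invented) of why a close path that can be entered twice needs such a guard, since releasing the pool twice makes glibc abort with a double-free error:

#include <stdlib.h>

typedef struct {
    int   fd;                   /* -1 once closed, mirroring ngx_socket_t use */
    void *pool;
} conn_t;

static void conn_close(conn_t *c) {
    if (c->fd == -1) {          /* already closed: resources already released */
        return;
    }
    free(c->pool);              /* reaching this twice would be a double free */
    c->fd = -1;
}

int main(void) {
    conn_t c = { 3, malloc(64) };
    conn_close(&c);
    conn_close(&c);             /* safe only because of the fd sentinel */
    return 0;
}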

klp18 commented 1 year ago

I have no problem with doing such tests, so ask away. I wish I had enough time right now to try to figure out the problem myself.

new dumps

Here's another thing I noticed. The previous dumps were accompanied by the following entries in the nginx log file:

2022/11/15 14:10:11 [error] 129235#129235: *11975 upstream timed out (110: Connection timed out) while reading upstream, client: 192.168.4.141, server:
2022/11/15 14:11:15 [error] 129237#129237: *11997 upstream timed out (110: Connection timed out) while reading upstream, client: 192.168.4.88, server:

and the new ones with the following:

2022/11/15 18:22:31 [alert] 134292#134292: *439 connection already closed while reading upstream, client: 192.168.4.193, server:
2022/11/15 18:23:07 [alert] 134292#134292: *437 connection already closed while reading upstream, client: 192.168.4.18, server:

New backtrace:

[New LWP 134509]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `nginx: worker process                           '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000564b6e5e053d in ngx_http_run_posted_requests (c=c@entry=0x564b6ec79a00) at src/http/ngx_http_request.c:2435
2435            pr = r->main->posted_requests;
(gdb) bt
#0  0x0000564b6e5e053d in ngx_http_run_posted_requests (c=c@entry=0x564b6ec79a00) at src/http/ngx_http_request.c:2435
#1  0x0000564b6e5f3d7d in ngx_http_upstream_handler (ev=0x564b6eca6650) at src/http/ngx_http_upstream.c:1299
#2  0x0000564b6e5c65d4 in ngx_epoll_process_events (cycle=0x564b6eac7490, timer=<optimized out>, flags=<optimized out>) at src/event/modules/ngx_epoll_module.c:901
#3  0x0000564b6e5ba4a1 in ngx_process_events_and_timers (cycle=cycle@entry=0x564b6eac7490) at src/event/ngx_event.c:248
#4  0x0000564b6e5c4141 in ngx_worker_process_cycle (cycle=0x564b6eac7490, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:721
#5  0x0000564b6e5c257d in ngx_spawn_process (cycle=cycle@entry=0x564b6eac7490, proc=0x564b6e5c402a <ngx_worker_process_cycle>, data=0x0,
    name=0x564b6e67861a "worker process", respawn=respawn@entry=0) at src/os/unix/ngx_process.c:199
#6  0x0000564b6e5c4dba in ngx_reap_children (cycle=0x564b6eac7490) at src/os/unix/ngx_process_cycle.c:598
#7  ngx_master_process_cycle (cycle=cycle@entry=0x564b6eac7490) at src/os/unix/ngx_process_cycle.c:174
#8  0x0000564b6e59880d in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:383

and another one:

[New LWP 134292]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `nginx: worker process                           '.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140549354153792) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140549354153792) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140549354153792) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140549354153792, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007fd432585476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007fd43256b7f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007fd4325cc6f6 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7fd43271eb8c "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#6  0x00007fd4325e3d7c in malloc_printerr (str=str@entry=0x7fd432721710 "free(): double free detected in tcache 2") at ./malloc/malloc.c:5664
#7  0x00007fd4325e612b in _int_free (av=0x7fd43275cc80 <main_arena>, p=0x564b6ecddfc0, have_lock=0) at ./malloc/malloc.c:4473
#8  0x00007fd4325e84d3 in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3391
#9  0x0000564b6e59a38d in ngx_destroy_pool (pool=0x564b6edb3190) at src/core/ngx_palloc.c:85
#10 0x0000564b6e5f37fd in ngx_http_upstream_finalize_request (r=r@entry=0x564b6ed6e490, u=u@entry=0x564b6ed93600, rc=rc@entry=499)
    at src/http/ngx_http_upstream.c:4513
#11 0x0000564b6e5f3f65 in ngx_http_upstream_check_broken_connection (r=0x564b6ed6e490, ev=0x564b6eca6890) at src/http/ngx_http_upstream.c:1433
#12 0x0000564b6e5f41c8 in ngx_http_upstream_rd_check_broken_connection (r=<optimized out>) at src/http/ngx_http_upstream.c:1306
#13 0x0000564b6e5e0c59 in ngx_http_request_handler (ev=0x564b6eca6890) at src/http/ngx_http_request.c:2415
#14 0x0000564b6e5c65d4 in ngx_epoll_process_events (cycle=0x564b6eac7490, timer=<optimized out>, flags=<optimized out>) at src/event/modules/ngx_epoll_module.c:901
#15 0x0000564b6e5ba4a1 in ngx_process_events_and_timers (cycle=cycle@entry=0x564b6eac7490) at src/event/ngx_event.c:248
#16 0x0000564b6e5c4141 in ngx_worker_process_cycle (cycle=0x564b6eac7490, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:721
#17 0x0000564b6e5c257d in ngx_spawn_process (cycle=cycle@entry=0x564b6eac7490, proc=proc@entry=0x564b6e5c402a <ngx_worker_process_cycle>, data=data@entry=0x2,
    name=name@entry=0x564b6e67861a "worker process", respawn=respawn@entry=-3) at src/os/unix/ngx_process.c:199
#18 0x0000564b6e5c3859 in ngx_start_worker_processes (cycle=cycle@entry=0x564b6eac7490, n=4, type=type@entry=-3) at src/os/unix/ngx_process_cycle.c:344
#19 0x0000564b6e5c4a6e in ngx_master_process_cycle (cycle=cycle@entry=0x564b6eac7490) at src/os/unix/ngx_process_cycle.c:130
#20 0x0000564b6e59880d in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:383
tmagnien commented 1 year ago

Thanks a lot @klp18. Do you think you could provide a full nginx debug log? This would help me understand the sequence of events.

And in the meantime, please try this patch:

diff --git a/ngx_http_upstream_ntlm_module.c b/ngx_http_upstream_ntlm_module.c
index 8d1179d..9de776f 100644
--- a/ngx_http_upstream_ntlm_module.c
+++ b/ngx_http_upstream_ntlm_module.c
@@ -378,9 +378,7 @@ static void ngx_http_upstream_client_conn_cleanup(void *data) {

     // Check if the item was removed previously from the queue (backend drop)
     if (item->peer_connection != NULL) {
-
-        item->peer_connection->read->timedout = 1;
-        ngx_post_event(item->peer_connection->read,&ngx_posted_events);
+        ngx_add_timer(item->peer_connection->read, 1);

         ngx_queue_remove(&item->queue);
         ngx_queue_insert_head(&item->conf->free, &item->queue);
@@ -451,6 +449,11 @@ static void ngx_http_upstream_ntlm_close(ngx_connection_t *c) {

 #endif

+    // If connection is already closed, don't do anything (pool is already gone)
+    if (c->fd == (ngx_socket_t) -1) {
+        return;
+    }
+
     if (c->pool) {
         ngx_destroy_pool(c->pool);
         c->pool = NULL;
klp18 commented 1 year ago

New dumps with debug logs, with the new patch applied.

tmagnien commented 1 year ago

Thanks a lot @klp18 , it was helpful. Can you try the following patch:

diff --git a/ngx_http_upstream_ntlm_module.c b/ngx_http_upstream_ntlm_module.c
index 8d1179d..6b392de 100644
--- a/ngx_http_upstream_ntlm_module.c
+++ b/ngx_http_upstream_ntlm_module.c
@@ -268,6 +268,7 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
     ngx_http_upstream_t *u;
     ngx_pool_cleanup_t *cln;
     ngx_http_upstream_ntlm_cache_t *cleanup_item = NULL;
+    unsigned invalid = 0;

     /* cache valid connections */

@@ -277,26 +278,32 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
     if (state & NGX_PEER_FAILED || c == NULL || c->read->eof ||
         c->read->error || c->read->timedout || c->write->error ||
         c->write->timedout) {
+        invalid = 1;
         goto invalid;
     }

     if (!u->keepalive) {
+        invalid = 1;
         goto invalid;
     }

     if (!u->request_body_sent) {
+        invalid = 1;
         goto invalid;
     }

     if (ngx_terminate || ngx_exiting) {
+        invalid = 1;
         goto invalid;
     }

     if (ngx_handle_read_event(c->read, 0) != NGX_OK) {
+        invalid = 1;
         goto invalid;
     }

     if (hndp->ntlm_init == 0 && hndp->cached == 0) {
+        invalid = 1;
         goto invalid;
     }

@@ -365,6 +372,17 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
     }

 invalid:
+    if (invalid) {
+        // remove the client connection drop down handler if present
+        for (cln = hndp->client_connection->pool->cleanup; cln; cln = cln->next) {
+            if (cln->handler == ngx_http_upstream_client_conn_cleanup) {
+                ngx_log_debug0(NGX_LOG_DEBUG_HTTP, pc->log, 0,
+                    "ntlm free peer removing cleanup");
+                cln->handler = NULL;
+            }
+        }
+    }
+
     hndp->original_free_peer(pc, hndp->data, state);
 }
klp18 commented 1 year ago

Crashes are much less frequent now. New dumps and debug logs:

tmagnien commented 1 year ago

@klp18 I think I spotted the root cause. Can you please try this patch:

diff --git a/ngx_http_upstream_ntlm_module.c b/ngx_http_upstream_ntlm_module.c
index 8d1179d..981dcb0 100644
--- a/ngx_http_upstream_ntlm_module.c
+++ b/ngx_http_upstream_ntlm_module.c
@@ -268,6 +268,7 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
     ngx_http_upstream_t *u;
     ngx_pool_cleanup_t *cln;
     ngx_http_upstream_ntlm_cache_t *cleanup_item = NULL;
+    unsigned invalid = 0;

     /* cache valid connections */

@@ -277,26 +278,32 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
     if (state & NGX_PEER_FAILED || c == NULL || c->read->eof ||
         c->read->error || c->read->timedout || c->write->error ||
         c->write->timedout) {
+        invalid = 1;
         goto invalid;
     }

     if (!u->keepalive) {
+        invalid = 1;
         goto invalid;
     }

     if (!u->request_body_sent) {
+        invalid = 1;
         goto invalid;
     }

     if (ngx_terminate || ngx_exiting) {
+        invalid = 1;
         goto invalid;
     }

     if (ngx_handle_read_event(c->read, 0) != NGX_OK) {
+        invalid = 1;
         goto invalid;
     }

     if (hndp->ntlm_init == 0 && hndp->cached == 0) {
+        invalid = 1;
         goto invalid;
     }

@@ -327,6 +334,8 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
     for (cln = item->client_connection->pool->cleanup; cln; cln = cln->next) {
         if (cln->handler == ngx_http_upstream_client_conn_cleanup) {
             cleanup_item = cln->data;
+            // reassign cleanup data in case it has moved
+            cln->data = item;
             break;
         }
     }
@@ -365,6 +374,17 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
     }

 invalid:
+    if (invalid) {
+        // remove the client connection drop down handler if present
+        for (cln = hndp->client_connection->pool->cleanup; cln; cln = cln->next) {
+            if (cln->handler == ngx_http_upstream_client_conn_cleanup) {
+                ngx_log_debug0(NGX_LOG_DEBUG_HTTP, pc->log, 0,
+                    "ntlm free peer removing cleanup");
+                cln->handler = NULL;
+            }
+        }
+    }
+
     hndp->original_free_peer(pc, hndp->data, state);
 }
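The key new line in this patch is "cln->data = item;". A standalone C sketch (not nginx code; the recycling mechanics are an assumption reconstructed from the patches above) of the failure mode it targets: cache items are recycled through a free list, so a cleanup registered for one client connection can end up holding a pointer to an item that now describes another:

#include <stdio.h>

typedef struct {
    int client_id;              /* which client this cache item currently serves */
} cache_item_t;

typedef struct {
    void *data;                 /* stands in for ngx_pool_cleanup_t.data */
} pool_cleanup_t;

int main(void) {
    cache_item_t   slots[1];    /* a one-slot "free list" of cache items */
    pool_cleanup_t cln_a;

    slots[0].client_id = 1;     /* item assigned to client A */
    cln_a.data = &slots[0];     /* A's cleanup remembers that item */

    slots[0].client_id = 2;     /* item recycled for client B */

    /* When A's connection closes, its cleanup fires holding an item that now
     * belongs to B, so it tears down the wrong peer connection. Reassigning
     * cln->data whenever the item is reused keeps the pointer current. */
    cache_item_t *stale = cln_a.data;
    printf("cleanup registered for client 1 sees an item owned by client %d\n",
           stale->client_id);
    return 0;
}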
klp18 commented 1 year ago

@tmagnien Not a single crash in three hours. I'll post an update tomorrow.

tmagnien commented 1 year ago

@klp18 Great news! If you confirm tomorrow, I'll create a PR with the patch.

klp18 commented 1 year ago

@tmagnien Four crashes happened overnight. One of them (180266) contains ngx_http_upstream_ntlm_close_handler in the backtrace. I turned off the debug logs yesterday; I can turn them back on and wait for a new crash.

tmagnien commented 1 year ago

@klp18 Thanks for the core dumps, and yes, if you can enable debug it will help a lot.

klp18 commented 1 year ago

@tmagnien crash dump with debug logs

tmagnien commented 1 year ago

@klp18 updated patch to handle the last (I hope) edge case:

diff --git a/ngx_http_upstream_ntlm_module.c b/ngx_http_upstream_ntlm_module.c
index 8d1179d..d68c2b3 100644
--- a/ngx_http_upstream_ntlm_module.c
+++ b/ngx_http_upstream_ntlm_module.c
@@ -268,6 +268,7 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
     ngx_http_upstream_t *u;
     ngx_pool_cleanup_t *cln;
     ngx_http_upstream_ntlm_cache_t *cleanup_item = NULL;
+    unsigned invalid = 0;

     /* cache valid connections */

@@ -277,26 +278,32 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
     if (state & NGX_PEER_FAILED || c == NULL || c->read->eof ||
         c->read->error || c->read->timedout || c->write->error ||
         c->write->timedout) {
+        invalid = 1;
         goto invalid;
     }

     if (!u->keepalive) {
+        invalid = 1;
         goto invalid;
     }

     if (!u->request_body_sent) {
+        invalid = 1;
         goto invalid;
     }

     if (ngx_terminate || ngx_exiting) {
+        invalid = 1;
         goto invalid;
     }

     if (ngx_handle_read_event(c->read, 0) != NGX_OK) {
+        invalid = 1;
         goto invalid;
     }

     if (hndp->ntlm_init == 0 && hndp->cached == 0) {
+        invalid = 1;
         goto invalid;
     }

@@ -306,6 +313,14 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,

         item = ngx_queue_data(q, ngx_http_upstream_ntlm_cache_t, queue);
         ngx_http_upstream_ntlm_close(item->peer_connection);
+        // remove the client connection drop down handler if present
+        for (cln = item->client_connection->pool->cleanup; cln; cln = cln->next) {
+            if (cln->handler == ngx_http_upstream_client_conn_cleanup) {
+                ngx_log_debug0(NGX_LOG_DEBUG_HTTP, pc->log, 0,
+                    "ntlm free peer removing cleanup");
+                cln->handler = NULL;
+            }
+        }
         item->peer_connection = NULL;
     } else {
         q = ngx_queue_head(&hndp->conf->free);
@@ -327,6 +342,8 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
     for (cln = item->client_connection->pool->cleanup; cln; cln = cln->next) {
         if (cln->handler == ngx_http_upstream_client_conn_cleanup) {
             cleanup_item = cln->data;
+            // reassign cleanup data in case it has moved
+            cln->data = item;
             break;
         }
     }
@@ -365,6 +382,17 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
     }

 invalid:
+    if (invalid) {
+        // remove the client connection drop down handler if present
+        for (cln = hndp->client_connection->pool->cleanup; cln; cln = cln->next) {
+            if (cln->handler == ngx_http_upstream_client_conn_cleanup) {
+                ngx_log_debug0(NGX_LOG_DEBUG_HTTP, pc->log, 0,
+                    "ntlm free peer removing cleanup");
+                cln->handler = NULL;
+            }
+        }
+    }
+
     hndp->original_free_peer(pc, hndp->data, state);
 }
klp18 commented 1 year ago

There was one crash in the last 16 hours, though. Now I will restart nginx with debug logs and try to catch it again.

#0  0x0000558ee69a7d44 in ngx_http_upstream_handler (ev=0x558ee8ad1250) at src/http/ngx_http_upstream.c:1282
1282        ngx_http_set_log_request(c->log, r);
(gdb) bt
#0  0x0000558ee69a7d44 in ngx_http_upstream_handler (ev=0x558ee8ad1250) at src/http/ngx_http_upstream.c:1282
#1  0x0000558ee696ea2e in ngx_event_process_posted (cycle=cycle@entry=0x558ee88f1490, posted=0x558ee6a82d10 <ngx_posted_events>)
    at src/event/ngx_event_posted.c:34
#2  0x0000558ee696e4e6 in ngx_process_events_and_timers (cycle=cycle@entry=0x558ee88f1490) at src/event/ngx_event.c:263
#3  0x0000558ee6978141 in ngx_worker_process_cycle (cycle=0x558ee88f1490, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:721
#4  0x0000558ee697657d in ngx_spawn_process (cycle=cycle@entry=0x558ee88f1490, proc=proc@entry=0x558ee697802a <ngx_worker_process_cycle>,
    data=data@entry=0x0, name=name@entry=0x558ee6a2c61a "worker process", respawn=respawn@entry=-3) at src/os/unix/ngx_process.c:199
#5  0x0000558ee6977859 in ngx_start_worker_processes (cycle=cycle@entry=0x558ee88f1490, n=4, type=type@entry=-3) at src/os/unix/ngx_process_cycle.c:344
#6  0x0000558ee6978a6e in ngx_master_process_cycle (cycle=cycle@entry=0x558ee88f1490) at src/os/unix/ngx_process_cycle.c:130
#7  0x0000558ee694c80d in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:383
klp18 commented 1 year ago

@tmagnien dumps with debug logs

tmagnien commented 1 year ago

@klp18 Thanks! Just one question: from the logs, it looks like there's an internal proxy configuration in your setup, something like client -> nginx -> nginx -> server. Is this right?

klp18 commented 1 year ago

@tmagnien Good question. Internal clients only work through one nginx with the ntlm module. External clients work through an additional "proxy", TMG 2010 (which I plan to get rid of if I can get nginx to work stably). Thus, the requests look like client -> router (NAT1) -> tmg2010 (NAT2) -> nginx -> server.

Publishing through TMG 2010 has its own specific features. For example, when it proxies an http/https request, it can preserve the client's original IP address at the TCP level (using the built-in NAT, I guess).

So it turns out that I don't have an extra proxy in front of nginx, but a double NAT.

tmagnien commented 1 year ago

Hi @klp18. I finally have some time to work on this again. Would it be possible to run the same tests with keepalive disabled? I mean "keepalive_timeout 0;" in the nginx config file. It would help me confirm that this is where the issue resides. Thanks a lot!

klp18 commented 1 year ago

Hello @tmagnien, when I set keepalive_timeout to zero, authentication stops working: Outlook constantly asks for the password, and I see an endless stream of 401 status codes in the nginx log (NTLM authenticates the underlying TCP connection rather than each request, so it cannot work without keepalive). Well, since crashes are quite rare (a few times a day), I can't leave nginx running for long in this state, and it's unlikely to crash if clients can't authenticate.

tmagnien commented 1 year ago

Ok no problem, thanks for trying.
