cep21 / healthcheck_nginx_upstreams

Health checks upstreams for nginx
http://wiki.nginx.org/NginxHttpHealthcheckModule

nginx with healthcheck segfault on reload #5

Open fancy-rabbit opened 13 years ago

fancy-rabbit commented 13 years ago

/var/log/messages contains many messages like this one: kernel: nginx[28164]: segfault at 0000000000000018 rip 0000000000410a8f rsp 00007fffce7ff7d0 error 4. When I reload nginx, PID 28164 belongs to the nginx worker process that is shutting down. Every reload produces segfaults, unless I delete all the healthcheck directives from the configuration file.

The core dump shows this:

Program terminated with signal 11, Segmentation fault.
#0 0x0000000000410a8f in time ()
(gdb) bt
#0 0x0000000000410a8f in time ()
#1 0x0000000000417879 in time ()
#2 0x00000000004177a5 in time ()
#3 0x000000000041c49e in time ()
#4 0x000000000040424b in time ()
#5 0x0000003abac1d994 in __libc_start_main () from /lib64/libc.so.6
#6 0x0000000000402a59 in time ()
#7 0x00007fffce7ffb38 in ?? ()
#8 0x0000000000000000 in ?? ()
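For readers decoding the kernel line above: the number after "error" is the x86 page-fault error code, and "segfault at" is the faulting address. The sketch below decodes the three low bits of that code and applies a rough heuristic for a NULL-pointer field access. The names pf_info, decode_pf_error and looks_like_null_deref are made up for this illustration and the 4096-byte threshold is an assumption (a typical page size), so treat it as a reading aid rather than a diagnosis.

```c
#include <stdbool.h>
#include <stdio.h>

/* Low bits of the x86 page-fault error code printed in the kernel log. */
enum {
    PF_PROT  = 1 << 0, /* 0: page not present, 1: protection violation */
    PF_WRITE = 1 << 1, /* 0: read access,      1: write access */
    PF_USER  = 1 << 2  /* 0: kernel mode,      1: user mode */
};

/* Hypothetical helper type, not part of any library. */
struct pf_info {
    bool protection_violation;
    bool write;
    bool user_mode;
};

struct pf_info decode_pf_error(unsigned long code)
{
    struct pf_info info;
    info.protection_violation = (code & PF_PROT) != 0;
    info.write = (code & PF_WRITE) != 0;
    info.user_mode = (code & PF_USER) != 0;
    return info;
}

/* Heuristic (assumption): an address inside the first page usually means
 * a struct field was read through a NULL pointer. */
bool looks_like_null_deref(unsigned long addr)
{
    return addr < 4096UL;
}

void describe_fault(unsigned long addr, unsigned long code)
{
    struct pf_info info = decode_pf_error(code);
    printf("%s-mode %s, %s, address 0x%lx%s\n",
           info.user_mode ? "user" : "kernel",
           info.write ? "write" : "read",
           info.protection_violation ? "protection violation" : "page not present",
           addr,
           looks_like_null_deref(addr) ? " (likely NULL pointer plus offset)" : "");
}
```

Calling describe_fault(0x18, 4) with the values from the log reports a user-mode read of a not-present page at a near-zero address, which is consistent with dereferencing a NULL pointer at a small field offset.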

cep21 commented 13 years ago

Thanks. What version of nginx are you using?

fancy-rabbit commented 13 years ago

0.8.53, built with your modified upstream hash module. But even when I'm not using the upstream hash module, a reload causes segfaults.

fancy-rabbit commented 13 years ago

nginx version: 0.8.53
built by gcc 4.1.2 20080704 (Red Hat 4.1.2-48)
--prefix=/opt/xxx/nginx --with-http_stub_status_module --with-http_realip_module --pid-path=/var/run/nginx.pid --add-module=/usr/src/redhat/SOURCES/ngx_cache_purge-1.2 --add-module=/usr/src/redhat/SOURCES/nginx_upstream_hash_with_healthcheck-0.3.1 --add-module=/usr/src/redhat/SOURCES/nginx_healthcheck_for_upstreams

cep21 commented 13 years ago

I'll look at it when I can. I've heard of this behavior from other people before, but ignored it because nginx still worked before and after the reload (the old process was segfaulting, not the new one). Is that the case for you? If you build with debug symbols, what stack trace do you see in a core dump?

fancy-rabbit commented 13 years ago

#0 0x000000000041ac21 in ngx_clean_old_cycles (ev=0x6ceea0) at src/core/ngx_cycle.c:1350
1350 if (cycle[i]->connections[n].fd != (ngx_socket_t) -1) {
(gdb) bt
#0 0x000000000041ac21 in ngx_clean_old_cycles (ev=0x6ceea0) at src/core/ngx_cycle.c:1350
#1 0x000000000042693a in ngx_event_expire_timers () at src/event/ngx_event_timer.c:149
#2 0x0000000000424634 in ngx_process_events_and_timers (cycle=0x1449fba0) at src/event/ngx_event.c:277
#3 0x000000000042fb8d in ngx_single_process_cycle (cycle=0x1449fba0) at src/os/unix/ngx_process_cycle.c:306
#4 0x0000000000403103 in main (argc=1, argv=0x7fffc1af4c88) at src/core/nginx.c:398

douyuan commented 12 years ago

I encountered a similar problem, and it appears (at least so far) to be solved by adding four lines of checks to ngx_http_healthcheck_clear_events.

--------code diff--------

void ngx_http_healthcheck_clear_events(ngx_log_t *log) {
    ngx_uint_t i;
    ngx_log_debug0(NGX_LOG_DEBUG_HTTP, log, 0,
            "healthcheck: Clearing events");

    //  Note: From what I can tell it is safe to ngx_del_timer events
    //  that are not in the event tree
    for (i=0; i<ngx_http_healthchecks_arr->nelts; i++) {
+        if (ngx_http_healthchecks[i].conf->healthcheck_enabled) {
+            if (ngx_http_healthchecks[i].health_ev.timer_set)
                ngx_del_timer(&ngx_http_healthchecks[i].health_ev);
+            if (ngx_http_healthchecks[i].ownership_ev.timer_set)
                ngx_del_timer(&ngx_http_healthchecks[i].ownership_ev);
+        }
    }
}

--------error log--------
May 28 17:54:48 tc_69_88 kernel: nginx[18763]: segfault at 8 ip 00000000004117c6 sp 00007fffe61be000 error 4 in nginx[400000+b0000]
May 28 17:54:48 tc_69_88 abrt[18772]: saved core dump of pid 18763 (/tmp/nginx/sbin/nginx) to /var/spool/abrt/ccpp-1338198888-18763.new/coredump (1273856 bytes)
May 28 17:54:48 tc_69_88 abrtd: Directory 'ccpp-1338198888-18763' creation detected
May 28 17:54:48 tc_69_88 abrtd: Executable '/tmp/nginx/sbin/nginx' doesn't belong to any package
May 28 17:54:48 tc_69_88 abrtd: Corrupted or bad crash /var/spool/abrt/ccpp-1338198888-18763 (res:4), deleting

--------GDB backtrace--------
#0 0x0000000000411e26 in ngx_rbtree_min ()
#1 0x0000000000412259 in ngx_rbtree_delete ()
#2 0x000000000049e376 in ngx_event_del_timer ()
#3 0x000000000049f77b in ngx_http_healthcheck_clear_events ()
#4 0x000000000049e728 in ngx_http_healthcheck_mark_finished ()
#5 0x000000000049ecd8 in ngx_http_healthcheck_read_handler ()
#6 0x0000000000434826 in ngx_epoll_process_events ()
#7 0x0000000000425b94 in ngx_process_events_and_timers ()
#8 0x00000000004327b7 in ngx_worker_process_cycle ()
#9 0x000000000042f30e in ngx_spawn_process ()
#10 0x0000000000431651 in ngx_start_worker_processes ()
#11 0x00000000004311f1 in ngx_master_process_cycle ()
#12 0x00000000004034a9 in main ()
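The idea behind the guard in the diff above can be modelled in isolation. The following is a minimal sketch, not nginx code: every mock_* name is hypothetical, the red-black timer tree is reduced to a counter, and the only thing carried over from the diff is the pattern of checking an enabled flag and a timer_set flag before deleting a timer. It assumes, as the backtrace suggests, that deleting a timer that is not in the tree is the unsafe step.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an event; only the flag matters here. */
typedef struct {
    bool timer_set; /* true while the event is registered in the timer tree */
} mock_event_t;

/* Hypothetical stand-in for one healthcheck entry. */
typedef struct {
    bool enabled;
    mock_event_t health_ev;
    mock_event_t ownership_ev;
} mock_healthcheck_t;

/* Counts real removals so the behaviour can be observed. */
int mock_deletions = 0;

/* Stand-in for removing the event from the timer tree. In this model,
 * calling it on an event that is not in the tree is the bug to avoid. */
void mock_del_timer(mock_event_t *ev)
{
    ev->timer_set = false;
    mock_deletions++;
}

/* Guarded clear: skip disabled entries and events with no pending timer. */
void mock_clear_events(mock_healthcheck_t *checks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!checks[i].enabled) {
            continue;
        }
        if (checks[i].health_ev.timer_set) {
            mock_del_timer(&checks[i].health_ev);
        }
        if (checks[i].ownership_ev.timer_set) {
            mock_del_timer(&checks[i].ownership_ev);
        }
    }
}
```

With the guard in place the clear function is idempotent in this model: a second call finds no timer_set flags and removes nothing, so a shutting-down worker that reaches it more than once never touches the tree for an event that was never added or was already removed.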