AlexseyVolkov / nginx-upstream-jvm-route

Automatically exported from code.google.com/p/nginx-upstream-jvm-route
0 stars 0 forks source link

SegFault with nginx-1.0.10 #5

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What is the expected output? What do you see instead?
crashed

What version of the product are you using? On what operating system?
latest code from trunk

Please provide any additional information below.

Program terminated with signal 11, Segmentation fault.
#0  0x0000000000445228 in ngx_http_upstream_init_request (r=0x1ecd4fa0) at 
src/http/ngx_http_upstream.c:625
625         if (uscf->peer.init(r, uscf) != NGX_OK) {
(gdb) bt
#0  0x0000000000445228 in ngx_http_upstream_init_request (r=0x1ecd4fa0) at 
src/http/ngx_http_upstream.c:625
#1  0x0000000000445387 in ngx_http_upstream_init (r=0x1ecd4fa0) at 
src/http/ngx_http_upstream.c:432
#2  0x000000000043a1d1 in ngx_http_read_client_request_body (r=0x1ecd4fa0, 
post_handler=0x445296 <ngx_http_upstream_init>) at 
src/http/ngx_http_request_body.c:58
#3  0x0000000000460e92 in ngx_http_proxy_handler (r=0x1ecd4fa0) at 
src/http/modules/ngx_http_proxy_module.c:617
#4  0x000000000042e5b1 in ngx_http_core_content_phase (r=0x1ecd4fa0, 
ph=0x1ee49f80) at src/http/ngx_http_core_module.c:1365
#5  0x0000000000429843 in ngx_http_core_run_phases (r=0x1ecd4fa0) at 
src/http/ngx_http_core_module.c:861
#6  0x0000000000429940 in ngx_http_handler (r=0x1ecd4fa0) at 
src/http/ngx_http_core_module.c:844
#7  0x0000000000433efc in ngx_http_process_request (r=0x1ecd4fa0) at 
src/http/ngx_http_request.c:1665
#8  0x00000000004347cc in ngx_http_process_request_headers (rev=0x1ea5ebb8) at 
src/http/ngx_http_request.c:1111
#9  0x0000000000434da2 in ngx_http_process_request_line (rev=0x2b0cebc919c0) at 
src/http/ngx_http_request.c:911
#10 0x0000000000422c2e in ngx_epoll_process_events (cycle=0x1e1200e0, 
timer=<value optimized out>, flags=<value optimized out>)
    at src/event/modules/ngx_epoll_module.c:678
#11 0x000000000041a5f8 in ngx_process_events_and_timers (cycle=0x1e1200e0) at 
src/event/ngx_event.c:245
#12 0x0000000000421463 in ngx_worker_process_cycle (cycle=0x1e1200e0, 
data=<value optimized out>) at src/os/unix/ngx_process_cycle.c:801
#13 0x000000000041fbc7 in ngx_spawn_process (cycle=0x1e1200e0, proc=0x421378 
<ngx_worker_process_cycle>, data=0x0, name=0x53486e "worker process", respawn=0)
    at src/os/unix/ngx_process.c:196
#14 0x0000000000421f99 in ngx_reap_children (cycle=0x1e1200e0) at 
src/os/unix/ngx_process_cycle.c:617
#15 ngx_master_process_cycle (cycle=0x1e1200e0) at 
src/os/unix/ngx_process_cycle.c:180
#16 0x0000000000404fda in main (argc=22, argv=0x1e114000) at 
src/core/nginx.c:405

==================================================================

(gdb) backtrace full
#0  0x0000000000445228 in ngx_http_upstream_init_request (r=0x1ecd4fa0) at 
src/http/ngx_http_upstream.c:625
        host = 0x445296
        i = 504502128
        ctx = <value optimized out>
        temp = {next = 0x0, resolver = 0xffffffff00000010, udp_connection = 0x0, ident = 4294967296, state = 0, type = 264341109216, name = {len = 16,
            data = 0x1000 <Address 0x1000 out of bounds>}, naddrs = 504717696, addrs = 0x2b0cebb13a90, addr = 504502128, handler = 0x3d8bc74e2e <malloc+110>,
          data = 0x7fffd8e5f1e8, timeout = 16, quick = 4096, recursion = 264338100843, event = 0x1ed74b50}
        cln = <value optimized out>
        u = 0x1ed09d00
        clcf = 0x1e2d9068
        uscf = 0x0
        uscfp = 0x3d8bc7aec2
        umcf = <value optimized out>
#1  0x0000000000445387 in ngx_http_upstream_init (r=0x1ecd4fa0) at 
src/http/ngx_http_upstream.c:432
        c = 0x2b0cebb13a90
#2  0x000000000043a1d1 in ngx_http_read_client_request_body (r=0x1ecd4fa0, 
post_handler=0x445296 <ngx_http_upstream_init>) at 
src/http/ngx_http_request_body.c:58
        preread = <value optimized out>
        size = <value optimized out>
        b = <value optimized out>
        cl = <value optimized out>
        next = <value optimized out>
        tf = <value optimized out>
        rb = 0x1e2d9470
        clcf = 0x0
#3  0x0000000000460e92 in ngx_http_proxy_handler (r=0x1ecd4fa0) at 
src/http/modules/ngx_http_proxy_module.c:617
        rc = <value optimized out>
        u = 0x1ed09d00
        ctx = 0x1ea5f1c0
        plcf = 0x1e2d9470
#4  0x000000000042e5b1 in ngx_http_core_content_phase (r=0x1ecd4fa0, 
ph=0x1ee49f80) at src/http/ngx_http_core_module.c:1365
        root = 4380104
        rc = <value optimized out>
        path = {len = 504717696, data = 0x2b0cebb13a90 "\240O\315\036"}
#5  0x0000000000429843 in ngx_http_core_run_phases (r=0x1ecd4fa0) at 
src/http/ngx_http_core_module.c:861
        rc = <value optimized out>
        ph = 0x1ee49e60
#6  0x0000000000429940 in ngx_http_handler (r=0x1ecd4fa0) at 
src/http/ngx_http_core_module.c:844
No locals.
#7  0x0000000000433efc in ngx_http_process_request (r=0x1ecd4fa0) at 
src/http/ngx_http_request.c:1665
        c = 0x2b0cebb13a90
#8  0x00000000004347cc in ngx_http_process_request_headers (rev=0x1ea5ebb8) at 
src/http/ngx_http_request.c:1111
        p = <value optimized out>
        len = <value optimized out>
        n = <value optimized out>
        rc = 1

Original issue reported on code.google.com by michael...@gmail.com on 14 Dec 2011 at 2:26

GoogleCodeExporter commented 9 years ago
Yes, can you show me your config? And this info:
./nginx -V

Do you install jvm_route module from the download tarballs or the svn 
repository?

Thanks

Original comment by nbubi...@gmail.com on 14 Dec 2011 at 3:02

GoogleCodeExporter commented 9 years ago
I'm using the latest code from repository. I also tried downgrading to 0.8.55 
but the problem still exists.

nginx: nginx version: nginx/1.0.10
nginx: built by gcc 4.1.2 20080704 (Red Hat 4.1.2-51)
nginx: TLS SNI support enabled
nginx: configure arguments: --prefix=/usr --sbin-path=/usr/sbin/nginx 
--conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log 
--pid-path=/var/run/nginx/nginx.pid --lock-path=/var/lock/nginx.lock 
--user=nginx --group=nginx --with-http_ssl_module --with-http_dav_module 
--with-http_flv_module --with-http_gzip_static_module 
--with-http_stub_status_module --with-http_realip_module 
--add-module=/home/software/nginx_mogilefs_module 
--add-module=/home/software/nginx-upload-progress-module 
--add-module=/home/software/nginx_upstream_jvm_route 
--with-openssl=/home/software/openssl-0.9.8r --with-openssl-opt=enable-tlsext 
--with-debug --http-log-path=/var/log/nginx/access.log 
--http-client-body-temp-path=/cache/nginx/client 
--http-proxy-temp-path=/cache/nginx/proxy 
--http-fastcgi-temp-path=/cache/nginx/fcgi

Original comment by michael...@gmail.com on 14 Dec 2011 at 3:08

GoogleCodeExporter commented 9 years ago
Thanks.

Can you show me you nginx.conf? Does this problem occur frequently or rarely?

I can't reproduce this problem in my box.

Original comment by nbubi...@gmail.com on 14 Dec 2011 at 4:50

GoogleCodeExporter commented 9 years ago
It happens rather frequently, about once every minute.

Here's my nginx.conf snippet:

worker_processes  8;
worker_rlimit_nofile 8192;
events {
    use epoll;
    worker_connections  8192;
}

http {
    include       mime.types;
    default_type  application/octet-stream;
    sendfile        on;
    tcp_nopush      on;
    tcp_nodelay     on;

    server_names_hash_bucket_size 128;

    open_log_file_cache max=100 inactive=20s min_uses=2 valid=1m;

    keepalive_timeout  65 20;

    client_max_body_size       80m;
    client_body_buffer_size    128k;

    upstream tomcat {
      server 10.0.0.1:8880 weight=1 max_fails=1 fail_timeout=15s srun_id=a;
      server 10.0.0.2:8880 weight=1 max_fails=1 fail_timeout=15s srun_id=b;
      server 10.0.0.3:8880 weight=1 max_fails=1 fail_timeout=15s srun_id=c;
      server 10.0.0.1:8880 weight=1 max_fails=1 fail_timeout=15s srun_id=a;
      server 10.0.0.2:8880 weight=1 max_fails=1 fail_timeout=15s srun_id=b;
      jvm_route $cookie_JSESSIONID reverse;
    }

    server {
        listen       80;
        server_name  XXXXX;

        location ~ \.jsp$ {
            proxy_pass   http://tomcat;

            proxy_set_header   Host             $host;
            proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;
            proxy_buffer_size          4k;
            proxy_buffers              4 32k;
            proxy_busy_buffers_size    64k;
            proxy_temp_file_write_size 64k;
            proxy_connect_timeout      30s;
            proxy_read_timeout         90s;
            proxy_next_upstream        error timeout;
            proxy_ignore_client_abort  on;
        }
    }
}

Original comment by michael...@gmail.com on 14 Dec 2011 at 7:00

GoogleCodeExporter commented 9 years ago
Thanks. I'll test it. You can use the ip_hash module temporarily in your 
production server.

Original comment by nbubi...@gmail.com on 14 Dec 2011 at 7:28

GoogleCodeExporter commented 9 years ago
I was looking at the code but could not find any reason why uscf would become 
NULL. Quite strange...

Original comment by michael...@gmail.com on 14 Dec 2011 at 8:16

GoogleCodeExporter commented 9 years ago
Yes, That also confuses me too. It seems the point of uscf is  overlapped or 
uninitialized.

I still can't reproduce your bug :-( ...

Original comment by nbubi...@gmail.com on 14 Dec 2011 at 9:05

GoogleCodeExporter commented 9 years ago
Since I have the environment to reproduce fairly easily, if you think it's 
helpful, I'm more than happy to add a few lines of code to help you diagnose 
this problem. Just let me know where to put them. Thanks!

Original comment by michael...@gmail.com on 14 Dec 2011 at 9:19

GoogleCodeExporter commented 9 years ago
Thanks. You are a great hacker.

I will review the code first. If I find some dubious code, I will fix it and 
you can test it. 

Original comment by nbubi...@gmail.com on 14 Dec 2011 at 9:32

GoogleCodeExporter commented 9 years ago
Hi Michael,

Can you try the latest code? And show me the error.log if there is any error.

Hope this can help you.

Thanks.

Original comment by nbubi...@gmail.com on 15 Dec 2011 at 7:01

GoogleCodeExporter commented 9 years ago
Just applied the new code, but the problem still exists with only the core 
dumped msg in error.log:

2011/12/15 15:43:09 [alert] 30806#0: worker process 30832 exited on signal 11 
(core dumped)
2011/12/15 15:43:13 [alert] 30806#0: worker process 30825 exited on signal 11 
(core dumped)
2011/12/15 15:43:20 [alert] 30806#0: worker process 30829 exited on signal 11 
(core dumped)
2011/12/15 15:43:26 [alert] 30806#0: worker process 30833 exited on signal 11 
(core dumped)

The dump has the same error, complaining uscf being a NULL pointer...

Original comment by michael...@gmail.com on 15 Dec 2011 at 7:49

GoogleCodeExporter commented 9 years ago
Is there any error in the log?

Original comment by nbubi...@gmail.com on 15 Dec 2011 at 7:51

GoogleCodeExporter commented 9 years ago
It seems quite frequent. Can you turn on the debug.log for a while? And give me 
the debug.log.

error_log  logs/error.log debug;

Thanks.

Original comment by nbubi...@gmail.com on 15 Dec 2011 at 7:57

GoogleCodeExporter commented 9 years ago
Here's the partial debug log for one dumped process just before it crashed.

2011/12/15 16:14:10 [debug] 31214#0: accept on 0.0.0.0:80, ready: 0
2011/12/15 16:14:10 [debug] 31214#0: posix_memalign: 000000000913F730:256 @16
2011/12/15 16:14:10 [debug] 31214#0: *3721 accept: 189.202.24.49 fd:207
2011/12/15 16:14:10 [debug] 31214#0: *3721 event timer add: 207: 
60000:1323936910071
2011/12/15 16:14:10 [debug] 31214#0: *3721 epoll add event: fd:207 op:1 
ev:80000001
2011/12/15 16:14:10 [debug] 31214#0: post event 00002AE37C52D078
2011/12/15 16:14:10 [debug] 31214#0: delete posted event 00002AE37C52D078
2011/12/15 16:14:10 [debug] 31214#0: accept on 0.0.0.0:80, ready: 0
2011/12/15 16:14:10 [debug] 31214#0: posix_memalign: 0000000009021140:256 @16
2011/12/15 16:14:10 [debug] 31214#0: *3727 accept: 189.202.24.49 fd:205
2011/12/15 16:14:10 [debug] 31214#0: *3727 event timer add: 205: 
60000:1323936910076
2011/12/15 16:14:10 [debug] 31214#0: *3727 epoll add event: fd:205 op:1 
ev:80000001
2011/12/15 16:14:10 [debug] 31214#0: post event 00002AE37C52D078
2011/12/15 16:14:10 [debug] 31214#0: delete posted event 00002AE37C52D078
2011/12/15 16:14:10 [debug] 31214#0: accept on 0.0.0.0:80, ready: 0
2011/12/15 16:14:10 [debug] 31214#0: posix_memalign: 0000000008FED830:256 @16
2011/12/15 16:14:10 [debug] 31214#0: *3728 accept: 189.202.24.49 fd:209
2011/12/15 16:14:10 [debug] 31214#0: *3728 event timer add: 209: 
60000:1323936910077
2011/12/15 16:14:10 [debug] 31214#0: *3728 epoll add event: fd:209 op:1 
ev:80000001
2011/12/15 16:14:10 [debug] 31214#0: *2580 posix_memalign: 
0000000008E07CE0:4096 @16
2011/12/15 16:14:10 [debug] 31214#0: *2580 http process request line
2011/12/15 16:14:10 [debug] 31214#0: *2580 http request line: "GET XXXXX 
HTTP/1.1"
2011/12/15 16:14:10 [debug] 31214#0: *2580 http uri: "XXXXX"
2011/12/15 16:14:10 [debug] 31214#0: *2580 http args: "XXXXX"
2011/12/15 16:14:10 [debug] 31214#0: *2580 http exten: "XXXXX"
2011/12/15 16:14:10 [debug] 31214#0: *2580 http process request header line
2011/12/15 16:14:10 [debug] 31214#0: *2580 http header: "Host: XXXXX"
2011/12/15 16:14:10 [debug] 31214#0: *2580 http header: "Accept: */*"
2011/12/15 16:14:10 [debug] 31214#0: *2580 http header: "Pragma: no-cache"
2011/12/15 16:14:10 [debug] 31214#0: *2580 http header: "Range: bytes=0-148478"
2011/12/15 16:14:10 [debug] 31214#0: *2580 http header: "User-Agent: 
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 
2.0.5
2011/12/15 16:14:10 [debug] 31214#0: *2580 http header: "Cache-Control: 
no-cache, max-age=86400"
2011/12/15 16:14:10 [debug] 31214#0: *2580 http header: "Connection: keep-alive"
2011/12/15 16:14:10 [debug] 31214#0: *2580 http header done
2011/12/15 16:14:10 [debug] 31214#0: epoll del event: fd:26 op:2 ev:00000000
2011/12/15 16:14:10 [debug] 31214#0: epoll del event: fd:27 op:2 ev:00000000
2011/12/15 16:14:10 [debug] 31214#0: *3518 posix_memalign: 
0000000008FE9ED0:4096 @16
2011/12/15 16:14:10 [debug] 31214#0: *3518 http process request line
2011/12/15 16:14:10 [debug] 31214#0: *3518 http request line: "GET XXXXX 
HTTP/1.1"
2011/12/15 16:14:10 [debug] 31214#0: *3518 http uri: "XXXXX"
2011/12/15 16:14:10 [debug] 31214#0: *3518 http args: ""
2011/12/15 16:14:10 [debug] 31214#0: *3518 http exten: "XXXXX"
2011/12/15 16:14:10 [debug] 31214#0: *3518 http process request header line
2011/12/15 16:14:10 [debug] 31214#0: *3518 http header: "Accept: */*"
2011/12/15 16:14:10 [debug] 31214#0: *3518 http header: "Accept-Language: zh-cn"
2011/12/15 16:14:10 [debug] 31214#0: *3518 http header: "Accept-Encoding: gzip, 
deflate"
2011/12/15 16:14:10 [debug] 31214#0: *3518 http header: "User-Agent: 
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; 
.NET CLR
2011/12/15 16:14:10 [debug] 31214#0: *3518 http header: "Host: XXXXX"
2011/12/15 16:14:10 [debug] 31214#0: *3518 http header: "Connection: keep-alive"
2011/12/15 16:14:10 [debug] 31214#0: *3518 http header done
2011/12/15 16:14:10 [debug] 31214#0: epoll add event: fd:26 op:1 ev:00000001
2011/12/15 16:14:10 [debug] 31214#0: epoll add event: fd:27 op:1 ev:00000001
2011/12/15 16:14:10 [debug] 31214#0: *3721 post event 00002AE37C531790
2011/12/15 16:14:10 [debug] 31214#0: *3721 delete posted event 00002AE37C531790
2011/12/15 16:14:10 [debug] 31214#0: *3721 malloc: 0000000008FEBEF0:1296
2011/12/15 16:14:10 [debug] 31214#0: *3721 posix_memalign: 0000000008EF7960:256 
@16
2011/12/15 16:14:10 [debug] 31214#0: *3721 malloc: 0000000008FEC410:1024
2011/12/15 16:14:10 [debug] 31214#0: *3721 posix_memalign: 
0000000009039AF0:4096 @16
2011/12/15 16:14:10 [debug] 31214#0: *3721 http process request line
2011/12/15 16:14:10 [debug] 31214#0: *3721 recv: fd:207 904 of 1024
2011/12/15 16:14:10 [debug] 31214#0: *3721 http request line: "GET XXXXX 
HTTP/1.1"
2011/12/15 16:14:10 [debug] 31214#0: *3721 http uri: "XXXXX"
2011/12/15 16:14:10 [debug] 31214#0: *3721 http args: ""
2011/12/15 16:14:10 [debug] 31214#0: *3721 http exten: "XXXXX"
2011/12/15 16:14:10 [debug] 31214#0: *3721 http process request header line
2011/12/15 16:14:10 [debug] 31214#0: *3721 http header: "Host: XXXXX"
2011/12/15 16:14:10 [debug] 31214#0: *3721 http header: "User-Agent: 
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_7; zh-cn) AppleWebKit/533.21.1 
(KHTML,
2011/12/15 16:14:10 [debug] 31214#0: *3721 http header: "Accept: */*"
2011/12/15 16:14:10 [debug] 31214#0: *3721 http header: "Accept-Language: zh-cn"
2011/12/15 16:14:10 [debug] 31214#0: *3721 http header: "Accept-Encoding: gzip, 
deflate"
2011/12/15 16:14:10 [debug] 31214#0: *3721 http header: "Cookie: 
__utmz=25815611.1323908752.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); 
__utmc=2581
2011/12/15 16:14:10 [debug] 31214#0: *3721 http header: "Connection: keep-alive"
2011/12/15 16:14:10 [debug] 31214#0: *3721 http header done
2011/12/15 16:14:10 [alert] 31212#0: worker process 31214 exited on signal 11 
(core dumped)

Original comment by michael...@gmail.com on 15 Dec 2011 at 8:31

GoogleCodeExporter commented 9 years ago
It seems none of the request finished. That's too bad. If you don't use my 
module, the problem disappears. Is that right?

Original comment by nbubi...@gmail.com on 15 Dec 2011 at 8:49

GoogleCodeExporter commented 9 years ago
I've done more testing and confirmed that it was not crashed by the plugin. 
Sorry about the trouble and thanks again for all the help!

Original comment by michael...@gmail.com on 16 Dec 2011 at 12:08

GoogleCodeExporter commented 9 years ago
OK, I'm luck. I'm interested in what caused this problem? It seems some codes 
write the memory illegally.

Original comment by nbubi...@gmail.com on 19 Dec 2011 at 1:57

GoogleCodeExporter commented 9 years ago
Turns out It has something to do with my configuration. It's related to me 
having a proxy_pass setting within an IF clause. Somehow it did not set the 
correct proxy_pass and made the upstream become NULL.

What's strange though, was the fact that it didn't cause me any trouble until 
recently. Still not sure what triggered this crashing behaviour since I've had 
this configuration for a very long time.

I guess IF statements are truly evil: http://wiki.nginx.org/IfIsEvil

Original comment by michael...@gmail.com on 19 Dec 2011 at 2:19

GoogleCodeExporter commented 9 years ago
Yes, I see. 'If' is evil. You had better to use 'set' or 'rewrite' in the 'if' 
block.

Original comment by nbubi...@gmail.com on 19 Dec 2011 at 2:27