apache / incubator-pagespeed-ngx

Automatic PageSpeed optimization module for Nginx
http://ngxpagespeed.com/
Apache License 2.0
4.37k stars 365 forks source link

Redirects appear to cause crash on nginx 1.9.12 with pagespeed 1.10.33.5-beta #1140

Closed jdelgad closed 8 years ago

jdelgad commented 8 years ago

This may be related to https://github.com/pagespeed/ngx_pagespeed/issues/1081

Server is running CentOS 7 with the latest patches. We use TLS with a certificate from LetsEncrypt. Running php 5.4 and using Xenforo forum software. We do redirection for XXX.com to www.XXX.com (anonymizing our site, but probably very poorly) for both http and https. http2 is supported.

As you'll see I do my best to optimize for performance to gain a competitive SEO advantage over our competitors. I implemented the fix found for https://trac.nginx.org/nginx/ticket/235 which has reduced our signal 11 errors, but not eradicated them. I believe there may be a problem with ngx_pagespeed given the stack trace posted below.

I can send logs and core dumps to anyone if needed. I've had to disable pagespeed for now. This can cause nginx to hang and our website to become completely unresponsive at times when under load (a lot of crashes can occur in a short time period). User agents range from iphones to Windows 10 so I haven't ruled out a platform/OS.

The common thread among the signal 11 signals I see are related to 200 redirects. This may be due to misconfiguring nginx, although it was working until I upgraded nginx and pagespeed (and added ssl) which is why I've posted all of my configuration below.

nginx errors.log snippet

2016/02/27 22:50:24 [debug] 12324#0: *7626 http script copy: "DOCUMENT_URI"
2016/02/27 22:50:24 [debug] 12324#0: *7626 http script var: "/index.php"
2016/02/27 22:50:24 [debug] 12324#0: *7626 fastcgi param: "DOCUMENT_URI: /index.php"
2016/02/27 22:50:24 [debug] 12324#0: *7626 http script copy: "DOCUMENT_ROOT"
2016/02/27 22:50:24 [debug] 12324#0: *7626 http script var: "/var/www/YYY.zzz/html"
2016/02/27 22:50:24 [debug] 12324#0: *7626 fastcgi param: "DOCUMENT_ROOT: /var/www/YYY.zzz/html"
2016/02/27 22:50:24 [debug] 12324#0: *7626 http script copy: "SERVER_PROTOCOL"
2016/02/27 22:50:24 [debug] 12324#0: *7626 http script var: "HTTP/2.0"
2016/02/27 22:50:24 [debug] 12324#0: *7626 fastcgi param: "SERVER_PROTOCOL: HTTP/2.0"
2016/02/27 22:50:24 [debug] 12324#0: *7626 http script copy: "HTTPS"
2016/02/27 22:50:24 [debug] 12324#0: *7626 http script var: "on"
2016/02/27 22:50:24 [debug] 12324#0: *7626 fastcgi param: "HTTPS: on"
2016/02/27 22:50:24 [debug] 12324#0: *7626 http script copy: "GATEWAY_INTERFACE"
2016/02/27 22:50:24 [debug] 12324#0: *7626 http script copy: "CGI/1.1"
2016/02/27 22:50:24 [debug] 12324#0: *7626 fastcgi param: "GATEWAY_INTERFACE: CGI/1.1"
2016/02/27 22:50:24 [debug] 12324#0: *7626 http script copy: "SERVER_SOFTWARE"
2016/02/27 22:50:24 [debug] 12324#0: *7626 http script copy: "nginx/"
2016/02/27 22:50:24 [debug] 12324#0: *7626 http script var: "1.9.12"
2016/02/27 22:50:24 [debug] 12324#0: *7626 fastcgi param: "SERVER_SOFTWARE: nginx/1.9.12"
2016/02/27 22:50:24 [debug] 12324#0: *7626 http script copy: "REMOTE_ADDR"
2016/02/27 22:50:24 [debug] 12324#0: *7626 http script var: "IPADDRESS"
2016/02/27 22:50:24 [debug] 12324#0: *7626 fastcgi param: "REMOTE_ADDR: IPADDRESS"
2016/02/27 22:50:24 [debug] 12324#0: *7626 http script copy: "REMOTE_PORT"
2016/02/27 22:50:24 [debug] 12324#0: *7626 http script var: "65377"
2016/02/27 22:50:24 [debug] 12324#0: *7626 fastcgi param: "REMOTE_PORT: 65377"
2016/02/27 22:50:24 [debug] 12324#0: *7626 http script copy: "SERVER_ADDR"
2016/02/27 22:50:24 [debug] 12324#0: *7626 http script var: "SERVERIP"
2016/02/27 22:50:24 [debug] 12324#0: *7626 fastcgi param: "SERVER_ADDR: SERVERIP"
2016/02/27 22:50:24 [debug] 12324#0: *7626 http script copy: "SERVER_PORT"
2016/02/27 22:50:24 [debug] 12324#0: *7626 http script var: "443"
2016/02/27 22:50:24 [debug] 12324#0: *7626 fastcgi param: "SERVER_PORT: 443"
2016/02/27 22:50:24 [debug] 12324#0: *7626 http script copy: "SERVER_NAME"
2016/02/27 22:50:24 [debug] 12324#0: *7626 http script var: "www.YYY.zzz"
2016/02/27 22:50:24 [debug] 12324#0: *7626 fastcgi param: "SERVER_NAME: www.YYY.zzz"
2016/02/27 22:50:24 [debug] 12324#0: *7626 http script copy: "REDIRECT_STATUS"
2016/02/27 22:50:24 [debug] 12324#0: *7626 http script copy: "200"
2016/02/27 22:50:24 [debug] 12324#0: *7626 fastcgi param: "REDIRECT_STATUS: 200"
2016/02/27 22:50:24 [debug] 12324#0: *7626 fastcgi param: "HTTP_HOST: www.YYY.zzz"
2016/02/27 22:50:24 [debug] 12324#0: *7626 fastcgi param: "HTTP_ACCEPT: image/webp,image/*,*/*;q=0.8"
2016/02/27 22:50:24 [debug] 12324#0: *7626 fastcgi param: "HTTP_USER_AGENT: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36"
2016/02/27 22:50:24 [debug] 12324#0: *7626 fastcgi param: "HTTP_REFERER: https://www.YYY.zzz/forums/barzil.42/"
2016/02/27 22:50:24 [debug] 12324#0: *7626 fastcgi param: "HTTP_ACCEPT_ENCODING: gzip, deflate, sdch"
2016/02/27 22:50:24 [debug] 12324#0: *7626 fastcgi param: "HTTP_ACCEPT_LANGUAGE: en-US,en;q=0.8"
2016/02/27 22:50:24 [debug] 12324#0: *7626 fastcgi param: "HTTP_COOKIE: COOKIE"
2016/02/27 22:50:24 [debug] 12324#0: *7626 http cleanup add: 000056009AF641A8
2016/02/27 22:50:24 [debug] 12324#0: *7626 get rr peer, try: 1
2016/02/27 22:50:24 [debug] 12324#0: *7626 socket 16
2016/02/27 22:50:24 [debug] 12324#0: *7626 epoll add connection: fd:16 ev:80002005
2016/02/27 22:50:24 [debug] 12324#0: *7626 connect to unix:/var/run/php-fpm/php-fpm.sock, fd:16 #7629
2016/02/27 22:50:24 [debug] 12324#0: *7626 connected
2016/02/27 22:50:24 [debug] 12324#0: *7626 http upstream connect: 0
2016/02/27 22:50:24 [debug] 12324#0: *7626 posix_memalign: 000056009AF5E3C0:128 @16
2016/02/27 22:50:24 [debug] 12324#0: *7626 http upstream send request
2016/02/27 22:50:24 [debug] 12324#0: *7626 http upstream send request body
2016/02/27 22:50:24 [debug] 12324#0: *7626 chain writer buf fl:0 s:1152
2016/02/27 22:50:24 [debug] 12324#0: *7626 chain writer in: 000056009AF641E0
2016/02/27 22:50:24 [debug] 12324#0: *7626 writev: 1152 of 1152
2016/02/27 22:50:24 [debug] 12324#0: *7626 chain writer out: 0000000000000000
2016/02/27 22:50:24 [debug] 12324#0: *7626 event timer add: 16: 60000:1456638684757
2016/02/27 22:50:24 [debug] 12324#0: *7626 http finalize request: -4, "/index.php?/404.html&" a:1, c:4
2016/02/27 22:50:24 [debug] 12324#0: *7626 http request count:4 blk:0
2016/02/27 22:50:24 [debug] 12324#0: *7626 http finalize request: -4, "/index.php?/404.html&" a:1, c:3
2016/02/27 22:50:24 [debug] 12324#0: *7626 http request count:3 blk:0
2016/02/27 22:50:24 [debug] 12324#0: *7626 http finalize request: -4, "/index.php?/404.html&" a:1, c:2
2016/02/27 22:50:24 [debug] 12324#0: *7626 http request count:2 blk:0
2016/02/27 22:50:24 [debug] 12324#0: worker cycle
2016/02/27 22:50:24 [debug] 12324#0: accept mutex locked
2016/02/27 22:50:24 [debug] 12324#0: epoll timer: 4929
2016/02/27 22:50:24 [debug] 12324#0: epoll: fd:20 ev:0001 d:000056009AD2F2A0
2016/02/27 22:50:24 [debug] 12324#0: post event 000056009AD650D0
2016/02/27 22:50:24 [debug] 12324#0: epoll: fd:16 ev:0004 d:000056009AD2F600
2016/02/27 22:50:24 [debug] 12324#0: *7626 post event 000056009AD7D260
2016/02/27 22:50:24 [debug] 12324#0: timer delta: 2
2016/02/27 22:50:24 [debug] 12324#0: posted event 000056009AD650D0
2016/02/27 22:50:24 [debug] 12324#0: delete posted event 000056009AD650D0
2016/02/27 22:50:24 [debug] 12324#0: pagespeed [000056009AED9FB0] event: D. bf:000056009AF11990 (ps resource) - refcnt:1 - det: N
2016/02/27 22:50:24 [notice] 12033#0: signal 17 (SIGCHLD) received
2016/02/27 22:50:24 [alert] 12033#0: worker process 12324 exited on signal 11 (core dumped)

nginx.conf

user nginx;
worker_processes auto;
worker_rlimit_nofile 65535;
error_log /var/log/nginx/error.log debug;
pid /run/nginx.pid;
worker_rlimit_core  500M;
working_directory   /etc/nginx-core;

events {
    worker_connections 1024;
    multi_accept on;
    use epoll;
}

http {
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';
    access_log  /var/log/nginx/access.log  main;

    # pagespeed global directives
    pagespeed on;
    pagespeed NumRewriteThreads 4;
    pagespeed NumExpensiveRewriteThreads 4;
    pagespeed FileCachePath /var/ngx_pagespeed_cache;

    # fix workers exiting with exit code 11
    ssl_session_cache    shared:SSL:10m;

    # do not report nginx version
    server_tokens off;

    # Do not allow this site to be displayed in iframes
    add_header X-Frame-Options DENY;

    # Do not permit Content-Type sniffing.
    add_header X-Content-Type-Options nosniff;

    sendfile            on;
    tcp_nopush          on;
    tcp_nodelay         on;
    keepalive_timeout   65;
    types_hash_max_size 2048;

    include             /etc/nginx/mime.types;
    default_type        application/octet-stream;

    # Load modular configuration files from the /etc/nginx/conf.d directory.
    # See http://nginx.org/en/docs/ngx_core_module.html#include
    # for more information.
    include /etc/nginx/conf.d/*.conf;

    server {
        listen       80 default_server;
        listen       [::]:80 default_server;
        server_name  _;
        root         /usr/share/nginx/html;

        # Load configuration files for the default server block.
        include /etc/nginx/default.d/*.conf;

        location / {
        }

        error_page 404 /404.html;
            location = /40x.html {
        }

        error_page 500 502 503 504 /50x.html;
            location = /50x.html {
        }
    }
}

default.conf file under /etc/nginx/conf.d

server {
    listen       80;
    listen       [::]:80;
    server_name  XXX.com;
    return       301 http://www.XXX.com$request_uri;
}

server {
    listen       443 ssl http2;
    listen       [::]:443 ssl http2;
    server_name  XXX.com;
    return       301 https://www.XXX.com$request_uri;
    ssl_certificate /etc/letsencrypt/live/XXX.com-0001/fullchain.pem;
    ssl_trusted_certificate /etc/letsencrypt/live/XXX.com-0001/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/XXX.com-0001/privkey.pem;

    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
    ssl_ciphers "EECDH+AESGCM:EDH+AESGCM:ECDHE-RSA-AES128-GCM-SHA256:AES256+EECDH:DHE-RSA-AES128-GCM-SHA256:AES256+EDH:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA256:ECDHE-RSA-AES256-SHA:ECDHE-RSA-AES128-SHA:DHE-RSA-AES256-SHA256:DHE-RSA-AES128-SHA256:DHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA:ECDHE-RSA-DES-CBC3-SHA:EDH-RSA-DES-CBC3-SHA:AES256-GCM-SHA384:AES128-GCM-SHA256:AES256-SHA256:AES128-SHA256:AES256-SHA:AES128-SHA:DES-CBC3-SHA:HIGH:!aNULL:!eNULL:!EXPORT:!DES:!MD5:!PSK:!RC4";
    ssl_prefer_server_ciphers on;
    ssl_dhparam /etc/ssl/certs/dhparam.pem;

    # ocsp stapling
    ssl_stapling on;
    ssl_stapling_verify on;
    resolver 8.8.4.4 8.8.8.8 valid=300s;
    resolver_timeout 10s;

    # Cache SSL sessions for 10m (this is about 40,000 sessions), timing them out
    # after 24 hours.
    ssl_session_timeout 24h;

    # Set the buffer size to 1400 bytes (that way it fits into a single MTU).
    ssl_buffer_size 1400;

    # Enable HSTS
    add_header Strict-Transport-Security max-age=31536000 always;
}

server {
    listen       80;
    listen       [::]:80;

    server_name  www.XXX.com;
    root         /var/www/XXX.com/html;
    index        index.php;

    # vb redirects
    rewrite ^/content.php$ / permanent;
    rewrite ^/forumdisplay.php/([0-9]+)-.*$ /forums/$1/ permanent;
    rewrite ^/forum.php$ /forums/ permanent;
    rewrite ^/member.php/([0-9]+)-.*$ /members/$1/ permanent;
    rewrite ^/showthread.php/([0-9]+)-.*$ /threads/$1/ permanent; 

    # Ensure requests for pagespeed optimized resources go to the pagespeed handler
    # and no extraneous headers get set.
    location ~ "\.pagespeed\.([a-z]\.)?[a-z]{2}\.[^.]{10}\.[^.]+" {
      add_header "" "";
    }
    location ~ "^/pagespeed_static/" { }
    location ~ "^/ngx_pagespeed_beacon$" { }

    # XXX.com rules below
    location / {
        try_files $uri $uri/ /index.php?$uri&$args;
    }
    location /internal_data/ {
        internal;
    }
    location /library/ {
        internal;
    }

    error_page 404 /404.html;
    error_page 500 502 503 504 /50x.html;
    location = /50x.html {
        root /usr/share/nginx/html;
    }

    location ~ ^/admin.php$ {
        rewrite ^ https://www.XXX.com/admin.php;
    }

    location ~ \.php$ {
        try_files $uri =404;
        include fastcgi.conf;
        fastcgi_pass unix:/var/run/php-fpm/php-fpm.sock;
        fastcgi_index index.php;
        fastcgi_buffers 32 32k;
        fastcgi_buffer_size 32k;
    }

    # user is trying to access non php files--BAD!
    location ~ \.(aspx|jsp|cgi|asp|pl)$ {
        return 410;
    }

    location ~*  \.(jpg|jpeg|png|gif|css|js|ico|swf|flv|xml|pdf)$ {
        expires 30d;
        add_header Pragma public;
        add_header Cache-Control "public, must-revalidate, proxy-revalidate";
        access_log off;
        log_not_found off;
   }
}

server {
    listen       443 ssl;
    listen       [::]:443 ssl;

    server_name  www.XXX.com;
    root         /var/www/XXX.com/html;
    index        index.php;

    # ssl parameters
    ssl_certificate /etc/letsencrypt/live/XXX-0001/fullchain.pem;
    ssl_trusted_certificate /etc/letsencrypt/live/XXX-0001/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/XXX-0001/privkey.pem;

    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
    ssl_ciphers "EECDH+AESGCM:EDH+AESGCM:ECDHE-RSA-AES128-GCM-SHA256:AES256+EECDH:DHE-RSA-AES128-GCM-SHA256:AES256+EDH:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA256:ECDHE-RSA-AES256-SHA:ECDHE-RSA-AES128-SHA:DHE-RSA-AES256-SHA256:DHE-RSA-AES128-SHA256:DHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA:ECDHE-RSA-DES-CBC3-SHA:EDH-RSA-DES-CBC3-SHA:AES256-GCM-SHA384:AES128-GCM-SHA256:AES256-SHA256:AES128-SHA256:AES256-SHA:AES128-SHA:DES-CBC3-SHA:HIGH:!aNULL:!eNULL:!EXPORT:!DES:!MD5:!PSK:!RC4";
    ssl_prefer_server_ciphers on;
    ssl_dhparam /etc/ssl/certs/dhparam.pem;

    # ocsp stapling
    ssl_stapling on;
    ssl_stapling_verify on;
    resolver 8.8.4.4 8.8.8.8 valid=300s;
    resolver_timeout 10s;

    # Cache SSL sessions for 10m (this is about 40,000 sessions), timing them out
    # after 24 hours.
    ssl_session_timeout 24h;

    # Set the buffer size to 1400 bytes (that way it fits into a single MTU).
    ssl_buffer_size 1400;

    # Enable HSTS
    add_header Strict-Transport-Security max-age=31536000 always;

    # vb redirects
    rewrite ^/content.php$ / permanent;
    rewrite ^/forumdisplay.php/([0-9]+)-.*$ /forums/$1/ permanent;
    rewrite ^/forum.php$ /forums/ permanent;
    rewrite ^/member.php/([0-9]+)-.*$ /members/$1/ permanent;
    rewrite ^/showthread.php/([0-9]+)-.*$ /threads/$1/ permanent; 

    # Ensure requests for pagespeed optimized resources go to the pagespeed handler
    # and no extraneous headers get set.
    location ~ "\.pagespeed\.([a-z]\.)?[a-z]{2}\.[^.]{10}\.[^.]+" {
      add_header "" "";
    }
    location ~ "^/pagespeed_static/" { }
    location ~ "^/ngx_pagespeed_beacon$" { }

    # XXX.com rules below
    location / {
        try_files $uri $uri/ /index.php?$uri&$args;
    }
    location /internal_data/ {
        internal;
    }
    location /library/ {
        internal;
    }

    error_page 404 /404.html;
    error_page 500 502 503 504 /50x.html;
    location = /50x.html {
        root /usr/share/nginx/html;
    }

    location ~ \.php$ {
        try_files $uri =404;
        include fastcgi.conf;
        fastcgi_pass unix:/var/run/php-fpm/php-fpm.sock;
        fastcgi_index index.php;
        fastcgi_buffers 32 32k;
        fastcgi_buffer_size 32k;
    }

    # user is trying to access non php files--BAD!
    location ~ \.(aspx|jsp|cgi|asp|pl)$ {
        return 410;
    }

    location ~*  \.(jpg|jpeg|png|gif|css|js|ico|swf|flv|xml|pdf)$ {
        expires 30d;
        add_header Pragma public;
        add_header Cache-Control "public, must-revalidate, proxy-revalidate";
        access_log off;
        log_not_found off;
   }
}

server {
    listen 127.0.0.1:80;
    server_name 127.0.0.1;
    location /nginx_status {
        stub_status on;
        allow 127.0.0.1;
        deny all;
    }
}

backtrace information

Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `nginx: worker pr'.
Program terminated with signal 11, Segmentation fault.
#0  net_instaweb::NgxBaseFetch::ReadCallback (data=...)
    at /home/admin/ngx_pagespeed-release-1.10.33.5-beta/src/ngx_base_fetch.cc:143
143   CHECK(data.sender == ctx->base_fetch);
Missing separate debuginfos, use: debuginfo-install GeoIP-1.5.0-9.el7.x86_64 expat-2.1.0-8.el7.x86_64 fontconfig-2.10.95-7.el7.x86_64 freetype-2.4.11-11.el7.x86_64 gd-2.0.35-26.el7.x86_64 glibc-2.17-106.el7_2.4.x86_64 gperftools-libs-2.4-7.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.13.2-10.el7.x86_64 libX11-1.6.3-2.el7.x86_64 libXau-1.0.8-2.1.el7.x86_64 libXpm-3.5.11-3.el7.x86_64 libcom_err-1.42.9-7.el7.x86_64 libgcc-4.8.5-4.el7.x86_64 libgcrypt-1.5.3-12.el7_1.1.x86_64 libgpg-error-1.12-3.el7.x86_64 libjpeg-turbo-1.2.90-5.el7.x86_64 libpng-1.5.13-7.el7_2.x86_64 libselinux-2.2.2-6.el7.x86_64 libstdc++-4.8.5-4.el7.x86_64 libunwind-1.1-5.el7_2.2.x86_64 libxcb-1.11-4.el7.x86_64 libxml2-2.9.1-6.el7_2.2.x86_64 libxslt-1.1.28-5.el7.x86_64 nss-softokn-freebl-3.16.2.3-13.el7_1.x86_64 openssl-libs-1.0.1e-51.el7_2.2.x86_64 pcre-8.32-15.el7.x86_64 perl-libs-5.16.3-286.el7.x86_64 xz-libs-5.1.2-12alpha.el7.x86_64 zlib-1.2.7-15.el7.x86_64
(gdb) bt
#0  net_instaweb::NgxBaseFetch::ReadCallback (data=...)
    at /home/admin/ngx_pagespeed-release-1.10.33.5-beta/src/ngx_base_fetch.cc:143
#1  0x0000560097d12730 in net_instaweb::NgxEventConnection::ReadAndNotify (fd=20)
    at /home/admin/ngx_pagespeed-release-1.10.33.5-beta/src/ngx_event_connection.cc:122
#2  0x0000560097d127e9 in net_instaweb::NgxEventConnection::ReadEventHandler (ev=0x56009ad650d0)
    at /home/admin/ngx_pagespeed-release-1.10.33.5-beta/src/ngx_event_connection.cc:83
#3  0x0000560097c7d2d6 in ngx_event_process_posted (cycle=cycle@entry=0x56009aab1350, 
    posted=0x560098d59d10 <ngx_posted_events>) at src/event/ngx_event_posted.c:33
#4  0x0000560097c7cd1e in ngx_process_events_and_timers (cycle=cycle@entry=0x56009aab1350)
    at src/event/ngx_event.c:259
#5  0x0000560097c83fc5 in ngx_worker_process_cycle (cycle=cycle@entry=0x56009aab1350, data=data@entry=0x1)
    at src/os/unix/ngx_process_cycle.c:753
#6  0x0000560097c8292a in ngx_spawn_process (cycle=cycle@entry=0x56009aab1350, 
    proc=0x560097c83f70 <ngx_worker_process_cycle>, data=0x1, name=0x560098553d1f "worker process", 
    respawn=respawn@entry=1) at src/os/unix/ngx_process.c:198
#7  0x0000560097c856f7 in ngx_reap_children (cycle=0x56009aab1350) at src/os/unix/ngx_process_cycle.c:621
#8  ngx_master_process_cycle (cycle=cycle@entry=0x56009aab1350) at src/os/unix/ngx_process_cycle.c:174
#9  0x0000560097c5e947 in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:367
jdelgad commented 8 years ago

nginx -V

nginx version: nginx/1.9.12
built by gcc 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) 
built with OpenSSL 1.0.1e-fips 11 Feb 2013
TLS SNI support enabled
configure arguments:
  --prefix=/usr/share/nginx
  --sbin-path=/usr/sbin/nginx
  --conf-path=/etc/nginx/nginx.conf
  --error-log-path=/var/log/nginx/error.log
  --http-log-path=/var/log/nginx/access.log
  --http-client-body-temp-path=/var/lib/nginx/tmp/client_body
  --http-proxy-temp-path=/var/lib/nginx/tmp/proxy
  --http-fastcgi-temp-path=/var/lib/nginx/tmp/fastcgi
  --http-uwsgi-temp-path=/var/lib/nginx/tmp/uwsgi
  --http-scgi-temp-path=/var/lib/nginx/tmp/scgi
  --pid-path=/run/nginx.pid
  --lock-path=/run/lock/subsys/nginx
  --user=nginx --group=nginx
  --with-file-aio
  --with-ipv6
  --with-http_ssl_module
  --with-http_realip_module
  --with-http_addition_module
  --with-http_xslt_module
  --with-http_image_filter_module
  --with-http_geoip_module
  --with-http_sub_module
  --with-http_dav_module
  --with-http_flv_module
  --with-http_mp4_module
  --with-http_gunzip_module
  --with-http_gzip_static_module
  --with-http_random_index_module
  --with-http_secure_link_module
  --with-http_degradation_module
  --with-http_stub_status_module
  --with-http_perl_module
  --with-mail --with-mail_ssl_module
  --with-pcre
  --with-pcre-jit
  --with-google_perftools_module
  --with-debug
  --with-cc-opt='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic'
  --with-ld-opt='-Wl,-z,relro -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -Wl,-E'
  --with-http_v2_module
  --add-module=/home/admin/ngx_pagespeed-release-1.10.33.5-beta
  --with-debug
oschaaf commented 8 years ago

@jdelgad This should be fixed on trunk-tracking via these commits:

https://github.com/pagespeed/ngx_pagespeed/commit/b88e067c6d8173112d586fcdfe8d664c1226a00d https://github.com/pagespeed/ngx_pagespeed/commit/60c1f4cc4e58493be12e2c5861a5eb6aafb752c6

You could try cherry-picking them, it looks like they are not included in 1.10.33.5-beta

jdelgad commented 8 years ago

Thanks for the quick reply @oschaaf. I'm trying to get this built. I'll let you know within a few days if everything is working.

Are there plans to get this into 6-beta?

jdelgad commented 8 years ago

I'm getting these errors. It ended up filling up my partitions to 100%. I had to turn it off again.

#0  net_instaweb::NgxBaseFetch::ReadCallback (data=...) at /home/admin/ngx_pagespeed/src/ngx_base_fetch.cc:143
#1  0x0000564350c6c680 in net_instaweb::NgxEventConnection::ReadAndNotify (fd=20)
    at /home/admin/ngx_pagespeed/src/ngx_event_connection.cc:122
#2  0x0000564350c6c739 in net_instaweb::NgxEventConnection::ReadEventHandler (ev=0x564352d5eec0)
    at /home/admin/ngx_pagespeed/src/ngx_event_connection.cc:83
#3  0x0000564350be0951 in ngx_epoll_process_events (cycle=0x564352aab350, timer=<optimized out>, 
    flags=<optimized out>) at src/event/modules/ngx_epoll_module.c:822
#4  0x0000564350bd6c9a in ngx_process_events_and_timers (cycle=cycle@entry=0x564352aab350)
    at src/event/ngx_event.c:242
#5  0x0000564350bddf85 in ngx_worker_process_cycle (cycle=cycle@entry=0x564352aab350, data=data@entry=0x1)
    at src/os/unix/ngx_process_cycle.c:753
#6  0x0000564350bdc8ea in ngx_spawn_process (cycle=cycle@entry=0x564352aab350, 
    proc=0x564350bddf30 <ngx_worker_process_cycle>, data=0x1, name=0x5643514adcff "worker process", 
    respawn=respawn@entry=1) at src/os/unix/ngx_process.c:198
#7  0x0000564350bdf6b7 in ngx_reap_children (cycle=0x564352aab350) at src/os/unix/ngx_process_cycle.c:621
#8  ngx_master_process_cycle (cycle=cycle@entry=0x564352aab350) at src/os/unix/ngx_process_cycle.c:174
#9  0x0000564350bb8907 in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:367
jdelgad commented 8 years ago

Here is the file after I merged. Just to eliminate a bad merge.

namespace net_instaweb {

const char kHeadersComplete = 'H';
const char kFlush = 'F';
const char kDone = 'D';

NgxEventConnection* NgxBaseFetch::event_connection = NULL;
int NgxBaseFetch::active_base_fetches = 0;

NgxBaseFetch::NgxBaseFetch(ngx_http_request_t* r,
                           NgxServerContext* server_context,
                           const RequestContextPtr& request_ctx,
                           PreserveCachingHeaders preserve_caching_headers,
                           NgxBaseFetchType base_fetch_type)
    : AsyncFetch(request_ctx),
      request_(r),
      server_context_(server_context),
      done_called_(false),
      last_buf_sent_(false),
      references_(2),
      base_fetch_type_(base_fetch_type),
      preserve_caching_headers_(preserve_caching_headers),
      detached_(false),
      suppress_(false) {
  if (pthread_mutex_init(&mutex_, NULL)) CHECK(0);
  __sync_add_and_fetch(&NgxBaseFetch::active_base_fetches, 1);
}

NgxBaseFetch::~NgxBaseFetch() {
  pthread_mutex_destroy(&mutex_);
  __sync_add_and_fetch(&NgxBaseFetch::active_base_fetches, -1);
}

bool NgxBaseFetch::Initialize(ngx_cycle_t* cycle) {
  CHECK(event_connection == NULL) << "event connection already set";
  event_connection = new NgxEventConnection(ReadCallback);
  return event_connection->Init(cycle);
}

void NgxBaseFetch::Terminate() {
  if (event_connection != NULL) {
    GoogleMessageHandler handler;
    PosixTimer timer;
    int64 timeout_us = Timer::kSecondUs * 30;
    int64 end_us = timer.NowUs() + timeout_us;
    static unsigned int sleep_microseconds = 100;

    handler.Message(
        kInfo,"NgxBaseFetch::Terminate rounding up %d active base fetches.",
        NgxBaseFetch::active_base_fetches);

    // Try to continue processing and get the active base fetch count to 0
    // untill the timeout expires.
    // TODO(oschaaf): This needs more work.
    while (NgxBaseFetch::active_base_fetches > 0 && end_us > timer.NowUs()) {
      event_connection->Drain();
      usleep(sleep_microseconds);
    }

    if (NgxBaseFetch::active_base_fetches != 0) {
      handler.Message(
          kWarning,"NgxBaseFetch::Terminate timed out with %d active base fetches.",
          NgxBaseFetch::active_base_fetches);
    }

    // Close down the named pipe.
    event_connection->Shutdown();
    delete event_connection;
    event_connection = NULL;
  }
}

const char* BaseFetchTypeToCStr(NgxBaseFetchType type) {
  switch(type) {
    case kPageSpeedResource:
      return "ps resource";
    case kHtmlTransform:
      return "html transform";
    case kAdminPage:
      return "admin page";
    case kIproLookup:
      return "ipro lookup";
    case kPageSpeedProxy:
      return "pagespeed proxy";
  }
  CHECK(false);
  return "can't get here";
}

void NgxBaseFetch::ReadCallback(const ps_event_data& data) {
  NgxBaseFetch* base_fetch = reinterpret_cast<NgxBaseFetch*>(data.sender);
  ngx_http_request_t* r = base_fetch->request();
  bool detached = base_fetch->detached();
#if (NGX_DEBUG)  // `type` is unused if NGX_DEBUG isn't set, needed for -Werror.
  const char* type = BaseFetchTypeToCStr(base_fetch->base_fetch_type_);
#endif
  int refcount = base_fetch->DecrementRefCount();

#if (NGX_DEBUG)
  ngx_log_error(NGX_LOG_DEBUG, ngx_cycle->log, 0,
     "pagespeed [%p] event: %c. bf:%p (%s) - refcnt:%d - det: %c", r,
     data.type, base_fetch, type, refcount, detached ? 'Y': 'N');
#endif

  // If we ended up destructing the base fetch, or the request context is
  // detached, skip this event.
  if (refcount == 0 || detached) {
    return;
  }
  ps_request_ctx_t* ctx = ps_get_request_context(r);

  CHECK(data.sender == ctx->base_fetch);

  int rc;
  bool run_posted = true;
  // If we are unlucky enough to have our connection finalized mid-ipro-lookup,
  // we must enter a different flow. Also see ps_in_place_check_header_filter().
  if ((ctx->base_fetch->base_fetch_type_ != kIproLookup)
      && r->connection->error) {
    ngx_log_error(NGX_LOG_DEBUG, ngx_cycle->log, 0,
      "pagespeed [%p] request already finalized %d", r, r->count);
    rc = NGX_ERROR;
    run_posted = false;
  } else {
    rc = ps_base_fetch::ps_base_fetch_handler(r);
  }

#if (NGX_DEBUG)
  ngx_log_error(NGX_LOG_DEBUG, ngx_cycle->log, 0,
                "pagespeed [%p] ps_base_fetch_handler() returned %d for %c",
                r, rc, data.type);
#endif

  ngx_connection_t* c = r->connection;
  ngx_http_finalize_request(r, rc);
  if (run_posted) {
    // See http://forum.nginx.org/read.php?2,253006,253061
    ngx_http_run_posted_requests(c);
  }
}

void NgxBaseFetch::Lock() {
  pthread_mutex_lock(&mutex_);
}

void NgxBaseFetch::Unlock() {
  pthread_mutex_unlock(&mutex_);
}

bool NgxBaseFetch::HandleWrite(const StringPiece& sp,
                               MessageHandler* handler) {
  Lock();
  buffer_.append(sp.data(), sp.size());
  Unlock();
  return true;
}

// should only be called in nginx thread
ngx_int_t NgxBaseFetch::CopyBufferToNginx(ngx_chain_t** link_ptr) {
  CHECK(!(done_called_ && last_buf_sent_))
        << "CopyBufferToNginx() was called after the last buffer was sent";

  // there is no buffer to send
  if (!done_called_ && buffer_.empty()) {
    *link_ptr = NULL;
    return NGX_AGAIN;
  }

  int rc = string_piece_to_buffer_chain(
      request_->pool, buffer_, link_ptr, done_called_ /* send_last_buf */);
  if (rc != NGX_OK) {
    return rc;
  }

  // Done with buffer contents now.
  buffer_.clear();

  if (done_called_) {
    last_buf_sent_ = true;
    return NGX_OK;
  }

  return NGX_AGAIN;
}

// There may also be a race condition if this is called between the last Write()
// and Done() such that we're sending an empty buffer with last_buf set, which I
// think nginx will reject.
ngx_int_t NgxBaseFetch::CollectAccumulatedWrites(ngx_chain_t** link_ptr) {
  ngx_int_t rc;
  Lock();
  rc = CopyBufferToNginx(link_ptr);
  Unlock();
  return rc;
}

ngx_int_t NgxBaseFetch::CollectHeaders(ngx_http_headers_out_t* headers_out) {
  const ResponseHeaders* pagespeed_headers = response_headers();

  if (content_length_known()) {
     headers_out->content_length = NULL;
     headers_out->content_length_n = content_length();
  }

  return copy_response_headers_to_ngx(request_, *pagespeed_headers,
                                      preserve_caching_headers_);
}

void NgxBaseFetch::RequestCollection(char type) {
  if (suppress_) {
    return;
  }

  // We must optimistically increment the refcount, and decrement it
  // when we conclude we failed. If we only increment on a successfull write,
  // there's a small chance that between writing and adding to the refcount
  // both pagespeed and nginx will release their refcount -- destructing
  // this NgxBaseFetch instance.
  IncrementRefCount();
  if (!event_connection->WriteEvent(type, this)) {
    DecrementRefCount();
  }
}

void NgxBaseFetch::HandleHeadersComplete() {
  int status_code = response_headers()->status_code();
  bool status_ok = (status_code != 0) && (status_code < 400);

  if ((base_fetch_type_ != kIproLookup) || status_ok) {
    // If this is a 404 response we need to count it in the stats.
    if (response_headers()->status_code() == HttpStatus::kNotFound) {
      server_context_->rewrite_stats()->resource_404_count()->Add(1);
    }
  }

  RequestCollection(kHeadersComplete);  // Headers available.

  // For the IPRO lookup, supress notification of the nginx side here.
  // If we send both the headerscomplete event and the one from done, nasty
  // stuff will happen if we loose the race with with the nginx side destructing
  // this base fetch instance.
  if (base_fetch_type_ == kIproLookup && !status_ok) {
    suppress_ = true;
  }
}

bool NgxBaseFetch::HandleFlush(MessageHandler* handler) {
  RequestCollection(kFlush);  // A new part of the response body is available
  return true;
}

int NgxBaseFetch::DecrementRefCount() {
  return DecrefAndDeleteIfUnreferenced();
}

int NgxBaseFetch::IncrementRefCount() {
  return __sync_add_and_fetch(&references_, 1);
}

int NgxBaseFetch::DecrefAndDeleteIfUnreferenced() {
  // Creates a full memory barrier.
  int r = __sync_add_and_fetch(&references_, -1);
  if (r == 0) {
    delete this;
  }
  return r;
}

void NgxBaseFetch::HandleDone(bool success) {
  // TODO(jefftk): it's possible that instead of locking here we can just modify
  // CopyBufferToNginx to only read done_called_ once.
  CHECK(!done_called_) << "Done already called!";
  Lock();
  done_called_ = true;
  Unlock();
  RequestCollection(kDone);
  DecrefAndDeleteIfUnreferenced();
}

}  // namespace net_instaweb
oschaaf commented 8 years ago

@jdelgad Oops, I ran a comparison, I see I missed a commit that you would have needed to cherry-pick. I'm sorry!

To make this as easy as possible, I suggest you should just overwrite your version with the trunk-tracking version of ngx_base_fetch.cc which includes all required changes:

https://raw.githubusercontent.com/pagespeed/ngx_pagespeed/trunk-tracking/src/ngx_base_fetch.cc

liluxdev commented 8 years ago

I confirm the issue is back:

worker process 22015 exited on signal 11 (core dumped)

Will you do another PR to be merged?

liluxdev commented 8 years ago

Wait, it seems this time the fix doesn't solve

oschaaf commented 8 years ago

@creativeprogramming Which fix did you apply?

liluxdev commented 8 years ago

No sorry I was wrong, i didn't really applied it before ... now yes, and it seems it solves.. (I applied the https://raw.githubusercontent.com/pagespeed/ngx_pagespeed/trunk-tracking/src/ngx_base_fetch.cc)

liluxdev commented 8 years ago

Ok, running it in production (with 250+ pageviews/minute) for 2 hours with the patch applied and no crash or error. Will it be merged?

liluxdev commented 8 years ago

That fix is seriously needed for me, as I get the site down in few minutes without it.

nginx version: nginx/1.9.4
built by gcc 4.9.2 (Ubuntu 4.9.2-10ubuntu13)
built with OpenSSL 1.0.2e 3 Dec 2015
TLS SNI support enabled
configure arguments: --with-threads --with-google_perftools_module --with-pcre --with-pcre-jit --with-ld-opt='-Wl,-z,relro -Wl,-E' --with-cc-opt='-D FD_SETSIZE=40048 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -DTCP_FASTOPEN=23' --with-http_stub_status_module --with-http_spdy_module --with-http_ssl_module --prefix=/usr/local/nginx-pagespeed --sbin-path=/usr/local/nginx-pagespeed/nginx --conf-path=/etc/nginx/nginx.conf --pid-path=/var/run/nginx.pid --add-module=/root/nginx-source/ngx_pagespeed-release-1.10.33.5-beta --with-http_gzip_static_module --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --with-http_sub_module
nikolay commented 8 years ago

@oschaaf Yeah, why don't you guys push this into a hotfix release? It's big enough to warrant it.

jeffkaufman commented 8 years ago

@oschaaf Which commit is missing from 1.10.33.5 that gives us these crashes?

oschaaf commented 8 years ago

@jeffkaufman https://github.com/pagespeed/ngx_pagespeed/commit/1964ef5219cc6585ced45662e38dce09dde518df But https://github.com/pagespeed/ngx_pagespeed/commit/b88e067c6d8173112d586fcdfe8d664c1226a00d and https://github.com/pagespeed/ngx_pagespeed/commit/60c1f4cc4e58493be12e2c5861a5eb6aafb752c6 would also be good to have.

I'm a bit confused, I thought they where included, and also https://developers.google.com/speed/pagespeed/module/release_notes mention these commits?

jeffkaufman commented 8 years ago

@oschaaf Yes, those commits are supposed to be in 1.10.33.5, but they're not there. Still trying to figure out why.

jeffkaufman commented 8 years ago

Figured it out. I had a draft of branch release-1.10.33.5-beta on github that pointed to https://github.com/pagespeed/ngx_pagespeed/commits/ed5b23dc563493163a8bdbc27bb9e7de166b5fc2 and then I was out of the office for two weeks. I handed off the release to @crowell who needed to cherry-pick @oschaaf 's fbde0ace7e468adea95a7912a24da8fff14e6cb4 to add it to the release, and he thought I had left the draft release tagged as master. So he cherry-picked the commit onto master and then force-pushed it to release-1.10.33.5-beta (thinking that the warning was just because he was cherry-picking).

I'm preparing a 1.10.33.6 release now that properly includes these commits to fix this bug.

Sorry everyone!

jeffkaufman commented 8 years ago

1.0.33.6 is out now