apache / incubator-pagespeed-ngx

Automatic PageSpeed optimization module for Nginx
http://ngxpagespeed.com/
Apache License 2.0

ngx_pagespeed causes nginx to stop responding to requests #438

Closed: ddrager closed this issue 5 years ago

ddrager commented 11 years ago

On my production site, nginx stops responding to connections after about 30 minutes of running with the pagespeed module enabled. Process list at that point:

 7448 ?        Ss     0:00 nginx: master process /usr/local/sbin/nginx -c /etc/nginx/conf/nginx.conf
 7449 ?        Sl    18:48  \_ nginx: worker process is shutting down             
 7450 ?        Sl    16:51  \_ nginx: worker process is shutting down             
 7451 ?        Sl    16:22  \_ nginx: worker process is shutting down             
 9020 ?        Sl     9:38  \_ nginx: worker process is shutting down             
 9021 ?        Sl     9:32  \_ nginx: worker process is shutting down             
 9022 ?        Sl     9:51  \_ nginx: worker process is shutting down             
 9023 ?        Sl     9:34  \_ nginx: worker process is shutting down             
 9730 ?        Dl    12:36  \_ nginx: worker process                              
 9731 ?        Dl    12:18  \_ nginx: worker process                              
 9732 ?        Dl    12:19  \_ nginx: worker process                              
 9733 ?        Dl    12:13  \_ nginx: worker process                              
 9734 ?        Dl     0:01  \_ nginx: cache manager process
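
To track how many workers end up stuck over time, a listing like the one above can be summarized with a small script. This is only a minimal sketch, assuming `ps` output in the column layout shown here; `summarize_workers` is a hypothetical helper, not part of nginx or ngx_pagespeed.

```python
import re
from collections import Counter

# Matches lines such as:
#  7449 ?        Sl    18:48  \_ nginx: worker process is shutting down
PS_LINE = re.compile(
    r"^\s*(?P<pid>\d+)\s+\S+\s+(?P<stat>\S+)\s+(?P<time>\d+:\d+)\s+"
    r"(?:\\_\s+)?nginx:\s+(?P<title>.+?)\s*$"
)

def summarize_workers(ps_output):
    """Count nginx processes by (process title, first STAT letter)."""
    counts = Counter()
    for line in ps_output.splitlines():
        match = PS_LINE.match(line)
        if match:
            # The first STAT letter is the process state,
            # e.g. S = sleeping, D = uninterruptible sleep.
            counts[(match["title"], match["stat"][0])] += 1
    return counts
```

Running this periodically against fresh `ps` output would show whether the number of workers reported as "shutting down" keeps growing.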

Error log shows:

2013/07/12 09:55:15 [error] 7451#0: [ngx_pagespeed 1.6.29.3-3270] http://domain.com/attachment.php?attachmentid=1368177&stc=1&thumb=1&d=1349112204:0:serf_context_run error status=110 (Connection timed out)

Here is an strace from one of the processes stuck in the shutting-down state:

...
futex(0x2564d44, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 395620, {1373640897, 312297000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0x2506a68, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x2564d44, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x2506a68, 395624) = 1
futex(0x2506a68, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x2564d44, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x2506a68, 395626) = 1
futex(0x2506a68, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x2564d44, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x2506a68, 395628) = 1
futex(0x2564d44, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 395629, {1373640898, 313275000}, ffffffff) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x2564d44, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 395630, {1373640898, 313275000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0x2506a68, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x2564d44, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x2506a68, 395634) = 1
futex(0x2506a68, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x2564d44, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 395636, {1373640899, 313543000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0x2506a68, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x2564d44, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x2506a68, 395640) = 1
futex(0x2564d44, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 395641, {1373640900, 313867000}, ffffffff) = -1 EAGAIN (Resource temporarily unavailable)
...
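
One way to read a trace like this is to tally the futex calls by operation and result. The following is a minimal sketch, assuming strace lines in the format above; `tally_futex_calls` is a hypothetical helper written for illustration.

```python
import re
from collections import Counter

# Matches e.g.:
# futex(0x2564d44, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, ...) = -1 ETIMEDOUT (...)
FUTEX_LINE = re.compile(
    r"^futex\((?P<addr>0x[0-9a-f]+), (?P<op>[A-Z_|]+),.*\) = "
    r"(?P<ret>-?\d+)(?: (?P<errno>[A-Z]+))?"
)

def tally_futex_calls(trace):
    """Count futex calls by (operation, errno name or 'ok')."""
    counts = Counter()
    for line in trace.splitlines():
        match = FUTEX_LINE.match(line.strip())
        if match:
            # Drop modifier flags such as FUTEX_CLOCK_REALTIME.
            operation = match["op"].split("|")[0]
            counts[(operation, match["errno"] or "ok")] += 1
    return counts
```

In the excerpt above, the waits that return ETIMEDOUT carry absolute deadlines roughly one second apart, which looks like a thread waking on a periodic timeout. That is only a reading of the trace, not a confirmed diagnosis.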

Some relevant settings from nginx.conf:

worker_processes  4;
worker_rlimit_nofile 10240;

events {
  worker_connections   6000;
  use epoll;
}

nginx build info:

[root@sensation ~]# nginx -V
nginx version: nginx/1.4.1
built by gcc 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) 
TLS SNI support enabled
configure arguments: --sbin-path=/usr/local/sbin --with-http_stub_status_module --prefix=/etc/nginx --with-http_gzip_static_module --with-http_ssl_module --without-http_autoindex_module --without-http_ssi_module --without-mail_pop3_module --without-mail_imap_module --without-mail_smtp_module --with-openssl=../openssl-1.0.1e --with-pcre=../pcre-8.30 --add-module=../ngx_pagespeed-release-1.6.29.3-beta

Any thoughts or ideas on how I can debug the cause of this? Turning off PageSpeed fixes the issue.

oschaaf commented 11 years ago

@ddrager is the line from error.log you posted all there is? If possible, a core dump of a worker that is stuck in shutdown would help. By inspecting it, we can hopefully figure out what is going on in the stuck workers.

http://stackoverflow.com/questions/68160/is-it-possible-to-get-a-core-dump-of-a-running-process-and-its-symbol-table
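
For reference, one common way to do this is gdb's `gcore` utility, which writes a core file for a running process without killing it. The following is a minimal sketch, assuming `gcore` is installed and the script has permission to attach to the worker; the function names and the output prefix are hypothetical.

```python
import subprocess

def gcore_command(pid, prefix="/tmp/nginx-worker"):
    """Build the gcore invocation; gcore writes the dump to <prefix>.<pid>."""
    return ["gcore", "-o", prefix, str(pid)]

def dump_core(pid, prefix="/tmp/nginx-worker"):
    """Run gcore against a live process and return the expected core path."""
    # check=True raises CalledProcessError if gcore exits with an error.
    subprocess.run(gcore_command(pid, prefix), check=True)
    return f"{prefix}.{pid}"
```

For example, `dump_core(7449)` would target one of the workers listed as shutting down, and the resulting file could then be opened with `gdb /usr/local/sbin/nginx <core file>`.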

ddrager commented 11 years ago

Yes, that is all there is. The line appears a few times (12, to be exact), each with exactly the same timestamp and PID. I'll start it up again, wait for it to get into this state, and then try to get a core dump.

vnevremeni commented 11 years ago

Confirming this. I switched to nginx and everything else works fine, but ngx_pagespeed is TOTALLY UNUSABLE for now. I tried it with the latest nginx versions, 1.4.2 (stable) and 1.5.3 (mainline), and also tried compiling everything from source (including PSOL). The error is always the same: nginx stops responding after the second or a later ab or siege run. For example, the first run of ab -n 1000 -c 10 http://artactivator.com/ goes fine, at about 50 req/s for the Drupal index.php page, but on the second run there are timeouts and ab stops working. With siege, the request rate drops with each test until nginx returns a 500 error.
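
The repeated-run test described here can be scripted so that the slowdown shows up as numbers. This is a minimal sketch, assuming `ab` is on the PATH; the helper names are hypothetical, and the parser relies on the `Requests per second:` line that `ab` prints in its summary.

```python
import re
import subprocess

RPS_LINE = re.compile(r"^Requests per second:\s+([\d.]+)", re.MULTILINE)

def parse_requests_per_second(ab_output):
    """Extract the mean request rate from ab's summary, or None if absent."""
    match = RPS_LINE.search(ab_output)
    return float(match.group(1)) if match else None

def run_ab(url, requests=1000, concurrency=10, timeout=300):
    """Run one ab benchmark; return req/s, or None if ab hangs or fails."""
    try:
        result = subprocess.run(
            ["ab", "-n", str(requests), "-c", str(concurrency), url],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    return parse_requests_per_second(result.stdout)

def repeated_runs(url, rounds=3):
    """Run the same benchmark several times in a row and collect the rates."""
    return [run_ab(url) for _ in range(rounds)]
```

Calling `repeated_runs` with the site URL returns one rate per round, with `None` marking a round in which `ab` timed out or produced no summary.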

In nginx's error log I see errors like these at random: Check failed: may_startthreads NgxBaseFetch::RequestCollection: Broken pipe

With ngx_pagespeed disabled, every stress test shows about 50 req/s on every page and nginx works fine. Still, it seems better to recompile nginx without pagespeed support, because it sometimes behaves strangely even when switched off. I want to use it, but I don't want to switch back to Apache.

ddrager commented 5 years ago

Closing out old issue.