nginx reload with pagespeed sometimes not working (child processes staying alive)

StevDa86 commented 9 years ago

Hi, we're using nginx on Red Hat Linux:

Linux 2.6.32-504.8.1.el6.x86_64 #1 SMP Fri Dec 19 12:09:25 EST 2014 x86_64 x86_64 x86_64 GNU/Linux

nginx 1.6.2 with pagespeed 1.9.32.3-beta:

nginx version: nginx/1.6.2
built by gcc 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC)
TLS SNI support enabled
configure arguments: --sbin-path=/usr/sbin/nginx --conf-path=/etc/nginx/nginx.conf --pid-path=/var/run/nginx.pid --lock-path=/var/lock/subsys/nginx --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --with-http_ssl_module --with-http_stub_status_module --with-http_geoip_module --http-client-body-temp-path=/var/cache/nginx/client_body_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_tempmake --add-module=/tmp/nginx-1.6.2/ngx_pagespeed-1.9.32.3-beta

Our problem is that since we added the new pagespeed version, reloading no longer works properly. We use the following command to reload nginx:

"nginx -s reload"

Sometimes the master process is killed, but the worker processes of the old master keep running. As long as the old workers are still there, we are not able to start a new master process. We have no idea why this happens.

The messages log file shows the following errors:

Apr 15 11:20:51 kernel: nginx[11423]: segfault at 7 ip 000000000047bbe3 sp 00007fffdd7740f0 error 4 in nginx[400000+957000]
Apr 15 11:20:51 init: nginx main process (11423) killed by SEGV signal
Apr 15 11:20:51 init: nginx main process ended, respawning
Apr 15 11:20:55 init: nginx main process (6642) terminated with status 1
Apr 15 11:20:55 init: nginx main process ended, respawning

I have also rebuilt and reinstalled pagespeed and nginx, but that did not solve it.

Does anybody else have this problem, or an idea of what I can do?

HansVanEijsden commented 9 years ago

Hi @cobain86, I have the same problem.

Linux vps 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt4-3~bpo70+1 (2015-02-12) x86_64 GNU/Linux

Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.1.133 Build 20141023

nginx version: nginx/1.7.12
TLS SNI support enabled
configure arguments: --prefix=/opt/nginx17 --user=www-data --group=www-data --with-http_ssl_module --with-http_spdy_module --with-openssl=/usr/local/src/openssl-1.0.2a --with-openssl-opt='enable-ec_nistp_64_gcc_128 threads' --with-md5=/usr/local/src/openssl-1.0.2a --with-md5-asm --with-sha1=/usr/local/src/openssl-1.0.2a --with-sha1-asm --with-pcre-jit --with-file-aio --with-http_flv_module --with-http_geoip_module --with-http_gzip_static_module --with-http_gunzip_module --with-http_mp4_module --with-http_realip_module --with-http_stub_status_module --with-threads --with-ipv6 --add-module=/usr/local/src/nginx-rtmp-module --add-module=/usr/local/src/ngx_cache_purge-2.3 --add-module=/usr/local/src/ngx_http_substitutions_filter_module --add-module=/home/hans/ngx_pagespeed --with-ld-opt='-ljemalloc -qopenmp -parallel' --with-cc-opt='-DTCP_FASTOPEN=23 -xHOST -O3 -ipo -no-prec-div -qopenmp -pthread -unroll-aggressive -qopt-prefetch -parallel'

What I do is quite simple: /etc/init.d/nginx stop && killall -9 nginx

I know this isn't a real solution, of course, but it has worked for me for months, even years. Can anyone shed some light on this?
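A narrower alternative to `killall -9 nginx` could be to look only for workers that have lost their master. The sketch below is a rough illustration, not something taken from nginx or ngx_pagespeed: the helper names `read_ps` and `orphaned_nginx_workers` are made up here, and it assumes that orphaned workers are reparented to PID 1 and keep the usual "nginx: worker process" title.

```python
import subprocess


def read_ps():
    """Return the output of `ps -eo pid,ppid,args` as text."""
    return subprocess.run(
        ["ps", "-eo", "pid,ppid,args"],
        capture_output=True, text=True, check=True,
    ).stdout


def orphaned_nginx_workers(ps_output):
    """Return PIDs of nginx workers whose parent is PID 1.

    Assumption: a worker whose master died is reparented to PID 1 (init)
    and still carries the process title "nginx: worker process".
    """
    orphans = []
    for row in ps_output.splitlines()[1:]:  # skip the header row
        fields = row.split(None, 2)  # pid, ppid, full command line
        if len(fields) < 3:
            continue
        pid, ppid, args = fields
        if ppid == "1" and args.startswith("nginx: worker process"):
            orphans.append(int(pid))
    return orphans
```

Calling `orphaned_nginx_workers(read_ps())` would list candidate PIDs. Whether to send them a signal, and which one, is left to the operator; the check does not fix the underlying segfault.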

oschaaf commented 9 years ago

It would be nice to have backtraces for the segmentation faults. Trunk-tracking has changes pending review [1][2] that should improve how a process is stopped gracefully, which is what configuration reloading relies on under the hood. Backtraces (or a test run with trunk-tracking and these changes patched in) would help confirm that the problems seen here are fixed.

[1] https://github.com/pagespeed/ngx_pagespeed/pull/936
[2] https://gist.github.com/oschaaf/c3bcc61ea74498581dc3

StevDa86 commented 9 years ago

Hi, do I have to build nginx with the debug option to get a core dump?

I have added the following lines to my nginx config:

worker_rlimit_core 500M;
working_directory /usr/local/nginx/;

After that I tried to use this command, but it doesn't work:

gdb /etc/nginx/ /usr/local/nginx/

oschaaf commented 9 years ago

@cobain86, building and installing nginx / ngx_pagespeed via ./configure --with-debug would be best. If you are able to reproduce a segmentation fault, there will hopefully be a file called nginx.core in the configured working_directory. gdb needs the nginx binary and the core file as arguments, not directories, so if the binary is located at /usr/local/sbin/nginx, you should be able to do this:

gdb /usr/local/sbin/nginx /usr/local/nginx/nginx.core
(gdb) backtrace full

demidov-a commented 9 years ago

We have the same problem (segfault when reloading nginx). Reproduced on versions 1.7.10 and 1.7.12.

Is there anything we can do to help resolve this problem?

demidov-a commented 9 years ago

Debug log:

2015/04/22 22:16:40 [notice] 19797#0: signal 1 (SIGHUP) received, reconfiguring
2015/04/22 22:16:40 [debug] 19797#0: wake up, sigio 0
2015/04/22 22:16:40 [notice] 19797#0: reconfiguring
2015/04/22 22:16:40 [debug] 19797#0: posix_memalign: 00000000029DD0C0:16384 @16
2015/04/22 22:16:40 [debug] 19797#0: posix_memalign: 0000000002AB2C50:16384 @16
2015/04/22 22:16:40 [debug] 19797#0: malloc: 00000000029EC790:4096
2015/04/22 22:16:40 [debug] 19797#0: read: 31, 00000000029EC790, 3439, 0
2015/04/22 22:16:40 [debug] 19797#0: add cleanup: 00000000029DFE98
2015/04/22 22:16:40 [debug] 19797#0: add cleanup: 00000000029DFED0
2015/04/22 22:16:40 [debug] 19797#0: add cleanup: 00000000029DFEF8
2015/04/22 22:16:40 [debug] 19797#0: malloc: 00000000029F6500:4280
2015/04/22 22:16:40 [debug] 19797#0: malloc: 0000000002AD7D50:4280
2015/04/22 22:16:40 [debug] 19797#0: malloc: 0000000002AE4870:4280
2015/04/22 22:16:40 [debug] 19797#0: malloc: 0000000002AE5930:4280
2015/04/22 22:16:40 [debug] 19797#0: malloc: 0000000002AE69F0:4280
2015/04/22 22:16:40 [debug] 19797#0: malloc: 0000000002AF3F50:4280
2015/04/22 22:16:40 [debug] 19797#0: posix_memalign: 0000000002B1A4C0:16384 @16
2015/04/22 22:16:40 [debug] 19797#0: malloc: 0000000002AF5010:4096
2015/04/22 22:16:40 [debug] 19797#0: include conf.d/pagespeed.conf
2015/04/22 22:16:40 [debug] 19797#0: include /mnt/data/nginx/conf/conf.d/pagespeed.conf
2015/04/22 22:16:40 [debug] 19797#0: malloc: 0000000002AF6020:4096
2015/04/22 22:16:40 [debug] 19797#0: read: 32, 0000000002AF6020, 586, 0
2015/04/22 22:16:40 [info] 19797#0: [ngx_pagespeed 1.9.32.3-4448] No threading detected. Own threads: 1 Rewrite, 1 Expensive Rewrite.
2015/04/22 22:16:40 [debug] 19797#0: posix_memalign: 0000000002AC5B10:16384 @16
2015/04/22 22:16:40 [debug] 19797#0: malloc: 0000000002AF6020:4096
2015/04/22 22:16:40 [debug] 19797#0: include /mnt/data/nginx/conf/mime.types
2015/04/22 22:16:40 [debug] 19797#0: include /mnt/data/nginx/conf/mime.types
2015/04/22 22:16:40 [debug] 19797#0: malloc: 0000000002B0B800:4096
2015/04/22 22:16:40 [debug] 19797#0: read: 32, 0000000002B0B800, 3957, 0
2015/04/22 22:16:40 [debug] 19797#0: malloc: 0000000002B0C810:4096
2015/04/22 22:16:40 [debug] 19797#0: posix_memalign: 0000000002BE5DA0:16384 @16
2015/04/22 22:16:40 [debug] 19797#0: pagespeed: rollback gzip, explicit configuration in /mnt/data/nginx/conf/nginx.conf:72

/var/log/messages at this moment:

Apr 22 22:16:40 localhost kernel: nginx[19797]: segfault at 0 ip 0000000000427667 sp 00007fffc11c51f8 error 4 in nginx[400000+d7b000]
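Until a real backtrace is available, the kernel line itself already carries a little information. The following is a minimal sketch for pulling it apart, assuming the x86 Linux message layout seen in this thread (all numbers in hex, mapping given as base+size); the function name `decode_segfault` and the dictionary keys are invented for this illustration.

```python
import re

# Layout assumed from the kernel messages quoted in this thread.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<image>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Return a dict describing a kernel segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    err = int(m.group("err"), 16)
    return {
        "image": m.group("image"),
        "fault_addr": int(m.group("addr"), 16),
        "ip": ip,
        "offset_in_image": ip - base,
        # x86 page-fault error code bits: 1 = protection violation (otherwise
        # the page was not present), 2 = write (otherwise read), 4 = user mode.
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }


line = ("Apr 22 22:16:40 localhost kernel: nginx[19797]: segfault at 0 "
        "ip 0000000000427667 sp 00007fffc11c51f8 error 4 in nginx[400000+d7b000]")
info = decode_segfault(line)
print(hex(info["ip"]), hex(info["offset_in_image"]), info["write"])
# prints: 0x427667 0x27667 False
```

Read this way, error 4 at address 0 is a user-mode read of an unmapped page, which is consistent with a null pointer dereference but does not prove one. If the binary was built with debug symbols and is not position-independent, the instruction pointer can probably be handed to a symbolizer such as `addr2line -e <path to nginx> 0x427667`; a full gdb backtrace from a core file remains the more useful artifact.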

smelchior commented 7 years ago

Has anyone found a solution for this yet? I am also experiencing the same issue and it is quite annoying as it breaks the log file rotation.

jeffkaufman commented 7 years ago

What version are you running? I think we had some changes in 1.10 that might help, and we have some more we're going to be releasing in 1.12, but I'm not completely sure this is fixed.

smelchior commented 7 years ago

This is happening with nginx version 1.10.2 with pagespeed 1.9.32 from dotdeb for wheezy. So I guess updating to jessie with the newer version 1.11.33 might fix things for me :)

jeffkaufman commented 7 years ago

If you could try that, that would be great!

apache / incubator-pagespeed-ngx

nginx reload with pagespeed sometimes not working (child processes staying alive) #954