apache / incubator-pagespeed-mod

Apache module for rewriting web pages to reduce latency and bandwidth.
http://modpagespeed.com
Apache License 2.0

A whole lot of HTTPD processes #10

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Install
2. Turn on
3. restart apache.

What is the expected output?

Before enabling mod_pagespeed the server normally has about 6 httpd processes.

What do you see instead?
Once it's enabled there are about 30, all using between 1% and 2.5% CPU time.

What version of the product are you using (please check X-Mod-Pagespeed
header)?
X-Mod-Pagespeed=0.9.0.0-128

On what operating system?
Fedora 13

Which version of Apache?
Server version: Apache/2.2.16 (Unix)
Server built:   Jul 27 2010 15:58:08

Which MPM?

Please provide any additional information below, especially a URL or an
HTML file that exhibits the problem.

Original issue reported on code.google.com by j.m.milm...@gmail.com on 4 Nov 2010 at 4:35

GoogleCodeExporter commented 9 years ago
Here's a screenshot of the processes.

Original comment by j.m.milm...@gmail.com on 4 Nov 2010 at 4:37

Attachments:

GoogleCodeExporter commented 9 years ago
I experienced the same issue when testing this module today.

Module version:
0.9.0.0-128

OS:
Ubuntu 10.04

Apache2:
Server version: Apache/2.2.14 (Ubuntu)
Server built:   Aug 19 2010 03:20:29

Server MPM:
Prefork

Additional info:
The pages I tried to open are built by WordPress (with caching plugins).

Original comment by googlemi...@japje.nl on 4 Nov 2010 at 9:53

GoogleCodeExporter commented 9 years ago
Thanks for the comments. mod_pagespeed is expected to use more processing power 
because it runs transformations (including expensive image transformations). We 
will be looking into reducing this.

Original comment by sligocki@google.com on 4 Nov 2010 at 11:34

GoogleCodeExporter commented 9 years ago
Hi Shawn. The issue here is not processing power, but actual number of spawned 
processes. I can confirm the above reports - I had over 100 Apache processes 
after installing mod_pagespeed.

A normal shutdown of Apache did not remove these processes and I had to 
forcibly kill them. Is it intentional that these processes are spawned?
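
For reference, a rough cleanup sketch, assuming the Apache binary is named httpd or apache2 on your distro:

    ps -efwww | egrep '[h]ttpd|[a]pache2'   # list the leftover children
    pkill -9 httpd || pkill -9 apache2      # force-kill them if a graceful stop fails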

It's a production site and I can't play around too much, but if you have any 
questions about my configuration etc. then I am happy to provide them. In the 
meantime I have disabled the module.

Thanks!
Matthew

Original comment by MWild1 on 4 Nov 2010 at 2:34

GoogleCodeExporter commented 9 years ago
I can confirm this as well.  

Original comment by themee...@gmail.com on 4 Nov 2010 at 5:05

GoogleCodeExporter commented 9 years ago
Hi. Also confirmed; I had to disable this until it is resolved.

Original comment by lwhittaker on 4 Nov 2010 at 8:36

GoogleCodeExporter commented 9 years ago
On a non-production server, I was able to reproduce this pretty quickly by 
setting the cleaning interval to a second and refreshing frequently to simulate 
high traffic. The only requests to the server were for the rewrite_images test.
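
A crude way to simulate that traffic, assuming the stock example page is installed at its default path (adjust the URL for your host), is a loop like:

    # hammer the rewrite_images example page; Ctrl-C to stop
    while true; do
      curl -s -o /dev/null http://localhost/mod_pagespeed_example/rewrite_images.html
    done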

X-Mod-Pagespeed=0.9.0.0-128

Debian Lenny x64 (v 5.0.6)

Server Version: Apache/2.2.16 (Unix) PHP/5.2.12 mod_ssl/2.2.16 OpenSSL/0.9.8g 
mod_fastcgi/2.4.6 mod_fcgid/2.3.4
Server Built: Oct 8 2010 16:39:09 

MPM: prefork

Original comment by Wesley.S...@gmail.com on 5 Nov 2010 at 1:06

GoogleCodeExporter commented 9 years ago
I confirm this issue as well and had to disable the mod. I will be checking up 
on this issue until it is resolved. If there is anything I can do, let me know.

Original comment by kmr...@gmail.com on 5 Nov 2010 at 1:53

GoogleCodeExporter commented 9 years ago
We're looking into this. We expect some httpd processes to linger for a short 
time while they run asynchronous resource requests and rewrite images. We think 
this may be exacerbated by turning the module on for the first time under load.

Original comment by sligocki@google.com on 5 Nov 2010 at 3:12

GoogleCodeExporter commented 9 years ago
I believe this is due to a deadlock in serf_url_async_fetcher.  Each time 
someone loads an image-rich page, there's a chance of deadlocking the process.  
When a certain percentage of httpd processes are deadlocked and unable to 
process requests, apache spins up more.
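
If you want to check whether a given child is wedged, a quick sketch, assuming gdb is installed and you point it at one of the suspect PIDs:

    # dump the stacks of a suspect child; one parked inside serf_url_async_fetcher
    # waiting on a mutex/poll would be consistent with the deadlock described above
    sudo gdb -p <PID> -batch -ex 'thread apply all bt'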

Original comment by abl...@google.com on 5 Nov 2010 at 3:45

GoogleCodeExporter commented 9 years ago
I can also confirm this on CentOS 5.5 (kernel 2.6.18-194.8.1.el5), Apache with the prefork MPM.

Running PHP 5.2.10 with XCache 1.3.0.

Modules (mod_pagespeed disabled): core prefork http_core mod_so mod_auth_basic 
mod_auth_digest mod_authn_file mod_authn_default mod_authz_host mod_authz_user 
mod_authz_default mod_include mod_log_config mod_env mod_mime_magic mod_expires 
mod_deflate mod_headers mod_setenvif mod_mime mod_status mod_vhost_alias 
mod_negotiation mod_dir mod_alias mod_rewrite mod_proxy mod_proxy_balancer 
mod_cache mod_mem_cache mod_instaweb mod_php5 mod_proxy_ajp 

The number of sleeping Apache processes climbs until the server limit is reached. 
Raising that limit causes all 8 GB of memory to be used, swapping, and very high load.

Original comment by hqar...@gmail.com on 5 Nov 2010 at 6:59

GoogleCodeExporter commented 9 years ago
Regarding comment #10, for general reference: this will eventually become a 
503 error on each new page request to that Apache server, once the number of 
deadlocked processes hits MaxClients.
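
To see how close a box is to that point, a rough check (the config paths below are guesses and vary by distro):

    ps -e | egrep -c 'httpd|apache2'                                 # live Apache children
    grep -ri 'MaxClients' /etc/httpd/conf /etc/apache2 2>/dev/null   # configured limit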

Original comment by Wesley.S...@gmail.com on 5 Nov 2010 at 7:47

GoogleCodeExporter commented 9 years ago
I installed on the 3rd and (unwillingly, since I firmly believe this is a great 
project) had to uninstall on the 5th. The server never crashed, but there were 
high loads and erratic behaviour, including serving blank pages. I've put 
together some graphs; they are probably of limited use, sorry about that, but 
they may illustrate what happened.

Original comment by webmas...@misabueso.com on 5 Nov 2010 at 10:00

Attachments:

GoogleCodeExporter commented 9 years ago
Thanks everyone for the reports. We have been able to reproduce this and have a 
patch that fixes the process explosion for us. Stay tuned for an update 
coming ASAP.

Original comment by sligocki@google.com on 5 Nov 2010 at 10:21

GoogleCodeExporter commented 9 years ago
We have a fix brewing for this problem.  We'll get it out to you as soon as 
possible.

Original comment by jmara...@google.com on 5 Nov 2010 at 10:21

GoogleCodeExporter commented 9 years ago
http://code.google.com/p/modpagespeed/source/detail?r=146

Note: this fix has not been made available in binary form yet.  We will release 
new binary distributions incorporating these changes on Monday Nov 8.

Those building using the open-source instructions on 
http://code.google.com/p/modpagespeed/wiki/HowToBuild can try out these 
improvements immediately.
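
After installing a source build, a quick way to confirm which version is actually serving, assuming curl is available and the module is adding the X-Mod-Pagespeed header as in the reports above:

    curl -sI http://localhost/ | grep -i x-mod-pagespeed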

Original comment by jmara...@google.com on 5 Nov 2010 at 10:50

GoogleCodeExporter commented 9 years ago
I just tested the new(er) version 0.9.1.1-r171 and I still have the same issue.

Apache spawns many processes and load climbs to 25+ within a few seconds.

Original comment by googlemi...@japje.nl on 9 Nov 2010 at 9:41

GoogleCodeExporter commented 9 years ago
I still have the bug with r173 on Debian x64.
Apache spawns many processes a few seconds after starting...

Original comment by clement....@gmail.com on 9 Nov 2010 at 3:33

GoogleCodeExporter commented 9 years ago
Darn, sorry you're still having issues.  Thanks for the reports.

We need some more specific information to figure this out.
1. Can you post the URL of your site?
2. Can you attach your Apache configuration?
3. Exactly how many Apache processes are running before and after the 
installation of mod_pagespeed?
4. What are those processes doing?  A screenshot of "top" like the one in 
comment #1 is helpful.  Even more helpful would be to run `sudo strace -p $PID` 
for a bunch of different PIDs (the PID is the number in the first column of 
`top` or the second column of `ps -efwww | egrep 'httpd|apache2'`).  The strace 
command will attach to the process and tell you what it's doing.  Is it sitting 
there in an accept(), read(), or futex()?  Is it busily running through a bunch 
of stuff?  Just look at it for a second or two and then hit Ctrl-C to break out.  
Grab a random sampling of 5 or so and let us know what's going on (see the 
sketch below).
Thanks again; with your help we'll get to the bottom of this.
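
Something like the following (a rough sketch; the process-name patterns may need adjusting, and coreutils `timeout` keeps each trace short, so hit Ctrl-C manually if you don't have it) will sample a handful of children:

    for PID in $(ps -efwww | egrep '[h]ttpd|[a]pache2' | awk '{print $2}' | head -5); do
      echo "== $PID =="
      sudo timeout 2 strace -p "$PID" 2>&1 | tail -5
    done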

Original comment by abl...@google.com on 9 Nov 2010 at 4:09

GoogleCodeExporter commented 9 years ago
Hello,

I'm still getting the same issue as well. CentOS 5.5, 32-bit.

Date: Tue, 09 Nov 2010 17:37:54 GMT
Server: Apache/2.2.16 (Unix) mod_ssl/2.2.16 OpenSSL/0.9.8e-fips-rhel5 
mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635 PHP/5.2.9
X-Powered-By: PHP/5.2.9
X-Mod-Pagespeed: 0.9.1.1-173
Vary: Accept-Encoding
Content-Length: 12175
Content-Type: text/html

Since the comment was too long, I attached everything in a file.

Any ideas?

Original comment by nickz...@gmail.com on 9 Nov 2010 at 6:12

Attachments:

GoogleCodeExporter commented 9 years ago
Could you also post your apache error log?

Original comment by jmara...@google.com on 9 Nov 2010 at 7:10

GoogleCodeExporter commented 9 years ago
Sure, here it is. I attached the error_log from re-enabling mod_pagespeed and 
restarting Apache; I killed it after the load went up to 10.

Thanks.

Original comment by nickz...@gmail.com on 10 Nov 2010 at 1:09

Attachments:

GoogleCodeExporter commented 9 years ago
Thanks for all those details, nickzoid.  But the error_log.txt you attached 
doesn't seem to have any mod_pagespeed logs.  Was this taken with mod_pagespeed 
enabled?  If not, could you enable it briefly and send a new snippet?  If so, 
can you try increasing your loglevel?

Also, it's interesting that mysql is taking up 36% of your CPU.  Is that normal 
for your site when not running mod_pagespeed?  Do you have images, javascript, 
or css stored in mysql?

Original comment by abl...@google.com on 10 Nov 2010 at 4:38

GoogleCodeExporter commented 9 years ago
Hi,

I don't think this is fixed. It worked fine for two hours, then there were lots 
of Apache processes and it hit MaxClients.

[Wed Nov 10 23:10:47 2010] [error] log_message_handler: 
[1110/231047:ERROR:net/instaweb/apache/serf_url_async_fetcher.cc(638)] Poll 
success status 0 (110)
[Wed Nov 10 23:10:47 2010] [notice] child pid 22156 exit signal Segmentation 
fault (11)
[Wed Nov 10 23:10:47 2010] [notice] child pid 23447 exit signal Aborted (6)
[Wed Nov 10 23:10:47 2010] [notice] child pid 23746 exit signal Segmentation 
fault (11)
[Wed Nov 10 23:10:47 2010] [notice] child pid 23750 exit signal Aborted (6)
[Wed Nov 10 23:10:47 2010] [notice] child pid 23787 exit signal Segmentation 
fault (11)
[Wed Nov 10 23:10:47 2010] [notice] child pid 23864 exit signal Aborted (6)
[Wed Nov 10 23:10:47 2010] [notice] child pid 23944 exit signal Aborted (6)
[Wed Nov 10 23:10:47 2010] [notice] child pid 23981 exit signal Aborted (6)
[Wed Nov 10 23:10:48 2010] [notice] child pid 24016 exit signal Aborted (6)
[Wed Nov 10 23:10:48 2010] [notice] child pid 24098 exit signal Aborted (6)
[Wed Nov 10 23:10:48 2010] [notice] child pid 24185 exit signal Aborted (6)
[Wed Nov 10 23:10:48 2010] [notice] child pid 24345 exit signal Aborted (6)
[Wed Nov 10 23:10:48 2010] [notice] child pid 24653 exit signal Aborted (6)
[Wed Nov 10 23:10:50 2010] [notice] child pid 23482 exit signal Aborted (6)
[Wed Nov 10 23:10:50 2010] [notice] child pid 23510 exit signal Aborted (6)
[Wed Nov 10 23:10:50 2010] [notice] child pid 23961 exit signal Segmentation 
fault (11)
pure virtual method called
terminate called without an active exception
pure virtual method called
terminate called without an active exception
pure virtual method called
terminate called without an active exception
[Wed Nov 10 23:10:58 2010] [error] log_message_handler: 
[1110/231058:ERROR:net/instaweb/apache/serf_url_async_fetcher.cc(638)] Poll 
success status 0 (110)
[Wed Nov 10 23:10:58 2010] [error] log_message_handler: 
[1110/231058:ERROR:net/instaweb/apache/serf_url_async_fetcher.cc(638)] Poll 
success status 0 (110)
[Wed Nov 10 23:10:58 2010] [error] log_message_handler: 
[1110/231058:ERROR:net/instaweb/apache/serf_url_async_fetcher.cc(638)] Poll 
success status 0 (110)
pure virtual method called
terminate called without an active exception
[Wed Nov 10 23:11:21 2010] [error] log_message_handler: 
[1110/231121:ERROR:net/instaweb/apache/serf_url_async_fetcher.cc(638)] Poll 
success status 0 (110)
[Wed Nov 10 23:11:21 2010] [error] log_message_handler: 
[1110/231121:ERROR:net/instaweb/apache/serf_url_async_fetcher.cc(638)] Poll 
success status 0 (110)
pure virtual method called
terminate called without an active exception
pure virtual method called
terminate called without an active exception
pure virtual method called
terminate called without an active exception
pure virtual method called
terminate called without an active exception

Load got to 200+

2.6.18-164.15.1.el5xen #1 SMP Wed Mar 17 12:04:23 EDT 2010 x86_64 x86_64 x86_64 
GNU/Linux
Server version: Apache/2.2.3

Original comment by mark.har...@gmail.com on 10 Nov 2010 at 11:18

GoogleCodeExporter commented 9 years ago
Mark, we agree we still have some unresolved issues with similar symptoms.

We do know that since our Nov 8 update, where we fixed a deadlock scenario that 
we were able to reproduce, the number of reports of this behavior went down.  
But we are not done.

Could you tell us more about your setup?  Where does your content come from?  
Static files?  PHP?  WordPress?  Is MySQL involved?  Of course those 
technologies are pretty stable, but evidence suggests that in some cases our 
interaction with them is not well tuned.

Are you using the pre-fork MPM?   Or are you trying out mod_pagespeed in 
threaded or event modes?

Is your site image-rich?

Could you publish the URL to your site?

Could you attach your pagespeed.conf file?

Thanks for your help!

Original comment by jmara...@google.com on 10 Nov 2010 at 11:33

GoogleCodeExporter commented 9 years ago
This is a rather uninformative error message that we were printing:

[Wed Nov 10 23:11:21 2010] [error] log_message_handler: 
[1110/231121:ERROR:net/instaweb/apache/serf_url_async_fetcher.cc(638)] Poll 
success status 0 (110)

In our next release we'll at least print out the text for error 110, which is 
"Connection timed out".

I think what's happening is that a transient spike in load, say, from image 
compression, is causing the system to become unresponsive, which causes Apache 
to spawn more processes to handle incoming requests.  This makes the situation 
worse.
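
One way to check that correlation on an affected box, sketched with standard Linux tools:

    # log the 1-minute load average alongside the Apache child count every 5 seconds
    while true; do
      echo "$(date '+%T') load=$(cut -d' ' -f1 /proc/loadavg) children=$(ps -e | egrep -c 'httpd|apache2')"
      sleep 5
    done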

We have been load testing but haven't seen this on our systems, even with an 
image-rich page.  But aside from image compression, in our system, we have 
nothing really compute intensive going on because we load-test using static 
mirrors of web sites.

There is something on your system, Mark, that we believe is contributing to 
Apache & mod_instaweb going unstable. Independent of that, there are steps that 
we can and should take in mod_pagespeed to shed load and avoid harming system 
performance. But it would be helpful to replicate your serving infrastructure so 
we can reproduce your problem and convince ourselves it's fixed.

Original comment by jmara...@google.com on 11 Nov 2010 at 2:10

GoogleCodeExporter commented 9 years ago
Hi,

It's www.profileheaven.com. Medium traffic. The setup is:
Xen:
core app: Nginx -> Apache (PHP + some CSS + some JS)
images/media: Nginx -> nginx -> images/CSS/JS

Obviously I can't rewrite the images/media setup without also putting Apache 
on that server, but I fear I/O issues via cache/images on the SAN.

The site itself is rich in dynamic content (still working on fixing this, but 
it's a big job). However, the load was steady for a good couple of hours with 
only a slight increase; then, shortly after, the Apache processes went wild and 
the load skyrocketed as it went up to MaxClients/threads.

Original comment by mark.har...@gmail.com on 11 Nov 2010 at 8:42

GoogleCodeExporter commented 9 years ago
Sorry, to confirm: it's not threads, it's the worker-process setup.

Original comment by mark.har...@gmail.com on 11 Nov 2010 at 8:43

GoogleCodeExporter commented 9 years ago
Our MySQL setup is master/slave with a read/write split; however, I think the 
layer this targets in Apache runs after the transaction has completed. I'd have 
to provide the usage graphs (Cacti) on a non-public system, as they are 
commercially sensitive.

Original comment by mark.har...@gmail.com on 11 Nov 2010 at 9:41

GoogleCodeExporter commented 9 years ago
@mark.harburn, worker MPM is multi-threaded: 
http://httpd.apache.org/docs/2.0/mod/worker.html

Original comment by sligocki@google.com on 11 Nov 2010 at 2:26

GoogleCodeExporter commented 9 years ago
OK -- we do not support the 'worker' MPM or the 'thread' MPM.

We would like to do so -- just haven't tested it yet.
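
If you're not sure which MPM you're running, the server binary will tell you; the binary name varies by distro, so this sketch tries both:

    httpd -V 2>/dev/null | grep -i 'Server MPM' || apache2ctl -V | grep -i 'Server MPM'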

We did find and fix a problem recently that could easily explain this error, 
and it's in the trunk if you want to build from source, but it's not in a 
binary distribution yet. Even if you do build from source, we haven't 
load-tested the 'worker' or threaded MPMs yet, so you may see other issues.

Please stay tuned; support of more modern MPMs is bubbling to the top of our 
priority queue.

Original comment by jmara...@google.com on 11 Nov 2010 at 2:53

GoogleCodeExporter commented 9 years ago
Hi,

It's prefork; I was posting these in a rush, apologies. Bad day in the office 
and all that...

Original comment by mark.har...@gmail.com on 11 Nov 2010 at 2:56

GoogleCodeExporter commented 9 years ago
Here's the config; I turned everything on to make sure we got full performance.

Original comment by mark.har...@gmail.com on 11 Nov 2010 at 4:01

Attachments:

GoogleCodeExporter commented 9 years ago
Hey guys,

I have changed the LogLevel to warn; attached is the error_log after enabling 
mod_pagespeed again on my web server. By the way, I'm not sure if it's because 
of the low traffic I'm getting today, but the server didn't get overloaded 
immediately; it took about 10-12 minutes to start seeing high load.

Original comment by nickz...@gmail.com on 12 Nov 2010 at 3:30

Attachments:

GoogleCodeExporter commented 9 years ago
Ok,

Something a bit more useful from the error logs: just before all the 
segfaults, we get a lot of:

[Fri Nov 12 20:27:38 2010] [error] log_message_handler: 
[1112/202738:ERROR:googleurl_noicu/url_canon_noicu.cc(60)] ReadUTFChar not 
supported (non-icu build)
[Fri Nov 12 20:27:38 2010] [error] log_message_handler: 
[1112/202738:ERROR:googleurl_noicu/url_canon_noicu.cc(60)] ReadUTFChar not 
supported (non-icu build)
[Fri Nov 12 20:27:38 2010] [error] log_message_handler: 
[1112/202738:ERROR:googleurl_noicu/url_canon_noicu.cc(60)] ReadUTFChar not 
supported (non-icu build)
[Fri Nov 12 20:27:38 2010] [error] log_message_handler: 
[1112/202738:ERROR:googleurl_noicu/url_canon_noicu.cc(60)] ReadUTFChar not 
supported (non-icu build)
[Fri Nov 12 20:27:38 2010] [error] log_message_handler: 
[1112/202738:ERROR:googleurl_noicu/url_canon_noicu.cc(60)] ReadUTFChar not 
supported (non-icu build)
[Fri Nov 12 20:27:38 2010] [error] log_message_handler: 
[1112/202738:ERROR:googleurl_noicu/url_canon_noicu.cc(60)] ReadUTFChar not 
supported (non-icu build)
[Fri Nov 12 20:27:38 2010] [error] log_message_handler: 
[1112/202738:ERROR:googleurl_noicu/url_canon_noicu.cc(60)] ReadUTFChar not 
supported (non-icu build)

I'd disabled all CSS enhancements other than the ones in the core set.

Original comment by mark.har...@gmail.com on 12 Nov 2010 at 9:01

GoogleCodeExporter commented 9 years ago
Ok, I was able to reproduce a process-blowup on my machine, and I have a fix 
for it.  It will be hitting the SVN repository shortly, with binaries to come 
soon after.  So hopefully this is fixed for real.

nickzoid, your error_log shows some very troubling messages that suggest memory 
corruption.  I just opened Issue 79 to track that.

For everyone else, I encourage you to update from SVN if possible, or try the 
new binaries when they hit, and let me know if your process blowup disappears.

Original comment by abl...@google.com on 13 Nov 2010 at 12:43

GoogleCodeExporter commented 9 years ago
I built from source, and after around one hour got this -- I assume the fix had 
been committed when I built?

top - 16:44:30 up 24 days,  1:55,  2 users,  load average: 978.57, 1140.73, 
645.74

Built:   mod-pagespeed-beta-0.9.1.1-184.x86_64.rpm

pure virtual method called
terminate called without an active exception
pure virtual method called
terminate called without an active exception
[Sat Nov 13 16:36:24 2010] [notice] child pid 1745 exit signal Aborted (6)
[Sat Nov 13 16:36:26 2010] [notice] child pid 2026 exit signal Aborted (6)

Original comment by mark.har...@gmail.com on 13 Nov 2010 at 4:48

GoogleCodeExporter commented 9 years ago
@mark.harburn, I created a new issue 81 for the "ReadUTFChar not supported 
(non-icu build)" error message. Please provide any more details and let me know 
if you're still getting that message.

Original comment by sligocki@google.com on 13 Nov 2010 at 6:18

GoogleCodeExporter commented 9 years ago
Hi,

I still have the problem on r220.

The load average grows very fast. It happens only when I load the PrestaShop 
website.

Do you need any information to resolve this bug?

Thanks.

Original comment by clement....@gmail.com on 17 Nov 2010 at 5:02

GoogleCodeExporter commented 9 years ago
Hello,

I just tested the .1-228 version and I'm still getting the same overload issue.

Original comment by nickz...@gmail.com on 18 Nov 2010 at 5:03

GoogleCodeExporter commented 9 years ago
mod-pagespeed-beta 0.9.8.1-r215
Debian Lenny x64 5.0.6
Apache/2.2.9 (Debian) PHP/5.2.6-1+lenny9

I confirm the same problem.

Original comment by cleverc...@gmail.com on 18 Nov 2010 at 8:59

GoogleCodeExporter commented 9 years ago
Hi nickzoid, clement, and clevercold.

We were able to reproduce a server-load problem caused by Issue 85.  This issue 
has now been resolved in the SVN tree and the new branch:

gclient config http://modpagespeed.googlecode.com/svn/tags/0.9.10.1/src (r238 
or later on the trunk).

If you can build from source, please let us know if this fixes the symptom for 
you.
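
A minimal sketch of that checkout, assuming depot_tools' gclient is on your PATH (see the HowToBuild wiki for the full build and install steps):

    mkdir mod_pagespeed && cd mod_pagespeed
    gclient config http://modpagespeed.googlecode.com/svn/tags/0.9.10.1/src
    gclient sync --force
    # then build and install per http://code.google.com/p/modpagespeed/wiki/HowToBuild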

Original comment by jmara...@google.com on 20 Nov 2010 at 2:35