Status: Closed (closed by GoogleCodeExporter)
Here's a screenshot of the processes.
Original comment by j.m.milm...@gmail.com
on 4 Nov 2010 at 4:37
Attachments:
I experienced the same issue when testing this module today.
Module version:
0.9.0.0-128
OS:
Ubuntu 10.04
Apache2:
Server version: Apache/2.2.14 (Ubuntu)
Server built: Aug 19 2010 03:20:29
Server MPM:
Prefork
Additional info:
The pages I tried to open are built by WordPress (with caching modules).
Original comment by googlemi...@japje.nl
on 4 Nov 2010 at 9:53
Thanks for the comments. mod_pagespeed is expected to use more processing power
because we are running transformations (including expensive image
transformations). We will be looking into reducing this.
Original comment by sligocki@google.com
on 4 Nov 2010 at 11:34
Hi Shawn. The issue here is not processing power, but actual number of spawned
processes. I can confirm the above reports - I had over 100 Apache processes
after installing mod_pagespeed.
A normal shutdown of Apache did not remove these processes and I had to
forcefully kill them. Is it intentional that these processes should be spawned?
It's a production site and I can't play around too much, but if you have any
questions about my configuration etc. then I am happy to provide them. In the
meantime I have disabled the module.
Thanks!
Matthew
Original comment by MWild1
on 4 Nov 2010 at 2:34
I can confirm this as well.
Original comment by themee...@gmail.com
on 4 Nov 2010 at 5:05
Hi. Also confirmed - I had to disable this until it is resolved.
Original comment by lwhittaker
on 4 Nov 2010 at 8:36
On a non-production server, I was able to reproduce this pretty quickly by
setting the cleaning interval to a second and refreshing frequently to simulate
high traffic. The only requests to the server were for the rewrite_images test.
X-Mod-Pagespeed=0.9.0.0-128
Debian Lenny x64 (v 5.0.6)
Server Version: Apache/2.2.16 (Unix) PHP/5.2.12 mod_ssl/2.2.16 OpenSSL/0.9.8g
mod_fastcgi/2.4.6 mod_fcgid/2.3.4
Server Built: Oct 8 2010 16:39:09
MPM: prefork
Original comment by Wesley.S...@gmail.com
on 5 Nov 2010 at 1:06
I confirm this issue as well and had to disable the mod. I will be checking up
on this issue until it is resolved. If there is anything I can do, let me know.
Original comment by kmr...@gmail.com
on 5 Nov 2010 at 1:53
We're looking into this. We expect some httpd processes to linger for a short
time while they run asynchronous resource requests and rewrite images. We think
this may be exacerbated by turning the module on for the first time under load.
Original comment by sligocki@google.com
on 5 Nov 2010 at 3:12
I believe this is due to a deadlock in serf_url_async_fetcher. Each time
someone loads an image-rich page, there's a chance of deadlocking the process.
When a certain percentage of httpd processes are deadlocked and unable to
process requests, apache spins up more.
Original comment by abl...@google.com
on 5 Nov 2010 at 3:45
I can also confirm this on CentOS 5.5 (kernel 2.6.18-194.8.1.el5), Apache prefork.
Running PHP 5.2.10 with xcache 1.3.0
Modules (mod_pagespeed disabled): core prefork http_core mod_so mod_auth_basic
mod_auth_digest mod_authn_file mod_authn_default mod_authz_host mod_authz_user
mod_authz_default mod_include mod_log_config mod_env mod_mime_magic mod_expires
mod_deflate mod_headers mod_setenvif mod_mime mod_status mod_vhost_alias
mod_negotiation mod_dir mod_alias mod_rewrite mod_proxy mod_proxy_balancer
mod_cache mod_mem_cache mod_instaweb mod_php5 mod_proxy_ajp
The number of sleeping Apache processes climbs until the server limit is reached.
Raising that limit causes all 8 GB of memory to be used, swap space to be consumed,
and very high load.
Original comment by hqar...@gmail.com
on 5 Nov 2010 at 6:59
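[Editorial note] For anyone who wants to quantify the process growth described in the report above, here is a small monitoring sketch. It is not from the thread; it assumes a Linux box whose Apache workers are named `httpd` or `apache2`:

```python
#!/usr/bin/env python3
"""Hypothetical helper: watch the Apache worker count over time."""
import subprocess
import time

def apache_process_count():
    # List every process's command name via ps and count the Apache workers.
    out = subprocess.run(["ps", "-e", "-o", "comm="],
                         capture_output=True, text=True).stdout
    return sum(1 for name in out.splitlines()
               if name.strip() in ("httpd", "apache2"))

if __name__ == "__main__":
    for _ in range(3):  # sample a few times; extend the range for longer runs
        print(time.strftime("%H:%M:%S"), apache_process_count())
        time.sleep(1)
```

Run alongside a load test; a count that climbs steadily toward MaxClients without ever coming back down is the symptom being reported here.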
Regarding comment #10, for general reference: this will eventually become a
503 error on each new page request to that Apache server, once the number of
deadlocked processes hits MaxClients.
Original comment by Wesley.S...@gmail.com
on 5 Nov 2010 at 7:47
I installed on the 3rd and (unwillingly, since I firmly believe this is a great
project) had to uninstall on the 5th. The server never crashed, but there were
high loads and erratic behaviour, including serving blank pages. I've put
together some graphs (probably useless, sorry about that), but they may
illustrate what happened.
Original comment by webmas...@misabueso.com
on 5 Nov 2010 at 10:00
Attachments:
Thanks everyone for the reports. We have been able to reproduce this and have a
patch that solves the process explosion for us. Stay tuned for an update
coming ASAP.
Original comment by sligocki@google.com
on 5 Nov 2010 at 10:21
We have a fix brewing for this problem. We'll get it out to you as soon as
possible.
Original comment by jmara...@google.com
on 5 Nov 2010 at 10:21
http://code.google.com/p/modpagespeed/source/detail?r=146
Note: this fix has not been made available in binary form yet. We will release
new binary distributions incorporating these changes on Monday Nov 8.
Those building using the open-source instructions on
http://code.google.com/p/modpagespeed/wiki/HowToBuild can try out these
improvements immediately.
Original comment by jmara...@google.com
on 5 Nov 2010 at 10:50
I just tested the newer version 0.9.1.1-r171 and I still have the same issue.
Apache spawns many processes and the load climbs to 25+ within a few seconds.
Original comment by googlemi...@japje.nl
on 9 Nov 2010 at 9:41
I still have the bug with r173 on Debian x64.
Apache spawns many processes within a few seconds...
Original comment by clement....@gmail.com
on 9 Nov 2010 at 3:33
Darn, sorry you're still having issues. Thanks for the reports.
We need some more specific information to figure this out.
1. Can you post the url of your site?
2. Can you attach your apache configuration?
3. Exactly how many apache processes are being run before and after the
installation of mod_pagespeed?
4. What are those processes doing? A screenshot of "top" like in #1 is
helpful. Even more helpful would be to run `sudo strace -p $PID` for a bunch
of different PIDs (the PID is the number in the first column of `top` or the
second column of `ps -efwww | egrep 'httpd|apache2'`). The strace command will
attach to the process and tell you what it's doing. Is it sitting there in an
accept(), read(), or futex()? Is it busily running through a bunch of stuff?
Just look at it for a second or two and then hit Ctrl-C to break out. Grab a
random sampling of 5 or so and let us know what's going on.
Thanks again; with your help we'll get to the bottom of this.
Original comment by abl...@google.com
on 9 Nov 2010 at 4:09
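[Editorial note] A rough equivalent of the strace sampling described above can be done by reading /proc directly, which is safe to run on a production box. This is a sketch, Linux-only; the `httpd`/`apache2` process names are assumptions about your distro:

```python
#!/usr/bin/env python3
"""Sketch: sample what each Apache worker is doing via /proc (Linux only)."""
import os

def process_state(pid):
    # The field after "pid (comm)" in /proc/<pid>/stat is the one-letter state:
    # R=running, S=sleeping, D=uninterruptible sleep, Z=zombie, T=stopped.
    with open("/proc/%d/stat" % pid) as f:
        return f.read().rpartition(")")[2].split()[0]

def process_wchan(pid):
    # The kernel function the process is blocked in ("0" if it is runnable);
    # a deadlocked worker would typically show a futex-related symbol here.
    with open("/proc/%d/wchan" % pid) as f:
        return f.read().strip() or "0"

def apache_pids():
    pids = []
    for entry in os.listdir("/proc"):
        if entry.isdigit():
            try:
                with open("/proc/%s/comm" % entry) as f:
                    if f.read().strip() in ("httpd", "apache2"):
                        pids.append(int(entry))
            except OSError:
                pass  # the process exited while we were scanning
    return pids

if __name__ == "__main__":
    for pid in apache_pids():
        print(pid, process_state(pid), process_wchan(pid))
```

A large population of workers parked in the same futex wait channel would be consistent with the deadlock theory in comment #10.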
Hello,
I'm still getting the same issue as well. CentOS 5.5 32-bit.
Date: Tue, 09 Nov 2010 17:37:54 GMT
Server: Apache/2.2.16 (Unix) mod_ssl/2.2.16 OpenSSL/0.9.8e-fips-rhel5
mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635 PHP/5.2.9
X-Powered-By: PHP/5.2.9
X-Mod-Pagespeed: 0.9.1.1-173
Vary: Accept-Encoding
Content-Length: 12175
Content-Type: text/html
Since the comment was too long, I attached everything in a file.
Any ideas?
Original comment by nickz...@gmail.com
on 9 Nov 2010 at 6:12
Attachments:
Could you also post your apache error log?
Original comment by jmara...@google.com
on 9 Nov 2010 at 7:10
Sure, here it is. I attached the error_log from re-enabling mod_pagespeed and
restarting Apache; I killed it after the load went up to 10.
Thanks.
Original comment by nickz...@gmail.com
on 10 Nov 2010 at 1:09
Attachments:
Thanks for all those details, nickzoid. But the error_log.txt you attached
doesn't seem to have any mod_pagespeed logs. Was this taken with mod_pagespeed
enabled? If not, could you enable it briefly and send a new snippet? If so,
can you try increasing your loglevel?
Also, it's interesting that MySQL is taking up 36% of your CPU. Is that normal
for your site when not running mod_pagespeed? Do you have images, JavaScript,
or CSS stored in MySQL?
Original comment by abl...@google.com
on 10 Nov 2010 at 4:38
Hi,
I don't think this is fixed; it worked fine for two hours, then I had lots of
Apache processes going and hit MaxClients.
[Wed Nov 10 23:10:47 2010] [error] log_message_handler:
[1110/231047:ERROR:net/instaweb/apache/serf_url_async_fetcher.cc(638)] Poll
success status 0 (110)
[Wed Nov 10 23:10:47 2010] [notice] child pid 22156 exit signal Segmentation
fault (11)
[Wed Nov 10 23:10:47 2010] [notice] child pid 23447 exit signal Aborted (6)
[Wed Nov 10 23:10:47 2010] [notice] child pid 23746 exit signal Segmentation
fault (11)
[Wed Nov 10 23:10:47 2010] [notice] child pid 23750 exit signal Aborted (6)
[Wed Nov 10 23:10:47 2010] [notice] child pid 23787 exit signal Segmentation
fault (11)
[Wed Nov 10 23:10:47 2010] [notice] child pid 23864 exit signal Aborted (6)
[Wed Nov 10 23:10:47 2010] [notice] child pid 23944 exit signal Aborted (6)
[Wed Nov 10 23:10:47 2010] [notice] child pid 23981 exit signal Aborted (6)
[Wed Nov 10 23:10:48 2010] [notice] child pid 24016 exit signal Aborted (6)
[Wed Nov 10 23:10:48 2010] [notice] child pid 24098 exit signal Aborted (6)
[Wed Nov 10 23:10:48 2010] [notice] child pid 24185 exit signal Aborted (6)
[Wed Nov 10 23:10:48 2010] [notice] child pid 24345 exit signal Aborted (6)
[Wed Nov 10 23:10:48 2010] [notice] child pid 24653 exit signal Aborted (6)
[Wed Nov 10 23:10:50 2010] [notice] child pid 23482 exit signal Aborted (6)
[Wed Nov 10 23:10:50 2010] [notice] child pid 23510 exit signal Aborted (6)
[Wed Nov 10 23:10:50 2010] [notice] child pid 23961 exit signal Segmentation
fault (11)
pure virtual method called
terminate called without an active exception
pure virtual method called
terminate called without an active exception
pure virtual method called
terminate called without an active exception
[Wed Nov 10 23:10:58 2010] [error] log_message_handler:
[1110/231058:ERROR:net/instaweb/apache/serf_url_async_fetcher.cc(638)] Poll
success status 0 (110)
[Wed Nov 10 23:10:58 2010] [error] log_message_handler:
[1110/231058:ERROR:net/instaweb/apache/serf_url_async_fetcher.cc(638)] Poll
success status 0 (110)
[Wed Nov 10 23:10:58 2010] [error] log_message_handler:
[1110/231058:ERROR:net/instaweb/apache/serf_url_async_fetcher.cc(638)] Poll
success status 0 (110)
pure virtual method called
terminate called without an active exception
[Wed Nov 10 23:11:21 2010] [error] log_message_handler:
[1110/231121:ERROR:net/instaweb/apache/serf_url_async_fetcher.cc(638)] Poll
success status 0 (110)
[Wed Nov 10 23:11:21 2010] [error] log_message_handler:
[1110/231121:ERROR:net/instaweb/apache/serf_url_async_fetcher.cc(638)] Poll
success status 0 (110)
pure virtual method called
terminate called without an active exception
pure virtual method called
terminate called without an active exception
pure virtual method called
terminate called without an active exception
pure virtual method called
terminate called without an active exception
Load got to 200+
2.6.18-164.15.1.el5xen #1 SMP Wed Mar 17 12:04:23 EDT 2010 x86_64 x86_64 x86_64
GNU/Linux
Server version: Apache/2.2.3
Original comment by mark.har...@gmail.com
on 10 Nov 2010 at 11:18
Mark, we agree we still have some unresolved issues with similar symptoms.
We do know that since our Nov 8 update, where we fixed a deadlock scenario that
we were able to reproduce, the number of reports of this behavior went down.
But we are not done.
Could you tell us more about your setup? Where does your content come from?
Static files? PHP? WordPress? Is MySQL involved? Of course those
technologies are pretty stable, but evidence suggests that in some cases our
interaction with them is not well tuned.
Are you using the pre-fork MPM? Or are you trying out mod_pagespeed in
threaded or event modes?
Is your site image-rich?
Could you publish the URL to your site?
Could you attach your pagespeed.conf file?
Thanks for your help!
Original comment by jmara...@google.com
on 10 Nov 2010 at 11:33
This is a rather uninformative error message that we were printing:
[Wed Nov 10 23:11:21 2010] [error] log_message_handler:
[1110/231121:ERROR:net/instaweb/apache/serf_url_async_fetcher.cc(638)] Poll
success status 0 (110)
In our next release we'll at least print out the text for error 110, which is
"Connection timed out".
I think what's happening is that a transient spike in load, say, from image
compression, is causing the system to become unresponsive, which causes Apache
to spawn more processes to handle incoming requests. This makes the situation
worse.
We have been load testing but haven't seen this on our systems, even with an
image-rich page. But aside from image compression, in our system, we have
nothing really compute intensive going on because we load-test using static
mirrors of web sites.
There is something on your system, Mark, that we believe is pushing Apache &
mod_instaweb into instability. Independent of that, there are steps we can and
should take in mod_pagespeed to shed load and avoid harming system performance.
But it would be helpful to replicate your serving infrastructure so we can
reproduce your problem and convince ourselves it's fixed.
Original comment by jmara...@google.com
on 11 Nov 2010 at 2:10
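[Editorial note] The trailing "(110)" in that log message is a standard errno value, and you can translate it yourself without waiting for the next release. A quick sketch; the numeric mapping shown is Linux-specific:

```python
import errno
import os

# errno 110 on Linux is ETIMEDOUT; os.strerror gives the human-readable
# text that a future mod_pagespeed release will print beside the number.
print(errno.errorcode[110])  # on Linux: ETIMEDOUT
print(os.strerror(110))      # on Linux: Connection timed out
```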
Hi,
It's www.profileheaven.com. Medium traffic. The setup is:
XEN:
core app: Nginx -> Apache (PHP + some CSS + some JS)
images/media: Nginx -> nginx -> images/css/js
Obviously I can't rewrite the images/media setup without also putting Apache
on that server, but I fear IO issues via cache/images on the SAN.
The site itself is rich in dynamic content (still working on fixing this, but
it's a big job). However, the load was steady for a good couple of hours with
only a slight increase; then shortly after, the Apache processes went wild and
the load skyrocketed as it went up to MaxClients/threads.
Original comment by mark.har...@gmail.com
on 11 Nov 2010 at 8:42
Sorry, to confirm: it's not threads, it's a worker-process setup.
Original comment by mark.har...@gmail.com
on 11 Nov 2010 at 8:43
Our MySQL setup is master/slave with a read/write split; however, I think the
layer this targets in Apache is after the transaction has completed. I'd have
to provide the usage graphs (Cacti) on a non-public system, as they're
commercially sensitive.
Original comment by mark.har...@gmail.com
on 11 Nov 2010 at 9:41
@mark.harburn, worker MPM is multi-threaded:
http://httpd.apache.org/docs/2.0/mod/worker.html
Original comment by sligocki@google.com
on 11 Nov 2010 at 2:26
OK -- we do not support the 'worker' MPM or the 'thread' MPM.
We would like to -- we just haven't tested them yet.
We did find and fix a problem recently that could easily explain this error;
it's in the trunk if you want to build from source, but it's not in a binary
distribution yet. Even if you do this, we haven't load-tested 'worker' or
'threads' yet, so you may see other issues.
Please stay tuned; support of more modern MPMs is bubbling to the top of our
priority queue.
Original comment by jmara...@google.com
on 11 Nov 2010 at 2:53
Hi,
It's prefork; I was posting these in a rush, apologies. Bad day in the office
and all that...
Original comment by mark.har...@gmail.com
on 11 Nov 2010 at 2:56
Here's the config; I turned everything on to make sure we got full performance.
Original comment by mark.har...@gmail.com
on 11 Nov 2010 at 4:01
Attachments:
Hey guys,
I have changed the LogLevel to warn; attached is the error_log after enabling
mod_pagespeed again on my webserver. By the way, I'm not sure if it's because
of the low traffic I'm getting today, but the server didn't get overloaded
immediately; it took about 10-12 minutes to start seeing high load.
Original comment by nickz...@gmail.com
on 12 Nov 2010 at 3:30
Attachments:
OK, here's something a bit more useful from the error logs. Just before we get
all the segfaults, we get a lot of:
[Fri Nov 12 20:27:38 2010] [error] log_message_handler:
[1112/202738:ERROR:googleurl_noicu/url_canon_noicu.cc(60)] ReadUTFChar not
supported (non-icu build)
(the same message repeated many times within the same second)
I'd disabled all css enhancements other than the ones in the core.
Original comment by mark.har...@gmail.com
on 12 Nov 2010 at 9:01
Ok, I was able to reproduce a process-blowup on my machine, and I have a fix
for it. It will be hitting the SVN repository shortly, with binaries to come
soon after. So hopefully this is fixed for real.
nickzoid, your error_log shows some very troubling messages that suggest memory
corruption. I just opened Issue 79 to track that.
For everyone else, I encourage you to update from SVN if possible, or try the
new binaries when they hit, and let me know if your process blowup disappears.
Original comment by abl...@google.com
on 13 Nov 2010 at 12:43
I built from source, and after around one hour... I assume the fix had been
committed when I built this?
top - 16:44:30 up 24 days, 1:55, 2 users, load average: 978.57, 1140.73,
645.74
Built: mod-pagespeed-beta-0.9.1.1-184.x86_64.rpm
pure virtual method called
terminate called without an active exception
pure virtual method called
terminate called without an active exception
[Sat Nov 13 16:36:24 2010] [notice] child pid 1745 exit signal Aborted (6)
[Sat Nov 13 16:36:26 2010] [notice] child pid 2026 exit signal Aborted (6)
Original comment by mark.har...@gmail.com
on 13 Nov 2010 at 4:48
@mark.harburn, I created a new issue 81 for the "ReadUTFChar not supported
(non-icu build)" error message. Please provide any more details and let me know
if you're still getting that message.
Original comment by sligocki@google.com
on 13 Nov 2010 at 6:18
Hi,
I still have the problem on r220.
The load average grows very quickly. It only happens when I load a PrestaShop
website.
Do you need any information to resolve this bug?
Thanks.
Original comment by clement....@gmail.com
on 17 Nov 2010 at 5:02
Hello,
I just tested the .1-228 version and I'm still getting the same overload issue.
Original comment by nickz...@gmail.com
on 18 Nov 2010 at 5:03
mod-pagespeed-beta 0.9.8.1-r215
Debian Lenny x64 5.0.6
Apache/2.2.9 (Debian) PHP/5.2.6-1+lenny9
I can confirm the same problem.
Original comment by cleverc...@gmail.com
on 18 Nov 2010 at 8:59
Hi nickzoid, clement, and clevercold.
We were able to reproduce a server-load problem caused by Issue 85. This issue
has now been resolved in the SVN tree and the new branch:
gclient config http://modpagespeed.googlecode.com/svn/tags/0.9.10.1/src (r238
or later on the trunk).
If you can build from source, please let us know if this fixes the symptom for
you.
Original comment by jmara...@google.com
on 20 Nov 2010 at 2:35
Original issue reported on code.google.com by
j.m.milm...@gmail.com
on 4 Nov 2010 at 4:35