apache / incubator-pagespeed-mod

Apache module for rewriting web pages to reduce latency and bandwidth.
http://modpagespeed.com
Apache License 2.0

Apache stuck indefinitely waiting for PSOL #1048

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 9 years ago
I just installed mod_pagespeed on CentOS 7 and got tons of entries in the httpd 
error log within 1 minute. An example line:
<code>
[Tue Feb 10 11:05:14.311755 2015] [pagespeed:warn] [pid 21132:tid 
139634310850304] [mod_pagespeed 1.9.32.3-4448 @21132] Waiting for completion of 
URL http://exampledomain.com/example-slug/ for 45.001 sec
</code>

ALL requests produced this error, including image requests.

My server hardware specs:
* Intel(R) Xeon(R) CPU E3-1246 v3 @ 3.50GHz, 8 cores
* 32 GB DDR3 RAM
* 2 x 2 TB SATA 6 Gb/s 7200 rpm HDD (Software-RAID 1) Class Enterprise

Software specs:
Operating system: CentOS Linux 7.0.1406
Kernel: Linux 3.10.0-123.20.1.el7.x86_64 on x86_64

Server Version: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips mod_fcgid/2.3.9 
PHP/5.6.5 mod_perl/2.0.9dev Perl/v5.16.3
Server MPM: event

What version of the product are you using (please check X-Mod-Pagespeed
header)?
mod-pagespeed-stable-1.9.32.3-4448.x86_64

URL of broken page:
I removed the module after 1 minute of terror. If a Google developer wants to learn 
more about the server, mail me an IP address that I can grant permission to 
view the mod_info output.

Original issue reported on code.google.com by unsalkor...@gmail.com on 10 Feb 2015 at 9:35

jmaessen commented 9 years ago

Does this imply that we should be handling those filesystem operations off the main thread? That'd be annoying for the common case of cache hit, but doable.

jeffkaufman commented 9 years ago

@jmaessen Or maybe stdio_file_system should just switch to something that times out if loads take too long?

But I'm not sure it's worth it. This bug here is basically two issues:

a) Occasional messages about "waiting for completion of url", with only minor performance degradation

b) Incessant messages about "waiting for completion of url", with major performance degradation

Reads that take more than 5s but still complete will trigger (a). In this case the message is effectively a reminder that you have high tail latencies for file reads and that this is hurting PageSpeed's performance. The fix here seems to be making sure that operators running PageSpeed give it fast access to anything it treats as a filesystem.

The second case, (b), is probably a race condition in the linux futex code that's been patched for a while now and shouldn't be affecting people, though it would also be caused if a branch of our code fails to call a callback.
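
To illustrate why a slow but finite operation produces a handful of messages while a lost callback produces an endless stream, here is a minimal C++ sketch of a wait loop that wakes at a fixed interval and logs each time the work is still unfinished. It is not PageSpeed code: `CompletionWaiter`, `WaitWithWarnings`, and the slice limit are hypothetical names chosen for the example, and it uses `std::condition_variable` rather than PageSpeed's own condvar wrapper.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <mutex>

// Hypothetical stand-in for a request thread waiting on background work.
class CompletionWaiter {
 public:
  // Marks the work as finished and wakes any waiter.
  void Done() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      done_ = true;
    }
    cv_.notify_all();
  }

  // Waits in slices of `interval`, logging after every slice that expires
  // without completion. Gives up after `max_slices` expired slices.
  // Returns the number of warnings that were logged.
  int WaitWithWarnings(std::chrono::milliseconds interval, int max_slices) {
    assert(max_slices >= 0);
    std::unique_lock<std::mutex> lock(mu_);
    int warnings = 0;
    while (!done_ && warnings < max_slices) {
      // The predicate overload absorbs spurious wakeups; it returns false
      // only when the slice timed out and the work is still not done.
      const bool finished =
          cv_.wait_for(lock, interval, [this] { return done_; });
      if (!finished) {
        ++warnings;
        std::fprintf(stderr, "Waiting for completion for %lld ms\n",
                     static_cast<long long>(interval.count()) * warnings);
      }
    }
    return warnings;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool done_ = false;
};
```

With a 5000 ms interval, work that takes 12 seconds would log twice (after 5 s and 10 s) and then return normally, which matches case (a). If `Done()` is never called, the loop logs on every slice until the limit is reached, which matches case (b).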

jmaessen commented 9 years ago

Well, to time out an fread I think you have to step outside stdio and use raw file descriptors. Which might or might not make sense for places where we're doing full-file reads. Arguably we should just have sendfile support in those situations, though, even if it solves a slightly different but related problem.

jmaessen commented 9 years ago

[So basically I agree, sorry, meant to say that.]

jeffkaufman commented 9 years ago

@jmaessen "arguably we should just have sendfile support in those situations"

Unless we're reading the file in so we can optimize it.

jmarantz commented 9 years ago

I am going to add some diags to StdioFileSystem to track and log slow operations.
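
A minimal sketch of what such a diagnostic could look like, assuming the simplest approach of timing a whole-file read and logging when it exceeds a threshold. `TimedRead`, `ReadFileTimed`, and the threshold parameter are hypothetical names for this example; this is not the StdioFileSystem change itself.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Result of a timed whole-file read (illustrative type).
struct TimedRead {
  bool ok;                // true if the file could be opened and read
  std::string contents;   // file contents when ok is true
  long long elapsed_us;   // wall-clock duration of the read
  bool slow;              // true if elapsed_us exceeded the threshold
};

// Reads `path` fully and reports how long the read took. The read is
// only measured, never interrupted: a slow read still completes.
TimedRead ReadFileTimed(const std::string& path, long long slow_threshold_us) {
  assert(slow_threshold_us >= 0);
  TimedRead result{false, std::string(), 0, false};
  const auto start = std::chrono::steady_clock::now();

  std::ifstream in(path, std::ios::binary);
  if (in) {
    result.contents.assign(std::istreambuf_iterator<char>(in),
                           std::istreambuf_iterator<char>());
    result.ok = true;
  }

  const auto end = std::chrono::steady_clock::now();
  result.elapsed_us =
      std::chrono::duration_cast<std::chrono::microseconds>(end - start)
          .count();
  result.slow = result.elapsed_us > slow_threshold_us;
  if (result.slow) {
    std::fprintf(stderr, "Slow read operation on file %s: %lld us\n",
                 path.c_str(), result.elapsed_us);
  }
  return result;
}
```

Because the measurement happens after the read returns, this only reports high latencies; it cannot bound them. Bounding them would need a different I/O path, as discussed above.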

crowell commented 9 years ago

we have a second "testing" build with a fix for a race condition in ApacheFetch as well as logging for slow file io operations and some small tweaks for BoringSSL compatibility.

(see https://github.com/pagespeed/mod_pagespeed/wiki/Release-1.9.32.7 for changes)

If anyone who has experienced the issue in this bug can give it a try, please let us know if

1: this solves the problem

2: you notice any log messages about slow file operations.

We hope to have a proper release out including these fixes soon after hearing back from testers!

https://github.com/pagespeed/mod_pagespeed/releases/tag/1.9.32.7

deb/rpm are available here, as well as the .tar.bz2 for building from source

crowell commented 9 years ago

reopening after getting some logs from people experiencing this issue.

jeffkaufman commented 9 years ago

@crowell has succeeded at reproducing this here with a testing site set up with the configuration and resources of one of the people reporting the issue. Debugging should be faster now.

crowell commented 9 years ago

symbolized backtrace from my machine https://gist.github.com/anonymous/417c35a0f7cdfe835f58

crowell commented 9 years ago

some interesting (possibly) logs with AprMemCache errors.

https://gist.github.com/crowell/a6904d089c01c4f55d9d

crowell commented 9 years ago

second round of backtraces https://gist.github.com/7ea4af43e4f1484c4100

some more https://gist.github.com/ef816cbfaf39447e767d

WpSEOit commented 9 years ago

Thank you @crowell and @jeffkaufman for your perseverance.

I'm writing from the company account now, but I've already written to you as @capn3m0. I'm sorry that I couldn't run the tests you requested: those servers are still in production, so once the problem had been resolved there I didn't feel safe running the test again and risking instability.

I am confident that the problem will be solved as soon as possible. In the meantime, I wish you good work, and thank you from the WpSEO staff and our clients.

jeffkaufman commented 9 years ago

I just went through the four recent backtraces we have and classified the states of the threads:

1 32  ap_process_request > ApacheFetch::Wait > pthread_cond_timedwait
1 31  ap_process_request > ApacheFetch::Wait > pthread_cond_timedwait
1 30  ap_process_request > ApacheFetch::Wait > pthread_cond_timedwait
1 29  ap_process_request > RewriteDriver::FinishParse > pthread_cond_timedwait
1 28  ap_process_request > RewriteDriver::FinishParse > pthread_cond_timedwait
1 27  ap_process_request > RewriteDriver::FinishParse > pthread_cond_timedwait
1 26  ap_process_request > ap_core_output_filter > poll
1 25  ap_queue_pop > pthread_cond_wait
1 24  ap_process_request > ap_core_output_filter > poll
1 23  ap_queue_pop > pthread_cond_wait
1 22  ap_queue_pop > pthread_cond_wait
1 21  ap_process_request > ap_core_output_filter > poll
1 20  ap_process_request > HandleAsPagespeedResource
                         > ap_core_output_filter > poll
1 19  ap_process_request > HandleAsPagespeedResource > BoundedWaitFor
                         > pthread_cond_timedwait
1 18  ap_process_request > ApacheFetch::Wait > pthread_cond_timedwait
1 17  ap_queue_pop > pthread_cond_wait
1 16  ap_queue_pop > pthread_cond_wait
1 15  ap_process_request > RewriteDriver::FinishParse > pthread_cond_timedwait
1 14  ap_process_request > ApacheFetch::Wait > pthread_cond_timedwait
1 13  ap_process_request > ApacheFetch::Wait > pthread_cond_timedwait
1 12  ap_process_request > ApacheFetch::Wait > pthread_cond_timedwait
1 11  ap_process_request > RewriteDriver::FinishParse > pthread_cond_timedwait
1 10  ap_queue_pop > pthread_cond_wait
1 09  ap_process_request > RewriteDriver::FinishParse > pthread_cond_timedwait
1 08  ap_process_request > HandleAsPagespeedResource > BoundedWaitFor
                         > pthread_cond_timedwait
1 07  ap_unixd_accept > accept4
1 06  WorkThread::Run > QueuedWorkerPool::Run > ApacheWriter::Write > deflate
1 05  WorkThread::Run > pthread_cond_wait
1 04  WorkThread::Run > pthread_cond_wait
1 03  WorkThread::Run > pthread_cond_wait
1 02  SerfThreadFn > TransferFetchesAndCheckDone > pthread_cond_timedwait
1 01  main > ap_run_mpm > ap_mpm_podx_check > read

2 32  ap_process_request > RewriteDriver::FinishParse > pthread_cond_timedwait
2 31  ap_process_request > ApacheFetch::Wait > pthread_cond_timedwait
2 30  ap_process_request > ApacheFetch::Wait > pthread_cond_timedwait
2 29  ap_process_request > ApacheFetch::Wait > pthread_cond_timedwait
2 28  ap_process_request > RewriteDriver::FinishParse > pthread_cond_timedwait
2 27  ap_process_request > RewriteDriver::FinishParse > pthread_cond_timedwait
2 26  ap_process_request > ApacheFetch::Wait > pthread_cond_timedwait
2 25  ap_process_request > ApacheFetch::Wait > pthread_cond_timedwait
2 24  ap_process_request > RewriteDriver::FinishParse > pthread_cond_timedwait
2 23  ap_process_request > ApacheFetch::Wait > pthread_cond_timedwait
2 22  ap_process_request > ApacheFetch::Wait > pthread_cond_timedwait
2 21  ap_process_request > ApacheFetch::Wait > pthread_cond_timedwait
2 20  ap_process_request > RewriteDriver::FinishParse > pthread_cond_timedwait
2 19  ap_process_request > RewriteDriver::FinishParse > pthread_cond_timedwait
2 18  ap_process_request > ApacheFetch::Wait > pthread_cond_timedwait
2 17  ap_process_request > ApacheFetch::Wait > pthread_cond_timedwait
2 16  ap_process_request > ap_core_output_filter > poll
2 15  ap_process_request > RewriteDriver::FinishParse > pthread_cond_timedwait
2 14  ap_process_request > ApacheFetch::Wait > pthread_cond_timedwait
2 13  ap_process_request > ApacheFetch::Wait > pthread_cond_timedwait
2 12  ap_process_request > RewriteDriver::FinishParse > pthread_cond_timedwait
2 11  ap_process_request > RewriteDriver::FinishParse > pthread_cond_timedwait
2 10  ap_process_request > RewriteDriver::FinishParse > pthread_cond_timedwait
2 09  ap_process_request > ApacheFetch::Wait > pthread_cond_timedwait
2 08  ap_process_request > RewriteDriver::FinishParse > pthread_cond_timedwait
2 07  ap_queue_info_wait_for_idler > pthread_cond_wait
2 06  WorkThread::Run > QueuedWorkerPool::Run > ApacheWriter::Write > poll
2 05  WorkThread::Run > pthread_cond_wait
2 04  WorkThread::Run > pthread_cond_wait
2 03  WorkThread::Run > pthread_cond_wait
2 02  SerfThreadFn > TransferFetchesAndCheckDone > pthread_cond_timedwait
2 01  main > ap_run_mpm > ap_mpm_podx_check > read

3 26  ap_process_request > RewriteDriver::Flush > pthread_cond_timedwait
3 25  ap_process_request > RewriteDriver::Flush > pthread_cond_timedwait
3 24  ap_process_request > RewriteDriver::Flush > pthread_cond_timedwait
3 23  ap_process_request > RewriteDriver::Flush > pthread_cond_timedwait
3 22  ap_process_request > RewriteDriver::Flush > pthread_cond_timedwait
3 21  ap_process_request > RewriteDriver::Flush > pthread_cond_timedwait
3 20  ap_process_request > RewriteDriver::Flush > pthread_cond_timedwait
3 19  ap_process_request > RewriteDriver::Flush > pthread_cond_timedwait
3 18  ap_process_request > RewriteDriver::Flush > pthread_cond_timedwait
3 17  ap_process_request > RewriteDriver::Flush > pthread_cond_timedwait
3 16  ap_process_request > RewriteDriver::Flush > pthread_cond_timedwait
3 15  ap_process_request > RewriteDriver::Flush > pthread_cond_timedwait
3 14  ap_process_request > RewriteDriver::Flush > pthread_cond_timedwait
3 13  ap_process_request > RewriteDriver::Flush > pthread_cond_timedwait
3 12  ap_process_request > RewriteDriver::Flush > pthread_cond_timedwait
3 11  WorkThread::Run > pthread_cond_wait
3 10  WorkThread::Run > AprMemCache::MultiGet > epoll_wait
3 09  WorkThread::Run > pthread_cond_wait
3 08  WorkThread::Run > pthread_cond_wait
3 07  WorkThread::Run > pthread_cond_wait
3 06  WorkThread::Run > pthread_cond_wait
3 05  SerfThreadFn > TransferFetchesAndCheckDone > pthread_cond_timedwait
3 04  WorkThread::Run > pthread_cond_wait
3 03  WorkThread::Run > pthread_cond_wait
3 02  WorkThread::Run > pthread_cond_wait
3 01  main > ap_run_mpm > apr_thread_join > pthread_join

4 35  ap_process_request > ApacheFetch::Wait > pthread_cond_timedwait
4 34  ap_process_request > ApacheFetch::Wait > pthread_cond_timedwait
4 33  ap_process_request > RewriteDriver::Flush > pthread_cond_timedwait
4 32  ap_process_request > ApacheFetch::Wait > pthread_cond_timedwait
4 31  ap_process_request > ApacheFetch::Wait > pthread_cond_timedwait
4 30  ap_process_request > RewriteDriver::Flush > pthread_cond_timedwait
4 29  ap_process_request > ApacheFetch::Wait > pthread_cond_timedwait
4 28  ap_process_request > ApacheFetch::Wait > pthread_cond_timedwait
4 27  ap_process_request > RewriteDriver::Flush > pthread_cond_timedwait
4 26  ap_process_request > ApacheFetch::Wait > pthread_cond_timedwait
4 25  ap_process_request > RewriteDriver::Flush > pthread_cond_timedwait
4 24  ap_process_request > ApacheFetch::Wait > pthread_cond_timedwait
4 23  ap_process_request > RewriteDriver::Flush > pthread_cond_timedwait
4 22  ap_process_request > RewriteDriver::Flush > pthread_cond_timedwait
4 21  ap_process_request > ApacheFetch::Wait > pthread_cond_timedwait
4 20  ap_process_request > ApacheFetch::Wait > pthread_cond_timedwait
4 19  ap_process_request > RewriteDriver::Flush > pthread_cond_timedwait
4 18  ap_process_request > RewriteDriver::Flush > pthread_cond_timedwait
4 17  ap_process_request > ApacheFetch::Wait > pthread_cond_timedwait
4 16  ap_process_request > RewriteDriver::Flush > pthread_cond_timedwait
4 15  ap_queue_pop > pthread_cond_wait
4 14  ap_process_request > RewriteDriver::Flush > pthread_cond_timedwait
4 13  ap_process_request > ApacheFetch::Wait > pthread_cond_timedwait
4 12  ap_process_request > ApacheFetch::Wait > pthread_cond_timedwait
4 11  ap_process_request > RewriteDriver::Flush > pthread_cond_timedwait
4 10  unixd_accept > accept4
4 09  WorkThread::Run > pthread_cond_wait
4 08  WorkThread::Run > AprMemCache::MultiGet > epoll_wait
4 07  WorkThread::Run > pthread_cond_wait
4 06  SerfThreadFn > TransferFetchesAndCheckDone > pthread_cond_timedwait
4 05  WorkThread::Run > pthread_cond_wait
4 04  WorkThread::Run > pthread_cond_wait
4 03  WorkThread::Run > pthread_cond_wait
4 02  WorkThread::Run > pthread_cond_wait
4 01  main > ap_run_mpm > ap_mpm_pod_check > read

jeffkaufman commented 9 years ago

We need to learn which thread or threads are spinning and burning 100% cpu, since the backtraces don't look like they should be spinning.

@crowell When you next manage to reproduce this, could you run the following on the Apache process taking 100% CPU?

$ ps -p [PID] -w -w -L -o %cpu,lwp,pid,user,args

Then in gdb we can see which thread has that LWP id.

For example, on my system (not having the problem) I currently see:

$ ps -p 3753 -w -w -L -o %cpu,lwp,pid,user,args
%CPU   LWP   PID USER     COMMAND
 0.0  3753  3753 jefftk   /usr/local/apache2/bin/httpd -k start
 0.0  3754  3753 jefftk   /usr/local/apache2/bin/httpd -k start
 0.0  3755  3753 jefftk   /usr/local/apache2/bin/httpd -k start
 0.0  3756  3753 jefftk   /usr/local/apache2/bin/httpd -k start

And when I look in gdb:

> thread apply all bt
...
Thread 3 (Thread 0x2b8e8bf27700 (LWP 3755)):
...

jeffkaufman commented 9 years ago

PthreadCondvar::TimedWait in pagespeed/kernel/thread/pthread_condvar.cc doesn't check the return code of pthread_cond_timedwait, and that doesn't look right. pthread_cond_timedwait should only return a few errors, but if we gave it an invalid input it would spin forever.

morlovich commented 9 years ago

The only checking that seems appropriate for those would be a CHECK: ETIMEDOUT is the only one that can happen barring major bugs in our code, and ignoring ETIMEDOUT is the right thing to do.
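
A minimal sketch of that idea, assuming plain POSIX threads: treat 0 and ETIMEDOUT as the expected outcomes and abort on anything else, so that an invalid argument fails loudly instead of turning the caller's wait loop into a busy loop. `CheckedTimedWait` is a hypothetical helper written for this example, and `std::abort` stands in for a CHECK macro.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <pthread.h>
#include <time.h>

// Waits on `cond` until `deadline` (absolute, CLOCK_REALTIME by default).
// The caller must hold `mutex`. Returns true if the wait ended because of
// a signal or a spurious wakeup, false if the deadline passed.
// Any other return code indicates a programming error and aborts.
bool CheckedTimedWait(pthread_cond_t* cond, pthread_mutex_t* mutex,
                      const timespec& deadline) {
  assert(cond != nullptr && mutex != nullptr);
  const int rc = pthread_cond_timedwait(cond, mutex, &deadline);
  if (rc == 0) {
    return true;
  }
  if (rc == ETIMEDOUT) {
    return false;  // Expected: the caller simply re-checks its condition.
  }
  // For example EINVAL from a malformed timespec. Returning here would let
  // a caller that loops on the wait spin without ever sleeping.
  std::fprintf(stderr, "pthread_cond_timedwait failed with code %d\n", rc);
  std::abort();
}
```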

On Thu, Aug 20, 2015 at 7:50 AM, Jeff Kaufman notifications@github.com wrote:

The way PthreadCondvar::TimedWait in pagespeed/kernel/thread/pthread_condvar.cc is not checking the return code on pthread_cond_timedwait doesn't look right. Now, pthread_cond_timedwait should only return a few errors http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_cond_timedwait.html but if we gave one invalid input it would spin forever.


jeffkaufman commented 9 years ago

@morlovich agreed

crowell commented 9 years ago

backtrace from a debug build, along with thread process stats.

https://gist.github.com/e5f323af7e879c226e91

jeffkaufman commented 9 years ago

@crowell That's very strange; doesn't look like 100% cpu at all? Only 25951 and 25901 are at all busy, and they're doing actual work, not waiting around like in our other backtraces.

jeffkaufman commented 9 years ago

@crowell I count 11 of 36 threads as doing real cpu-involving work as opposed to waiting, while on the other backtraces it was at most 1.

WpSEOit commented 9 years ago

@jeffkaufman when I had the problem, only 1 or 2 httpd.worker processes were stuck with high CPU; all the other httpd processes were working normally

jmarantz commented 9 years ago

There are some stack-frames that look concerning:

Thread 8 (Thread 0x7f7e85feb700 (LWP 25875)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007f7e9bb24e47 in net_instaweb::PthreadCondvar::TimedWait (this=0x7f7e7431be38, timeout_ms=0)
    at pagespeed/kernel/thread/pthread_condvar.cc:66
#2  0x00007f7e9b906bac in net_instaweb::CheckingThreadSystem::CheckingCondvar::TimedWait (this=0x7f7e744faf68, 
    timeout_ms=5000) at pagespeed/kernel/base/checking_thread_system.cc:55
#3  0x00007f7e9b62e9bc in net_instaweb::ApacheFetch::Wait (this=0x7f7e741220a8, rewrite_driver=0x7f7e80135298)
    at pagespeed/apache/apache_fetch.cc:199


crowell commented 9 years ago

@jmarantz yeah, this is also the case in other threads/processes

(gdb) where
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007f7e9bb24e47 in net_instaweb::PthreadCondvar::TimedWait (this=0x7f7e9033d178, timeout_ms=0)
    at pagespeed/kernel/thread/pthread_condvar.cc:66
#2  0x00007f7e9b906bac in net_instaweb::CheckingThreadSystem::CheckingCondvar::TimedWait (this=0x7f7e902136c8, 
    timeout_ms=5000) at pagespeed/kernel/base/checking_thread_system.cc:55

the code for the function in frame 2

  virtual void TimedWait(int64 timeout_ms) {
    mutex_->DropLockControl();
    condvar_->TimedWait(timeout_ms);
    mutex_->TakeLockControl();
  }

so there really should be no reason for timeout_ms being reset to 0...

jmarantz commented 9 years ago

Never mind; that was a red herring. PthreadCondvar::TimedWait overwrites that formal parameter, which winds up holding only the sub-second remainder in milliseconds (0 for a 5000 ms timeout). But the data is not lost: the whole seconds are kept in timeout_sec_incr.

  int64 timeout_sec_incr = timeout_ms / Timer::kSecondMs;
  timeout_ms %= Timer::kSecondMs;  // overwrites the formal parameter
  // Figure out current time, compute absolute time for timeout
  // Carrying ns to s as appropriate.  As morlovich notes, we
  // get *really close* to overflowing a 32-bit tv_nsec here,
  // so this code should be modified with caution.
  if (gettimeofday(&current_time, NULL) != 0) {
    LOG(FATAL) << "Could not determine time of day";
  }
  timeout.tv_nsec = current_time.tv_usec * 1000 + timeout_ms * kMsNs;
  timeout_sec_incr += timeout.tv_nsec / Timer::kSecondNs;
  timeout.tv_nsec %= Timer::kSecondNs;
  timeout.tv_sec = current_time.tv_sec +
      static_cast<time_t>(timeout_sec_incr);
  // Finally we actually get to wait.
  pthread_cond_timedwait(&condvar_, &mutex_->mutex_, &timeout);

-Josh
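
The same arithmetic as a standalone sketch, for anyone who wants to check the carry handling in isolation. `AbsoluteDeadline` and the local constants are hypothetical names for this example; it uses 64-bit intermediates throughout, which is an assumption of the sketch and sidesteps the 32-bit tv_nsec concern mentioned in the comment above.

```cpp
#include <cassert>
#include <cstdint>
#include <sys/time.h>
#include <time.h>

// Converts the current time plus a relative timeout in milliseconds into
// the absolute timespec that pthread_cond_timedwait expects.
timespec AbsoluteDeadline(const timeval& now, int64_t timeout_ms) {
  assert(timeout_ms >= 0);
  const int64_t kMsPerSec = 1000;
  const int64_t kNsPerMs = 1000000;
  const int64_t kNsPerSec = 1000000000;

  // Split the timeout into whole seconds and a sub-second remainder.
  int64_t sec_incr = timeout_ms / kMsPerSec;
  const int64_t remainder_ms = timeout_ms % kMsPerSec;

  // Add the remainder to the current sub-second part, in nanoseconds.
  int64_t nsec =
      static_cast<int64_t>(now.tv_usec) * 1000 + remainder_ms * kNsPerMs;

  // Carry any overflow past one second into the seconds field.
  sec_incr += nsec / kNsPerSec;
  nsec %= kNsPerSec;

  timespec deadline;
  deadline.tv_sec = now.tv_sec + static_cast<time_t>(sec_incr);
  deadline.tv_nsec = static_cast<long>(nsec);
  return deadline;
}
```

For a 5000 ms timeout the remainder is 0, so tv_nsec is just the current microseconds times 1000 and tv_sec advances by exactly 5.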


crowell commented 9 years ago

(gdb) p timeout_sec_incr
$6 = 5
(gdb) p timeout
$7 = {tv_sec = 1440096674, tv_nsec = 332683000}

yeah seems to be a valid 5 second wait.

crowell commented 9 years ago

https://gist.github.com/903957a54f2cebcbe1b8

3 stack traces taken before it got out of the "Waiting for completion" state. Looking at them now.

crowell commented 9 years ago

some stack traces ~30 seconds apart, with 1 rewrite thread in config.

https://gist.github.com/63d0b4e6c4c406d2b2d3

they're all identical. after stopping the load on the server, the issue cleared up.

crowell commented 9 years ago

https://github.com/pagespeed/mod_pagespeed/releases/tag/1.9.32.8 should fix the issue.

anyone who has been affected by this, please give this pre-release a try and report back!

WpSEOit commented 9 years ago

Installed. I will give you feedback in the next few days. Thanks for everything.

crowell commented 8 years ago

While we're confident now that the "Waiting for Completion" state has been fixed, there seems to be a second, related, issue of hangs within apr_memcache2_multgetp

It may be possible for apr_pollset_poll to return a value that isn't APR_SUCCESS due to a failure that isn't a timeout.

It may also be possible for get_server_line to be called with conn->bb as empty, causing infinite reads of size zero.

We're working on testing these, and will have a test binary soon for interested users.
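
To make the second hypothesis concrete, here is a minimal sketch of a line reader that treats a zero-byte read as end of input instead of retrying. It is not the apr_memcache code: `ReadFn`, `LineStatus`, and `ReadLine` are hypothetical names, and the reader callback stands in for whatever supplies bytes from the connection.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// A reader copies up to `cap` bytes into `buf` and returns the count.
// Returning 0 means no data is available (hypothetical interface).
using ReadFn = std::function<std::size_t(char* buf, std::size_t cap)>;

enum class LineStatus { kOk, kEof, kTooLong };

// Reads one '\n'-terminated line into `line`, without the terminator.
// A read that makes no progress ends the loop, so an empty source
// cannot cause an endless sequence of zero-length reads.
LineStatus ReadLine(const ReadFn& read, std::size_t max_len,
                    std::string* line) {
  assert(line != nullptr);
  line->clear();
  char c = '\0';
  while (line->size() < max_len) {
    if (read(&c, 1) == 0) {
      return LineStatus::kEof;  // No progress: report it, do not retry.
    }
    if (c == '\n') {
      return LineStatus::kOk;
    }
    line->push_back(c);
  }
  return LineStatus::kTooLong;
}
```

The length cap plays a similar role for a source that keeps producing bytes without ever sending a newline.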

sv72 commented 8 years ago

Installed 1.9.32.8 on production this morning after having run it successfully in test for 2 weeks. Unfortunately, tonight "waiting for completion" started occurring again:

[Tue Sep 22 22:05:35 2015] [warn] [mod_pagespeed 1.9.32.8-7388 @13009] Waiting for completion of URL http://m.xxxxx.nl/m20/?gclid=xxx for 110.002 sec.
[Tue Sep 22 22:05:35 2015] [warn] [mod_pagespeed 1.9.32.8-7388 @16737] Waiting for completion of URL http://m.xxxxx.com/wcsstore/dojo18/dijit/layout/TabContainer.js for 100.002 sec.
[Tue Sep 22 22:05:35 2015] [warn] [mod_pagespeed 1.9.32.8-7388 @1573] Waiting for completion of URL http://m.xxxx.nl/m20/?gclid=xxxx 75.002 sec.
[Tue Sep 22 22:05:35 2015] [warn] [mod_pagespeed 1.9.32.8-7388 @13522] Waiting for completion of URL http://m.xxxx.com/wcsstore/dojo18/dijit/layout/_TabContainerBase.js for 70.001 sec.
[Tue Sep 22 22:05:35 2015] [warn] [mod_pagespeed 1.9.32.8-7388 @17553] Waiting for completion of URL http://m.xxxx.com/wcsstore/dojo18/dijit/form/DropDownButton.js for 80.002 sec.

25.000+ times in 15 minutes and then all seems to be fine again. Only happened on 1 out of 3 webservers, but seems to be same behavior as before 1.9.32.8

jeffkaufman commented 8 years ago

@sv72: Did the CPU usage go to 100% when this happened?

sv72 commented 8 years ago

Actually no, CPU was not higher than normal (30-35%), but memory usage did go to 95% (normally around 50%)

eldk commented 8 years ago

Hello,

I have pushed mod_pagespeed 1.9.32.10-7423 to a production server (only one domain) and re-enabled rewriting of full-size images to test it.

That seems to be ok.

I see this error in the Apache error.log a few times, for less than 1% of rewritten images: Slow read operation on file (...) configure SlowFileLatencyUs to change threshold (the files are all on local disk and served via LoadFromFile).

Is it a new mod_pagespeed parameter or should we use an existing one ?

My conf :

--
Version: 14: on

Filters
cw  Collapse Whitespace
jc  Combine Javascript
gp  Convert Gif to Png
jp  Convert Jpeg to Progressive
jw  Convert Jpeg To Webp
pj  Convert Png to Jpeg
dj  Defer Javascript
hw  Flushes html
io  In-place optimize for browser
idp Insert DNS Prefetch
js  Jpeg Subsampling
pr  Prioritize Critical Css
rj  Recompress Jpeg
rp  Recompress Png
rw  Recompress Webp
rc  Remove Comments
cf  Rewrite Css
jm  Rewrite External Javascript
jj  Rewrite Inline Javascript
cp  Strip Image Color Profiles
md  Strip Image Meta Data

Options
  CssInlineMaxBytes (ci)            10240
  EnableCachePurge (euci)           True
  EnableRewriting (e)               1
  FileCacheCleanIntervalMs (afcci)  3600000
  FileCachePath (afcp)              /var/cache/mod_pagespeed/
  FileCacheSizeKb (afc)             4096000
  ImplicitCacheTtlMs (ict)          15549367000
  LoadFromFileCacheTtlMs (lfct)     15549367000
  LogDir (ald)                      /var/log/pagespeed
  LRUCacheByteLimit (alcb)          16384
  LRUCacheKbPerProcess (alcp)       1024
  MemcachedServers (ams)            localhost:11211
  RequestOptionOverride (roo)      
  RewriteDeadlinePerFlushMs (rdm)   20
  RewriteLevel (l)                  Optimize For Bandwidth
  SslCertDirectory (assld)          /etc/ssl/certs
  StatisticsLogging (asle)          True
  SupportNoScriptEnabled (snse)     False
  UrlSigningKey (usk)               

Domain Lawyer

crowell commented 8 years ago

@eldk that is just a diagnostic message, letting you know that the time to read a file from disk exceeded an arbitrary threshold that we consider "slow". It doesn't impact performance: the message is logged, but nothing is cancelled or unscheduled. If you want to silence this, you can set SlowFileLatencyUs to a higher number.

eldk commented 8 years ago

Thanks.