apache / incubator-pagespeed-ngx

Automatic PageSpeed optimization module for Nginx
http://ngxpagespeed.com/
Apache License 2.0

Possible regression: http2, nginx, chrome and windows #1146

Open jdelgad opened 8 years ago

jdelgad commented 8 years ago

Possible regression for https://github.com/pagespeed/ngx_pagespeed/issues/788

nginx 1.9.12, pagespeed 1.10.33.6-beta and the same configuration as posted here https://github.com/pagespeed/ngx_pagespeed/issues/1140

We are using SSL with a Let's Encrypt cert. Only Windows + Chrome users are seeing this issue. It ended up bringing the site to a standstill.

Here are a few errors we are getting in errors.log

2016/03/07 05:01:16 [error] 3692#0: *360332 upstream timed out (110: Connection timed out) while reading response header from upstream, client: XXX.XXX.XXX.XXX, server: www.XXXXXX.com, request: "GET /forums/ HTTP/2.0", upstream: "fastcgi://unix:/var/run/php-fpm/php-fpm.sock", host: "www.XXXXX.com"
2016/03/07 06:19:46 [error] 3691#0: *372356 connect() to unix:/var/run/php-fpm/php-fpm.sock failed (11: Resource temporarily unavailable) while connecting to upstream, client: XXXX:XXXX:XXXX:XXXX:XXXX:XXXX:XXXX:XXXX, server: www.XXXXX.com, request: "GET /path/to/file.jpg,g38273.pagespeed.ic.38c76x.jpg HTTP/2.0", upstream: "fastcgi://unix:/var/run/php-fpm/php-fpm.sock:", host: "www.XXXXX.com", referrer: "https://www.XXXXX.com/url/"
2016/03/07 15:59:53 [alert] 3688#0: *524522 http request count is zero while closing request, client: XXXX:XXXX:XXXX:XXXX:XXXX:XXXX:XXXX:XXXX, server: [::]:443
2016/03/07 16:31:45 [alert] 3690#0: *525244 open socket #45 left in connection 30
2016/03/07 17:53:23 [alert] 11173#0: *41609 http request count is zero while closing request, client: XXX.XXX.XXX.XXX, server: 0.0.0.0:443

jdelgad commented 8 years ago

I had to turn off pagespeed and the site appears to be working better. I can get logs. I don't have any core dumps, and I don't think it produced any.
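
(Side note for anyone reading along: pagespeed can be switched off without rebuilding nginx; a minimal sketch using the module's documented on/off directive, with an illustrative placement:

    # in the http {} or server {} block of nginx.conf
    pagespeed off;

Reloading nginx after this change disables all rewriting while leaving the module compiled in.)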

oschaaf commented 8 years ago

@jdelgad If you could forward logs, that may help figure this out. Log files written with the log level set to debug would help even more.
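
(For reference, a minimal sketch of the debug logging being requested here, assuming an nginx binary built with the --with-debug configure option; the log path is illustrative:

    # in nginx.conf; the debug level only produces output on --with-debug builds
    error_log /var/log/nginx/error.log debug;
)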

jdelgad commented 8 years ago

@oschaaf I can post them. The next month is a big one for my site, so I probably can't do this before the weekend (and even then only for a limited time). Where should I upload them? I can do some limited testing in debug.

oschaaf commented 8 years ago

If the logs aren't too big, you could share them with oschaaf@we-amp.com. Are you also seeing the ERR_SPDY_PROTOCOL_ERROR in Chrome, as mentioned in https://github.com/pagespeed/ngx_pagespeed/issues/788?

oschaaf commented 8 years ago

Using Chrome on Windows 10, I tried to reproduce the ... http request count is zero while closing request ... log messages, but they do not show up for me.

jdelgad commented 8 years ago

@oschaaf The other admin is reporting ERR_SPDY_PROTOCOL_ERROR and took a screenshot. The console also shows Uncaught ReferenceError: mod_pagespeed_fKCT0dPM24 is not defined. We do have one user reporting problems with the site on OS X, but still in Chrome.

The site had been up for days prior to experiencing breaking issues for some. Others reported weird visual artifacts almost immediately. I'll send you the logs tonight.

oschaaf commented 8 years ago

@jdelgad could you PM me the site's URL so I can have a look?

oschaaf commented 8 years ago

I reproduced the attempt to finalize a request whose r->count is already zero by holding Ctrl+R in Chrome on Windows to refresh, as fast as possible, a page that returns a built-in 403/not-allowed response.

We attempt to finalize when we should not when:

This sequence of events is similar to https://github.com/pagespeed/ngx_pagespeed/issues/788, but subtly different.

Looking into a fix.
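
For context on what finalizing with r->count already zero means: nginx reference-counts each request via r->count, ngx_http_finalize_request() decrements that count, and a second finalize on an already-closed request produces the "http request count is zero while closing request" alert seen above. The sketch below is not the actual ngx_pagespeed fix, just an illustration of how a module can guard against a double finalize; my_module, my_ctx_t, and my_finalize_once are hypothetical names.

    #include <ngx_config.h>
    #include <ngx_core.h>
    #include <ngx_http.h>

    extern ngx_module_t  my_module;   /* hypothetical module, for illustration */

    typedef struct {
        unsigned  finalized:1;        /* set once we have finalized the request */
    } my_ctx_t;

    static void
    my_finalize_once(ngx_http_request_t *r, ngx_int_t rc)
    {
        my_ctx_t  *ctx = ngx_http_get_module_ctx(r, my_module);

        if (ctx == NULL || ctx->finalized) {
            /* already finalized: calling ngx_http_finalize_request() again
             * would decrement r->count past zero and trigger the alert */
            return;
        }

        ctx->finalized = 1;
        ngx_http_finalize_request(r, rc);
    }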

liluxdev commented 8 years ago

Sorry for my messages here, but I had to stop testing: this wasn't my case. I don't see this regression; my issue is different and stems from an incompatibility between ngx_pagespeed and an nginx patch I applied: https://trac.nginx.org/nginx/ticket/907 https://trac.nginx.org/nginx/attachment/ticket/907/body2.patch

I have some results from that, but as the nginx patch is still under review I'll wait before sending more details.

jdelgad commented 8 years ago

I restarted my server and got this shortly after bringing it back up.

2016/03/12 17:52:47 [info] 12677#0: [ngx_pagespeed 1.10.33.6-0] No threading detected. Own threads: 1 Rewrite, 1 Expensive Rewrite.
2016/03/12 17:56:40 [alert] 12685#0: *944 open socket #29 left in connection 18
2016/03/12 17:56:40 [alert] 12685#0: aborting

I'll monitor it for more anomalies before I turn on debug mode. So far it looks fine, but last time it took 2-3 days for the problem to become noticeable. Loading time is much better with it on, and I'm not seeing artifacts in Chrome.

froilanmendoza commented 7 years ago

Sorry to revive this thread (if it's already closed), but I'm getting the same error in my logs. Was this fixed, and in what version? I'm using pagespeed 1.11.33.4-0 on nginx 1.10.0.

Relevant logs:

2016/11/18 13:59:56 [alert] 45869#0: *57213 http request count is zero while closing request, client: xxxx, server: 0.0.0.0:443
2016/11/18 13:59:56 [alert] 45869#0: *57213 http request count is zero while closing request, client: xxxxx, server: 0.0.0.0:443

jdelgad commented 7 years ago

We are no longer using pagespeed. We have also disabled http2, since it isn't really workable on CentOS 7 due to the issues described at https://www.nginx.com/blog/supporting-http2-google-chrome-users/

At some point early next year I intend to re-enable this. I don't believe it is an issue with http.

I did work with Otto and sent him some logs; I recommend you do the same. He visited the site and was able to reproduce the issue. We tried a few patches, which seemed to mitigate most of the effects.

What is your setup? OS and version? Are you using upstream servers? PHP? HTTP/2?

Posting your nginx config may also help Otto and the other pagespeed members track down this bug.

msebel commented 7 years ago

I need to chime in here as well, as this is the only reference to the error we're experiencing right now. Unfortunately the logs show nothing more than this:

"http request count is zero while closing request"

This is with nginx 1.12.0 and pagespeed 1.11.33.4 (an old version, as there is no newer PSOL download available). It happens consistently but rarely (about 3-4 log entries every few hours on both load balancers, which handle about 100,000 req/min).

Previously we ran nginx 1.10.2 with pagespeed 1.11.33.4 and had no problems to report.

oschaaf commented 7 years ago

@msebel Thanks for the detailed report. If I understand correctly, it may be worth looking at the code changes between nginx versions 1.10.2 and 1.12.0 for clues then.

msebel commented 7 years ago

Something new (causing a 502) popped up in the logs:

[ngx_pagespeed 1.11.33.4-0] [0510/091950:FATAL:ratecontroller.cc(188)] Check failed: controller->current_global_fetch_queuesize->Get() > 0 (0 vs. 0)

But that seems to have happened only once since we updated nginx ten days ago.

Other than that, we have a lot of [error] entries in our logs stating that a directory can't be created because it already exists. Not sure if that's a problem, though. It happened before, but we never saw these problems prior to our upgrade.

oschaaf commented 7 years ago

@msebel See https://github.com/pagespeed/ngx_pagespeed/issues/915 (also, this should only occur in debug builds, not release builds)

jmarantz commented 7 years ago

Also note: a fix for that exists in 1.12.
