apache / incubator-pagespeed-ngx

Automatic PageSpeed optimization module for Nginx
http://ngxpagespeed.com/
Apache License 2.0

image 404s, probably due to ipro-recorded resources falling out of cache with failing fetch #1319

Open froilanmendoza opened 8 years ago

froilanmendoza commented 8 years ago

pagespeed v1.11.33.4-0 nginx v1.10 magento EE 1.13.0.2

Pagespeed generates corrupted JS when turned on. I have only enabled the bare minimum pagespeed settings on nginx:

    pagespeed on;
    pagespeed FileCachePath /var/ngx_pagespeed_cache;

    pagespeed Statistics on;
    pagespeed StatisticsLogging on;
    pagespeed LogDir /var/log/pagespeed;
    pagespeed AdminPath /pagespeed_admin;

    location ~ "\.pagespeed\.([a-z]\.)?[a-z]{2}\.[^.]{10}\.[^.]+" { add_header "" ""; }
    location ~ "^/ngx_pagespeed_static/" { }
    location ~ "^/ngx_pagespeed_beacon" { }

The uncorrupted JS can be found in https://www.myspicesage.com/media/js/d6a0f3e34b288de48f06382595f085d4-v2.18.js while the corrupted one when pagespeed is enabled is attached (screenshot). I have disabled pagespeed now (obviously, as this is a production system) but they were all tested on www.myspicesage.com

Thank you.

[image: pagespeed-screenshot]

jmarantz commented 8 years ago

Does the problem also show up if you leave ngx_pagespeed installed on the system, but turn it off with:

    pagespeed off;

If you do that, rather than uninstalling, your site should work properly, but we can remote-debug it with query-parameters to your server. Note that your configuration is minimal, but the default setting of 'CoreFilters' is what you were getting, and there's a fair amount of rewriting going on.

One other question: where does the JS come from in your server? Is it read directly from the file-system by nginx? Is it proxied from another origin on your network?

One possibility is that this is related to these issues:

pagespeed/mod_pagespeed#1362 pagespeed/mod_pagespeed#1371

In this case a workaround might be:

    pagespeed HttpCacheCompressionLevel 0;

Could you try that?

froilanmendoza commented 8 years ago

Does the problem also show up if you leave ngx_pagespeed installed on the system, but turn it off with: --> No. I also never uninstall it, I just comment it out. In any case, I have the site up with "pagespeed off;" and you can run your debug.

One other question: where does the JS come from in your server? --> locally served via the filesystem; no CDN

Could you try that? (HttpCacheCompressionLevel 0) --> tried it, same problem. JS was corrupted.

Thank you.

jmarantz commented 8 years ago

Thanks -- I'm still having a little trouble experimenting with your site. With ?PageSpeed=on I can see that HTML is minified, though the configuration you provided does not specify HTML minification, and that is not on by default (not in CoreFilters). I can see no other optimizations (e.g. no other URL rewriting). And with PageSpeed=on, there is no X-Page-Speed response header.

I'm curious, what is the purpose of this line in your config?

location ~ "\.pagespeed\.([a-z]\.)?[a-z]{2}\.[^.]{10}\.[^.]+" { add_header "" ""; }

Can you tell me what else on your site might be minifying HTML other than PageSpeed? Is PageSpeed being applied at more than one server in the flow? Are you intentionally stripping the X-Page-Speed header, or stripping the ?PageSpeed=on query parameter I am sending in?

froilanmendoza commented 8 years ago

That line in the config was just based on your configuration page: https://developers.google.com/speed/pagespeed/module/configuration

I have since commented it out, is that ok? Here's what I have in my config now:

    # Pagespeed main settings

    pagespeed off;
    #pagespeed FileCachePath /var/ngx_pagespeed_cache;
    #pagespeed HttpCacheCompressionLevel 0;

    #pagespeed Statistics on;
    #pagespeed StatisticsLogging on;
    #pagespeed LogDir /var/log/pagespeed;
    #pagespeed AdminPath /pagespeed_admin;

    #pagespeed EnableFilters extend_cache;

    # Ensure requests for pagespeed optimized resources go to the pagespeed
    # handler and no extraneous headers get set.

    #location ~ "\.pagespeed\.([a-z]\.)?[a-z]{2}\.[^.]{10}\.[^.]+" { add_header "" ""; }
    #location ~ "^/ngx_pagespeed_static/" { }
    #location ~ "^/ngx_pagespeed_beacon" { }

    ## end pagespeed

No other external resources are minified. If you are referring to https://www.myspicesage.com/media/js/d6a0f3e34b288de48f06382595f085d4-v2.18.js on the homepage, if you scroll down, you'll see that it is NOT minified. What Magento does (it's a setting that I turned on) is it combines all JSs and that first JS file happens to be already minified. It's the same with the css file (such as https://www.myspicesage.com/media/css_secure/eb0df35e22a81d2150af7faddb2a014c-v2.18.css)

As to why you're not seeing the X-Page-Speed header, didn't we turn off Pagespeed? (pagespeed off)? When I had pagespeed on, I see this:

    curl -I -p https://www.myspicesage.com
    HTTP/1.1 200 OK
    Date: Sat, 12 Nov 2016 02:13:38 GMT
    Server: nginx/1.10.0
    Content-Type: text/html; charset=UTF-8
    Pragma: no-cache
    P3p: CP="CAO PSA OUR"
    X-Page-Speed: 1.11.33.4-0
    Cache-Control: max-age=0, no-cache, no-store, must-revalidate, post-check=0, pre-check=0
    Set-Cookie: frontend=ep4rp6a0raar8huiop28q2r2c0; expires=Sat, 26-Nov-2016 02:13:37 GMT; path=/; domain=www.myspicesage.com; HttpOnly
    Via: 1.1 www.myspicesage.com

When 'pagespeed off':

    curl -I -p https://www.myspicesage.com
    HTTP/1.1 200 OK
    Date: Sun, 13 Nov 2016 01:13:55 GMT
    Server: nginx/1.10.0
    Content-Type: text/html; charset=UTF-8
    Expires: Thu, 19 Nov 1981 08:52:00 GMT
    Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
    Pragma: no-cache
    P3p: CP="CAO PSA OUR"
    Set-Cookie: frontend=pme6e3d2dq0m3daqv7cr8dgai0; expires=Sun, 27-Nov-2016 01:13:54 GMT; path=/; domain=www.myspicesage.com; HttpOnly
    Via: 1.1 www.myspicesage.com

To my knowledge, I'm not doing any stripping, although please test with HTTPS. Everything is being redirected to https even if it's http traffic (i know, we should also redirect the parameters and origin URL, but that's for later).

Thank you.

jmarantz commented 8 years ago

Sorry -- I didn't realize that was our recommended nginx configuration; please put it back :)

The minification question I had was about your HTML (and maybe some embedded JS and CSS). However now your site looks minified without any query-parameters, so I'm guessing that what I saw earlier was due to a Magento plugin.

I also think the ?PageSpeed=on trick for servers with 'off' configured might not work in ngx_pagespeed, only in Apache mod_pagespeed. To debug this, would you be able to set up an alternate subdomain (say test.myspicesage.com) that has "pagespeed on;"?
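
A minimal sketch of what such a test vhost might look like. The hostname `test.myspicesage.com` comes from the suggestion above; the document root and the plain-HTTP listener are assumptions, and TLS and Magento/PHP handling are omitted. It only reuses PageSpeed directives already quoted in this thread and is not the site's actual configuration.

```nginx
# Hypothetical test vhost; server_name and root are placeholders.
server {
    listen 80;
    server_name test.myspicesage.com;
    root /var/www/html;   # assumption: same document root as production

    # PageSpeed stays "off" on the production vhost and is "on" only here.
    pagespeed on;
    pagespeed FileCachePath /var/ngx_pagespeed_cache;

    # Handler locations from the configuration quoted earlier in this thread.
    location ~ "\.pagespeed\.([a-z]\.)?[a-z]{2}\.[^.]{10}\.[^.]+" { add_header "" ""; }
    location ~ "^/ngx_pagespeed_static/" { }
    location ~ "^/ngx_pagespeed_beacon" { }
}
```

With a vhost like this, corrupted output can be inspected on the test hostname while production keeps serving unmodified pages.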

froilanmendoza commented 8 years ago

Is there anything else we can do to debug/test? I have an existing setup similar to the production system (different machine, but same nginx, pagespeed and magento versions) and that works fine - http://www6.myspicesage.com/freesample/

The only difference (besides being on a faster machine) is that the production (www) is under a load balancer that just redirects traffic if one server is down. I know, though, that we are hitting the nginx server with pagespeed as the other machine under the load balancer is Apache (and does not have pagespeed). I don't think the load balancer is affecting the nginx server.

jmarantz commented 8 years ago

What else might happen on the load-balancer? Is it possible that PageSpeed is gzip-encoding the response, but your LB is stripping content-encoding?

I noticed that https://www.myspicesage.com/media/js/d6a0f3e34b288de48f06382595f085d4-v2.18.js is not served with content-encoding:gzip

But then neither is http://www6.myspicesage.com/freesample/media/js/ce00c82b8064fbbb144a034715d96493-v2.18.js.pagespeed.jm.4zLlsLjzm1.js , which confuses me. Chrome definitely sends accept-encoding:gzip. Why isn't your server serving it, especially with PageSpeed?

One thing I wanted to check is whether, when your server is serving corrupted JS, it would be uncorrupted by passing it through "gunzip". But since you only pasted an image of the JS, it was hard to do that :)

froilanmendoza commented 8 years ago

I cannot leave production up with a corrupted JS, obviously :) If you can be online at a certain time at night (Eastern) we can do it.

It's an Apache load balancer and it's possible that it's doing it. Let me see if we can disable the load balancer part, if you think that will help with the debug. At least we can compare it apples-to-apples-ish.

FWIW, both production's and dev's gzip settings are the same on nginx, does that help?

    gzip on;
    gzip_comp_level 2;
    gzip_proxied any;
    gzip_types text/plain text/css application/x-javascript text/xml application/xml application/xml+rss text/javascript;

jmarantz commented 8 years ago

Oh I just noticed that JavaScript file is served with Brotli, at least to Chrome! Nice! When I request the file with accept-encoding:br the response is 90k. With accept-encoding:gzip it's 121k, so it's a significant savings.

How are you doing that? Is that done by your LB? On the fly, or does it cache it?

I think that might be messing up PageSpeed but I don't know yet. We had a goal to generate Brotli ourselves but it never rose to the top of the stack. I think that PageSpeed should not be passing through Accept-Encoding:br but maybe it is, in which case PageSpeed will not be able to decode your JS file.

froilanmendoza commented 8 years ago

Brotli is installed as a module on nginx, it's local to the server. Just one of the many speed improvements we're doing (hence, pagespeed! :) )

Just FYI, we're removing the LB tonight to remove one layer from our debug. I'll let you know when it's done and I'll try another round of testing tonight.

jmarantz commented 8 years ago

I've been trying to see if I could repro the problem -- that is, does it occur on any origin with nginx. I think the bottom line is that ngx_pagespeed can't help with this particular file because it's already minified.

As a workaround, could you use

  pagespeed Disallow */freesample/media/js/*;

We want to keep looking at this, though, because interaction with ngx_brotli has not been tested by us. We definitely want to co-exist with that module. I feel that very likely this is about us failing to interpret the Brotli encoding properly.

froilanmendoza commented 8 years ago

I just enabled brotli on our dev and pagespeed works fine. So, I don't think it's Brotli:

http://www6.myspicesage.com/freesample/

froilanmendoza commented 8 years ago

Also, just to re-iterate, Magento on dev IS also minified prior to pagespeed, and as you can see, it was able to re-minify the page (completely).

jmarantz commented 8 years ago

OK. Have you been able to reproduce the problem since disabling the LB?

Have you seen ngx_brotli work properly with pagespeed enabled? That is, do the assets have content-encoding:br when requested from Chrome?

However I am also wondering if you've used webpagetest to do speed comparisons with & without ngx_brotli. I just scanned through its source code, and IIUC it will run the brotli compressor on every request. This makes sense for HTML, but for CSS and JS it might add too much server-side CPU overhead, and ultimately delay the CSS/JS files being sent to the client. Note that while I didn't see any caching logic in the ngx_brotli code, you can always add a caching layer (eg Varnish), so that the recompression does not happen on every request.

But beware: the failure to properly set up such a cache (you need to put Content-Encoding into the cache key) could result in encoding errors and corrupt assets, exactly like the one you pasted above. That potential problem exists independent of PageSpeed.
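
To make the cache-key point concrete, here is a rough sketch using nginx's own proxy cache rather than Varnish. Nothing in this thread uses such a cache; the zone name, paths, hostname and upstream address are all hypothetical. It keys on the request's `Accept-Encoding` as a stand-in for the response's Content-Encoding, which is a simplification.

```nginx
# Hypothetical caching proxy in front of the origin; all names are placeholders.
# proxy_cache_path belongs in the http context.
proxy_cache_path /var/cache/nginx/assets keys_zone=assets:10m;

server {
    listen 80;
    server_name cache.example.com;

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_cache assets;
        # Include the client's Accept-Encoding in the key so that a
        # brotli-encoded body is not replayed to a client that only
        # asked for gzip (or for no compression at all).
        proxy_cache_key "$scheme$host$request_uri$http_accept_encoding";
    }
}
```

Keying on the raw header value fragments the cache, since browsers send many different `Accept-Encoding` strings. A production setup would probably normalize the value first. The sketch only shows why a URL-only key is risky.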

froilanmendoza commented 8 years ago

Have you been able to reproduce the problem since disabling the LB? --> Negatory. We had some issues with our NAT last night. We'll retry some other time.

Have you seen ngx_brotli work properly with pagespeed enabled? That is, do the assets have content-encoding:br when requested from Chrome? --> Yes. See https://www6.myspicesage.com/freesample ... I see https://www6.myspicesage.com/freesample/skin/frontend/responsive/default/css/fa/css/A.font-awesome.min.css.pagespeed.cf.fUv_37LnGW.css and it has "gzip, deflate, sdch, br" in the Response Headers.

However I am also wondering if you've used webpagetest to do speed comparisons with & without ngx_brotli. --> Yes. There were improvements but I don't have the exact metrics right now. We run by the numbers via WPT and Google Analytics every time there's a new push, especially if it's a 'speed improvement' update.

Do you think pagespeed and brotli are conflicting? (but as you see on dev, it "works" fine, or at least it is not corrupted)

jmarantz commented 8 years ago

I don't know whether they might be conflicting in some cases. We haven't tested at all with ngx_brotli.

However I can definitely think of ways in which ngx_brotli might, when combined with an external cache that uses pure URLs as keys, corrupt a JS file.

Does your LB have a cache?

froilanmendoza commented 8 years ago

No cache. We used to have Varnish on the webserver, but we went all-SSL recently (i know we can redirect from http to https, but given the rewrites on Magento, it's probably just making it too convoluted). Magento does have its own Full Page Cache system, which we can't disable.

I'm thinking of undoing all the Magento-default minification and merging and let pagespeed do those instead. I'll let you know how it goes.

froilanmendoza commented 8 years ago

So I disabled auto merge of CSS and JS on Magento, turned on pagespeed and tested.

The good news is that it didn't break the site. The bad news is that pagespeed doesn't seem to be working, nor did it minify/merge the CSS and JS. I even tried to manually set:

    pagespeed on;
    pagespeed FileCachePath /var/ngx_pagespeed_cache;

    pagespeed HttpCacheCompressionLevel 0;

    pagespeed EnableFilters rewrite_css;
    pagespeed EnableFilters rewrite_javascript;

See attachment showing the non-minified and non-merged CSS and JS. Even more interesting, jquery was minified by pagespeed (https://www.myspicesage.com/js/responsive/jquery/jquery-1.7.2.min.js.pagespeed.jm.TiC1blcYSb.js)

[image: pagespeed-1116]

I'm perplexed.

Also tested last night disabling Magento's full page cache, but that also didn't help.

We're re-testing disabling load balancer later (so all frontend traffic goes straight to the webserver) and I will keep you updated. But do you have any thoughts why the above happened?

Lofesa commented 8 years ago

Sorry if this feels like spamming the thread. I don't think this is a Brotli issue. I have a small site with HTTP/2, NPS and ngx_brotli, and no files get corrupted. It's not Magento but a WP site with a full-page cache in a Redis plugin. The site's CSS and JS files are minified and combined by a WP plugin, because I tried all sorts of combinations (without the plugin, combining and minifying with NPS, separate files...) and this was the fastest setup I found. I ran https://www.myspicesage.com/?PageSpeed=on&PageSpeedFilters=+debug and got a bunch of `<!--4xx status code, preventing rewriting of ....` comments on images, CSS and the JS file referenced at the start of this thread. I also got `<!--CSS not inlined since it's bigger than 2048 bytes-->` on some CSS.

Hope this helps. P.S. Sorry for my bad English.

jmarantz commented 8 years ago

RE combining.... Did you enable the combining filters? If you add ?PageSpeedFilters=+debug then the html will be annotated with comments indicating why a filter did not apply to a tag.

Also, can you remove the "HttpCacheCompressionLevel 0" setting, as your earlier experiment did not indicate that this was related to the corruption? Making sure your content is compressed, at least with gzip, is pretty important for data reduction.

Also, please remember when you test to refresh the page a few times after changing settings, to warm up the server side cache.
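
As an illustration of the requests above, the experiment's settings might be adjusted roughly as follows. This is a sketch, not a verified fix: `combine_css` and `combine_javascript` are the PageSpeed filter names for combining, and the cache path is the one used elsewhere in this thread.

```nginx
pagespeed on;
pagespeed FileCachePath /var/ngx_pagespeed_cache;

# No HttpCacheCompressionLevel line here, so cache compression stays at its default.

# Minification filters from the experiment, plus the combining filters asked about.
pagespeed EnableFilters rewrite_css,rewrite_javascript;
pagespeed EnableFilters combine_css,combine_javascript;
```

After reloading nginx, requesting a page a few times with `?PageSpeedFilters=+debug` appended should show HTML comments explaining why any given tag was left alone.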


froilanmendoza commented 8 years ago

Our latest iteration is as follows: no LB (direct to webserver), pagespeed on, no brotli (i compiled a new binary without brotli), ignore media/js/*, disable js rewrite, enable css rewrite.

    pagespeed on;
    pagespeed FileCachePath /var/ngx_pagespeed_cache;
    pagespeed Disallow */media/js/*;
    pagespeed EnableFilters rewrite_css;

    #pagespeed EnableFilters rewrite_javascript;

    pagespeed EnableFilters extend_cache;

    location ~ "\.pagespeed\.([a-z]\.)?[a-z]{2}\.[^.]{10}\.[^.]+" { add_header "" ""; }
    location ~ "^/ngx_pagespeed_static/" { }
    location ~ "^/ngx_pagespeed_beacon" { }

It works, somewhat. But every few hours or so (3-4 hours), I have to restart nginx because I get a 404 on some images. For example, on this page https://www.myspicesage.com/himalayan-pink-salt-p-146.html, the swatch images get blurry/404:

[image: github-111716]

After restarting nginx, the problem goes away. The past 12 hours, it has happened 3 times.

@Lofesa - thanks. Let me take a look at what you suggested. Thank you.

jmarantz commented 8 years ago

My track-record for guessing the problems on your site is not good. But I just looked at view-source:https://www.myspicesage.com/himalayan-pink-salt-p-146.html?PageSpeedFilters=+debug,combine_css,combine_javascript

And I see tags like this:

<link rel="stylesheet" type="text/css" href="https://www.myspicesage.com/media/css_secure/84ed2258d5d2c3d7f4ed072587aec047-v2.18.css" media="all"/>
<!--The preceding resource was not rewritten because its domain (www.myspicesage.com) is not authorized-->

This looks surprising because https://www.myspicesage.com is the host in the URL bar. However, it's possible that this is not the host being sent by your LB to nginx, and you might need to authorize it explicitly.

  pagespeed Domain http*//www.myspicesage.com;

That will at least resolve that particular issue. You mentioned earlier, as well, that when http resources are fetched they are redirected to https. Currently the fetcher used by ngx_pagespeed does not follow redirects, so you may want to use MapOriginDomain so that PageSpeed rewrites the URL before attempting to fetch it. See https://developers.google.com/speed/pagespeed/module/domains#mapping_origin for details.
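
A sketch of how the two suggestions could look together. Two things here are assumptions rather than statements from this thread: the PageSpeed domain documentation writes the wildcard scheme as `http*://` (with the colon), and nginx is assumed to answer on `http://localhost` without redirecting to https. Whether the missing colon in the line quoted above matters for the authorization message has not been established here.

```nginx
# Authorize the site's own origin for rewriting, using the wildcard form
# with the colon as it appears in the PageSpeed domain documentation.
pagespeed Domain http*://www.myspicesage.com;

# MapOriginDomain <origin to fetch from> <origin as it appears in the HTML>:
# resources referenced as https://www.myspicesage.com/... are fetched from
# localhost over plain HTTP, so the http -> https redirect is never hit.
pagespeed MapOriginDomain http://localhost https://www.myspicesage.com;
```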

froilanmendoza commented 8 years ago

@jmarantz thanks for the help, i appreciate it. I tried pagespeed Domain... but I still get the authorization error. See now. Also, LB has been disabled. We're connecting direct to the nginx webserver.

The 404s on images kept recurring, so frequently that we had to disallow the entire media directory to prevent pagespeed from rewriting it.

jmarantz commented 8 years ago

RE authorization: you are sure that you restarted your server after adding the domain-authorization command?

RE 404s appearing.... I have a plausible explanation for that, and a possible workaround. TL;DR: please try any or all of these:

a) enable LoadFromFile if you can (https://developers.google.com/speed/pagespeed/module/domains#ModPagespeedLoadFromFile)

b) make sure that fetching works -- e.g. that you can run wget https://www.myspicesage.com/media/css_secure/84ed2258d5d2c3d7f4ed072587aec047-v2.18.css from the machine running ngx_pagespeed.

c) make your file-cache much bigger -- 4x the size of your site overall. This is kind of a hack and a last resort, but might be easiest for you to try. See https://developers.google.com/speed/pagespeed/module/system

Any of these should resolve the problem (if I'm right about the cause).

Longer explanation:

  1. Fetching is not working from your HTTP server. For example, you might not have access to DNS from your web server. This prevents ngx_pagespeed from requesting resources on demand.
  2. ngx_pagespeed records unoptimized resources as part of the InPlaceResourceOptimization flow, and puts them in the server-side cache. This only happens when a client requests an optimized resource, say foo.png.
  3. Once foo.png is in ngx_pagespeed's cache, it can be optimized when we see <img src=foo.png /> in your HTML. The src= attribute will be rewritten to xfoo.png.pagespeed.ic.HASH.png, which will also be stored in the cache.
  4. Everything will be fine until the HTTP cache is cleaned, and we lose both the original and rewritten image URLs. However, ngx_pagespeed will remember the fact that foo.png will be rewritten to xfoo.png.pagespeed.ic.HASH.png in a separate cache, which won't be cleaned at the same time as the HTTP cache.
  5. A client requests xfoo.png.pagespeed.ic.HASH.png
  6. ngx_pagespeed looks that image up in its cache, and doesn't find it.
  7. It then decodes that URL and attempts to fetch foo.png, first from cache (also fails) and then tries to fetch it via HTTP[s].
  8. That fails and a 404 is issued.

This may be related to https://github.com/pagespeed/mod_pagespeed/issues/1145
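
Workarounds (a) and (c) above could be expressed along these lines. This is a sketch under assumptions: the filesystem path is a guess at where the media directory lives on disk, and the cache numbers are placeholders to be sized to roughly 4x the site, not recommendations.

```nginx
# (a) Read resources under /media/ straight from disk, so a cache miss no
#     longer depends on an HTTP(S) fetch succeeding.
#     The filesystem path is an assumption about this server's layout.
pagespeed LoadFromFile "https://www.myspicesage.com/media/" "/var/www/html/media/";

# (c) Enlarge the file cache so originals and rewritten resources are
#     evicted less often. Placeholder values.
pagespeed FileCacheSizeKb 4096000;
pagespeed FileCacheInodeLimit 500000;
```

Option (b) is a one-off check from a shell on the web server, not a configuration change. It tells you whether step 7 of the explanation can succeed at all.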

froilanmendoza commented 8 years ago

Re: authorization - yes, of course. restarting nginx after every change.

I'll work on your suggestions later tonight! Thank you!

jmarantz commented 8 years ago

Another way the 404 could be triggered is if you have multiple ngx_pagespeed-enabled servers, and we've cached the optimized resource only on one of them.

One other thing you said about your LB piqued my interest, in retrospect: " is under a load balancer that just redirects traffic if one server is down." what does it redirect it to? If it can fall back to a server that's not running ngx_pagespeed, that would certainly cause this problem.

Another workaround is to use OptimizeForBandwidth: https://developers.google.com/speed/pagespeed/module/optimize-for-bandwidth . The downside there is you won't get inlining, combining, or cache extension.
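
In configuration terms this workaround is a single rewrite-level setting. A minimal sketch, with the cache path taken from earlier in this thread:

```nginx
pagespeed on;
pagespeed FileCachePath /var/ngx_pagespeed_cache;

# Optimize resources in place under their original URLs. The HTML is not
# changed to reference .pagespeed. URLs, so there is no rewritten URL to 404.
pagespeed RewriteLevel OptimizeForBandwidth;
```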

froilanmendoza commented 8 years ago

Only one ngx_pagespeed server in this setup. When we had the LB setup, the other server is an apache server. But again, for this setup, we've eliminated LB from the equation.

I'll take a look at/try OptimizeForBandwidth as well and let you know how it goes. Thanks!

jmarantz commented 8 years ago

A possible strategy for a fix: avoid pagespeed-rewrites for resources recorded via the ipro recorder. consider them a miss in that context.

jmarantz commented 8 years ago

Hi froilanmendoza. I have a reproduction for what I think is the bug you have run into.

I know I gave you a bunch of workarounds which can help sort things out for you, but we consider this a pretty bad bug (404s on simple pages referencing resources), and could really use your help understanding when/why this situation arises.

If your testcase matches my reproduction (https://github.com/pagespeed/mod_pagespeed/commit/a6fe5f6d0e49088065fce670ee45ca73e9ae0659) then the root cause is that fetching from your server doesn't work. I really want to know why this arises. Possible theories I have are:

  1. At your nginx instance, you have done something to firewall your machine from the internet, and you cannot initiate internet fetches from it. In this case, I expect "wget http://www.nytimes.com" to fail.
  2. There's something specific about fetching your own site resources from your nginx machine, maybe due to the server self-identifying as a different host.
  3. Your resources are on HTTPS, and the system is not finding the right directory for certificates on your system.
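
For theory 3, PageSpeed has directives that point its HTTPS fetcher at a certificate store. The sketch below shows their shape. The paths are assumptions that vary by distribution and are not taken from this server.

```nginx
# Make sure HTTPS fetching is enabled for PageSpeed's own resource fetches.
pagespeed FetchHttps enable;

# Tell the fetcher where the CA certificates live. Placeholder paths:
# check where your distribution keeps its CA bundle.
pagespeed SslCertDirectory /etc/pki/tls/certs;
pagespeed SslCertFile /etc/pki/tls/cert.pem;
```

If fetches of the site's own https URLs start succeeding after a change like this, that would point at certificate lookup and away from DNS or firewalling.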

The nginx log may have clues to what's going on.

Thanks! We really appreciate any insight you can provide on this problem!

froilanmendoza commented 7 years ago

Thanks so much. I had to pause pagespeed throughout Thanksgiving/Black Friday/Cyber Monday because the problem/s has/have persisted. TLDR - pagespeed would work after install/restart but then would corrupt js files and generate 404s. Initially, the error affected only a few pages, but then it also affected our checkout pages (primarily because of js corruption) so we had to pause it.

There is no pattern when corruption happens. Sometimes it's just 404, sometimes it's js completely messed up.

For reference/review, here's the setting I used:

    # Pagespeed main settings

    pagespeed off;
    pagespeed Domain http*//www.myspicesage.com;
    pagespeed Domain *.myspicesage.com;
    pagespeed FileCachePath /var/ngx_pagespeed_cache;
    #pagespeed Disallow */media/js/*;
    pagespeed Disallow */media/*;
    pagespeed Disallow */colorselectorplus/*;
    pagespeed LoadFromFile "https://www.myspicesage.com/media/" "/var/www/html/media/";
    pagespeed LoadFromFile "https://www.myspicesage.com/skin/frontend/responsive/" "/var/www/html/skin/frontend/responsive/";

    #pagespeed HttpCacheCompressionLevel 0
    pagespeed EnableFilters rewrite_css;
    #pagespeed EnableFilters rewrite_javascript;

    pagespeed MapOriginDomain http://localhost https://www.myspicesage.com;

    pagespeed FileCacheSizeKb            102400;
    pagespeed FileCacheCleanIntervalMs   3600000;
    pagespeed FileCacheInodeLimit        500000;

    pagespeed Statistics on;
    pagespeed StatisticsLogging on;
    pagespeed LogDir /var/log/pagespeed;
    pagespeed AdminPath /pagespeed_admin;

    pagespeed EnableFilters extend_cache;

    # Ensure requests for pagespeed optimized resources go to the pagespeed
    # handler and no extraneous headers get set.

    location ~ "\.pagespeed\.([a-z]\.)?[a-z]{2}\.[^.]{10}\.[^.]+" { add_header "" ""; }
    location ~ "^/ngx_pagespeed_static/" { }
    location ~ "^/ngx_pagespeed_beacon" { }

    ## end pagespeed

Observations/notes/questions:

  1. The JS files are in the media directory, yet they still got rewritten by pagespeed (and then eventually corrupted).

  2. I played with the cache size etc. a bit, but if my hunch is right that JS gets corrupted after a recache, then we'll eventually hit that problem no matter how big we set that size. Agree?

  3. Your comment about firewall and certificate - no, we don't have a firewall or load balancer. We did have a certificate chain problem last week (before pushing to live again) and that was fixed. I can do wget fine from the command line now:

         [fmendoza@mssphyweb01 ~]$ wget https://www.google.com
         --2016-11-30 12:38:17-- https://www.google.com/
         Resolving www.google.com... 74.125.28.99, 74.125.28.104, 74.125.28.147, ...
         Connecting to www.google.com|74.125.28.99|:443... connected.
         HTTP request sent, awaiting response... 200 OK

  4. You said "There's something specific about fetching your own site resources from your nginx machine, maybe due to the server self-identifying as a different host." --> I would be interested to hear more about this. Tho:

         [fmendoza@mssphyweb01 ~]$ host www.myspicesage.com
         www.myspicesage.com has address 216.86.146.207
         [fmendoza@mssphyweb01 ~]$ nslookup www.myspicesage.com
         Server: 10.0.1.59
         Address: 10.0.1.59#53

         Non-authoritative answer:
         Name: www.myspicesage.com
         Address: 216.86.146.207

  5. I checked the nginx logs while the errors were happening but did not find anything interesting. Probably because a) if the problem is a 404, it's a pagespeed-generated "file" and I'm assuming your module does the translation, so it would not show up in nginx/error.log? b) if it's corrupted JS, the file is there and served, but the error is at the application level and only visible in Firebug? Let me know what to look for and/or if I have to change the debug log level so we can get more info the next time I replicate the error.

  6. You mentioned ipro recorder... what is that?

  7. Maybe coincidental (and I didn't test this enough) - the 2nd-to-last change I did was "pagespeed Disallow */media/*;" as I was still seeing JS corruption; that didn't help after a restart-and-wait. The last change I did was "pagespeed LoadFromFile "https://www.myspicesage.com/media/" "/var/www/html/media/";" and that resulted in nginx running fine with no errors. For a while. I would say after an hour in some instances and >12 hours in another instance, some of the JS corruption begins, which would go away after an nginx restart.

I will be re-attempting this sometime this week. Let me know if I need to lookout for something specific or want me to try other setting values. Thank you.

philrice commented 6 years ago

I know this has been a while, but I'm getting this behaviour with nginx 1.14.0 and the latest pagespeed, and was wondering if the issue was ever resolved? I added LoadFromFile yesterday and it hasn't reoccurred since then, but that is more of a workaround than a fix, isn't it?

jmarantz commented 6 years ago

This issue was never resolved. LoadFromFile is a good workaround and in general a great way to run if it works for your setup.

philrice commented 6 years ago

Thanks for the quick answer - yeah, it seems to be working; at least no repeat of the error since yesterday. I'm still working through other tweaks to try to iron out some behaviours.

jmarantz commented 6 years ago

@oschaaf we lost track of this one a couple of years ago, and @froilanmendoza's last comment included something simple worth investigating. If he is disallowing */media/* and it is still getting ipro-rewritten, I think that's worth following up. My guess is that the configuration with that disallow is not being referenced when the media JS file is served, so it's getting ipro-rewritten and put into the cache. I'm not sure if that's nginx-specific or not.

I'm still not sure whether that could be the cause of the 404s though; I think that we would still reconstruct a pagespeed resource even if it were covered by a Disallow in the current config. That could be unit-tested. But again this might be related to the differences between mod_pagespeed and ngx_pagespeed.

oschaaf commented 6 years ago

@jmarantz Looking at the code, I think the ngx_pagespeed Disallow configuration is handled correctly in the context of in-place resource optimization.

( https://github.com/apache/incubator-pagespeed-ngx/blob/master/src/ngx_pagespeed.cc#L2068 )

jmarantz commented 6 years ago

Cool. My theory is that it's still being ipro-rewritten because it's on a different virtual host, and the config with the 'disallow' statement is not being referenced when the js file is being served. That should still not cause 404s though.

The next question is how do we handle serving a .pagespeed. URL whose origin resource has been disallowed? I think we should not be paying attention to Disallow statements in that flow, where we are trying to serve a .pagespeed. URL.

oschaaf commented 6 years ago

It looks like .pagespeed. resources will be fetched regardless of any Disallow lines: https://github.com/apache/incubator-pagespeed-ngx/blob/master/src/ngx_pagespeed.cc#L1874