apache / incubator-pagespeed-mod

Apache module for rewriting web pages to reduce latency and bandwidth.
http://modpagespeed.com
Apache License 2.0
697 stars 159 forks source link

Minimal AMP support: don't invalidate AMP #1263

Closed jmarantz closed 8 years ago

jmarantz commented 8 years ago

PageSpeed should never inject JavaScript into AMP pages, as that will invalidate them. I think to determine that we need to look for <html AMP>, although the spec which has a utf-8 lightning bolt character rather than "AMP": <html ⚡>. See https://www.ampproject.org/docs/get_started/about-amp.html . (how does that work when the site is in russian or chinese encoding???)

PageSpeed should optimize src in amp-img tags, which starts with adding some default UrlValuedAttribute settings. See https://www.ampproject.org/docs/reference/amp-img.html . But for the moment, PageSpeed should not transcode amp-img URLs to webp because AMP pages must be served as publicly cacheable without varying on user-agent. So some kind of element equivalent must be employed to allow for UA-dependent img tags.

HansVanEijsden commented 8 years ago

Agreed. For now I disabled Pagespeed for /amp/: pagespeed Disallow */amp/*;

jmarantz commented 8 years ago

I think you might need some stars for that to work, no?

pagespeed Disallow /amp/;

On Tue, Feb 16, 2016 at 12:25 PM, Hans van Eijsden <notifications@github.com

wrote:

Agreed. For now I disabled Pagespeed for /amp/: pagespeed Disallow /amp/;

— Reply to this email directly or view it on GitHub https://github.com/pagespeed/mod_pagespeed/issues/1263#issuecomment-184785966 .

jmaessen commented 8 years ago

I think the lightning bolt is also legal. UTF-8 is required I think, and this is intended to enforce that.

oschaaf commented 8 years ago

Related post about http-equiv=refresh injection invalidating AMP: https://groups.google.com/forum/#!topic/ngx-pagespeed-discuss/3T4HISDLbW4

grafana-dee commented 8 years ago

@oschaaf worth bearing in mind that the http-equiv=refresh meta tag is in fact HTTP equivalent... and you can simply remove the meta tag altogether and insert it as a HTTP header to achieve the same effect. To my knowledge AMP only specifies HTML, rather than HTTP, and this is a valid HTTP header.

jmarantz commented 8 years ago

mod_pagespeed uses the http-equiv=refresh meta tag in situations where it has already flushed response headers, and while streaming HTML discovers something unusual that makes it want the browser to start over...not a good day out for optimization.

One example where this might occur is when the server has lazyload_images enabled but JS is disabled in a browser, so the correct image will never replace the blank one. So we put the meta-refresh block in a

I don't know how I could replicate this behavior with an http header.

jeffkaufman commented 8 years ago

@jmarantz what's the status of this? With ad71403 and 7bf8083 submitted is this done?

(It doesn't look like ad71403117ad48569c16bc1833d4ed60c42262ba or 7bf8083314ba9d011c750d72893a2f91183b8c7f have been released as bugfixes, so it looks like this would be a fix with 1.12)

jeffkaufman commented 8 years ago

@jmarantz confirms offline that this is done, we optimize amp-img, and there aren't any current cases where we're aware of breaking AMP

jmarantz commented 8 years ago

Sorry: I don't think we optimize amp-img by default yet, however I think we can do that with a separate configuration line. Maybe add that to the release notes for now?

On Tue, Oct 11, 2016 at 7:19 AM, Jeff Kaufman notifications@github.com wrote:

@jmarantz https://github.com/jmarantz confirms offline that this is done, we optimize amp-img, and there aren't any current cases where we're aware of breaking AMP

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pagespeed/mod_pagespeed/issues/1263#issuecomment-252888821, or mute the thread https://github.com/notifications/unsubscribe-auth/AB2kPR_bNL5HQ2554GnOZqAfAJox3F6Lks5qy3DWgaJpZM4HZSZj .

jeffkaufman commented 8 years ago

I think UrlValuedAttribute amp-img src image will probably work, but I should test it. UrlValuedAttribute amp-image srcset ?? isn't possible though.

jeffkaufman commented 8 years ago

This is probably a pretty small change to fix by default; let me look.

jeffkaufman commented 8 years ago

Sent a CL out for review, but I think it's too complex to backport to v34 and include in 1.12. Filed a new bug for optimizing images in AMP: https://github.com/pagespeed/mod_pagespeed/issues/1413

Lofesa commented 7 years ago

Sorry if anyone consider I am spaming this treat but I think is related. In AMP pages the Prioritize Critical Css must be disabled. This filter inject a js script that is considered an error in the amp validation tool (and duplicates all the css)

jmarantz commented 7 years ago

Lofesa, are you saying that you have installed 1.12 and found that amp pages are being served with prioritize critical CSS enabled?

Lofesa commented 7 years ago

@jmarantz , yes I am saying that. You can see it at https://intersindical.intersindicalrm.org/la-confederacion-intersindical-rechaza-folleto-hazte-oir/amp/ or https://intersindical.intersindicalrm.org/la-confederacion-intersindical-rechaza-folleto-hazte-oir/?amp

hillsp commented 7 years ago

It looks like CriticalSelectorFilter::GetScriptUsage does return kWillInjectScripts and isn't in RewriteOptions::kRequiresScriptExecutionFilterSet. It ought to have been disabled but clearly it wasn't.

My best guess is that something about the site is fooling our AMP detector. It's either that or the auto-disabling of JS filters for AMP isn't working.

Lofesa commented 7 years ago

Hi @hillsp The site is a WP and get the AMP pages with these plugins:

https://wordpress.org/plugins/amp/ https://wordpress.org/plugins/accelerated-mobile-pages/

All pages and post works with /?amp and post with /amp/

Images url are not rewrited but the etag is like W/"PSAxxxxxxxx and had an x-original-content-length

jmarantz commented 7 years ago

Hi Lofesa -- I'll take a look at this. But can you open up a new issue (pointing to this one)? This one was already closed and it would be easier for us to track it as a new issue.

jmarantz commented 7 years ago

Lofessa -- I made a small testcase and could not repro in mod_pagespeed. Checking ngx_pagespeed now.

jmarantz commented 7 years ago

Can't repro this in ngx_pagespeed either. Can you send your config?

jeffkaufman commented 7 years ago

If there's a bug in our amp detection and you need to turn off a filter for some pages, one way to do this in nginx would be roughly:

http {
   pagespeed ProcessScriptVariables on;
   ...
   server {
      ...
      set $disable_filters "";
      if ($request_uri ~ .*/amp/.*) {
          set $disable_filters "prioritize_critical_css";
      }
      pagespeed DisableFilters "$disable_filters";
      ...
   }
}

See https://developers.google.com/speed/pagespeed/module/system#nginx_script_variables and https://developers.google.com/speed/pagespeed/module/https_support#h2_configuration_nginx

I haven't tested this configuration, though

Lofesa commented 7 years ago

@jmarantz this work but I leave the config w/o it so, if needed, you can see the js injected

djixas commented 5 years ago

Today I've had this issue where mod pagespeed started injecting custom javascript into every AMP page, causing every one of them to become invalid.

<noscript><meta HTTP-EQUIV="refresh" content="0;url='https://xxx.com amp/?PageSpeed=noscript'" /><style><!--table,div,span,font,p{display:none} --></style><div style="display:block">Please click <a href="https://xxx.com/---/amp/?PageSpeed=noscript">here</a> if you are not redirected within a few seconds.</div></noscript>

This should never happen.

jmarantz commented 5 years ago

@djixas it would be useful to know whether this is a case of:

Related: what version of PageSpeed are you running?