ampproject / amp-toolbox-php

AMP Optimizer PHP library
Apache License 2.0
74 stars 23 forks source link

i-amphtml-version=0 is invalid #530

Closed m1rm closed 1 year ago

m1rm commented 1 year ago

If the metadata endpoint does not respond successfully, the current version ist set to '0': https://github.com/ampproject/amp-toolbox-php/blame/main/src/RuntimeVersion.php#L80-L82

This is used by the AmpRuntimeCss.php.

Atm I cannot provide a reproduction, but this occurs from time to time.

Desired behaviour: Instead of setting the version to '0' it would be nice to throw an error/exception.

m1rm commented 1 year ago

we found out that the issue originates in the plain return of the cached response, which can have a non positive return status code.

https://github.com/ampproject/amp-toolbox-php/blob/main/src/RemoteRequest/TemporaryFileCachedRemoteGetRequest.php#L142

dritter commented 1 year ago

Btw. this is the content of the 404:

O:49:"AmpProject\RemoteRequest\RemoteGetRequestResponse":4:{s:55:"AmpProject\RemoteRequest\RemoteGetRequestResponsebody";s:1573:"<!DOCTYPE html>
<html lang=en>
  <meta charset=utf-8>
  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
  <title>Error 404 (Not Found)!!1</title>
  <style>
    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}
  </style>
  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>
  <p><b>404.</b> <ins>That’s an error.</ins>
  <p>The requested URL <code>/rtv/metadata</code> was not found on this server.  <ins>That’s all we know.</ins>
";s:58:"AmpProject\RemoteRequest\RemoteGetRequestResponseheaders";a:5:{s:12:"content-type";a:1:{i:0;s:24:"text/html; charset=UTF-8";}s:15:"referrer-policy";a:1:{i:0;s:11:"no-referrer";}s:14:"content-length";a:1:{i:0;s:4:"1573";}s:4:"date";a:1:{i:0;s:29:"Mon, 22 Aug 2022 11:36:36 GMT";}s:7:"alt-svc";a:1:{i:0;s:162:"h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"";}}s:63:"AmpProject\RemoteRequest\RemoteGetRequestResponseheadersIndex";N;s:61:"AmpProject\RemoteRequest\RemoteGetRequestResponsestatusCode";i:404;}

We don't know why this happens, and we cannot reproduce it manually. All that we have is an EventSubscriber that calls TransformationEngine->optimizeHtml(). When we try to do the same thing inside the prodcution container it works fine.

To make things worse, the 404 Response has no Expires Header set, so the default TemporaryFileCachedRemoteGetRequest::EXPIRY_TIME of 30 Days (2592000 Seconds) is used, hence caching the 404 for that time. In opposite, a succeeding request to /rtv/metadata expires immediately.

dritter commented 1 year ago

Btw. @westonruter do you have some insights why we get that 404 on the https://cdn.ampproject.org/rtv/metadata route? This seems to me like there is still something wrong somewhere..

westonruter commented 1 year ago

Sorry, I'm not sure. Can you curl the URL from the command line on that same environment?

dritter commented 1 year ago

Unfortunately not. We tried curling from CLI in the production container as well as executing the TransformationEngine in a PHP Script in Production. All without success. :(

It is mysterious to us why this happens, but sometimes we get this 404. The file above is the contents of /tmp/amp-remote-request-f665fa684dc62ea91ccd94bae32b58df, which seems to be googles standard 404 page. Strange that the AMP CDN sometimes routes /rtv/metadata to that page. I hoped you have access to more informations than we have.

westonruter commented 1 year ago

So to confirm, even when you curl from the command line you get a 404 when requesting https://cdn.ampproject.org/rtv/metadata? In other words, the issue is independent of amp-toolbox-php?

dritter commented 1 year ago

So to confirm, even when you curl from the command line you get a 404 when requesting

No. We had no success reproducing the issue in any way.

In other words, the issue is independent of amp-toolbox-php?

Yes and no. :) No, because the 404 is a Server Error. So it has to be something on the Server side. And Yes, because we see it only in production with amp-toolbox-php..

dritter commented 1 year ago

Btw. To quantify this, we got around 53k FailedToGetCachedResponse Errors in the last 7 days:

image

We thought that this might be some DDOS Protection from the CDN, but it seems odd, that this would redirect to googles 404 page..

dritter commented 1 year ago

Another guess of us was that it is caused by a redirect. After disabling automatically following redirects and add more logging, I can say, that it is caused NOT by a redirect. We get the 404 directly in the first try.

Relevant Log message:

Failed to fetch the contents from the URL 'https://cdn.ampproject.org/rtv/metadata' as it returned HTTP status 404. Try: 0 https://cdn.ampproject.org/rtv/metadata 

You can see my changes here: dritter:amp-toolbox-php/add_more_logging