ampproject / amphtml

The AMP web component framework.
https://amp.dev
Apache License 2.0
14.89k stars 3.89k forks source link

Possible Google Cache issue with query string parameters for amp-iframe video interactive #8571

Closed ryanpoplin closed 7 years ago

ryanpoplin commented 7 years ago

How do we reproduce the issue?

1.) On your test device for AMP, please type in 'cnn' in the Google Search Engine field to load in the AMP carousel for CNN.

2.) Pressing on any of the articles in the carousel will take you to the AMP carousel to view single articles.

3.) Scroll until you find 'video' enabled articles and you'll eventually notice that some of our videos are not associated with the article that we're viewing (ex: You may see an article about Syria on the article title and the video has to do with something completely unrelated; as we're using amp-iframe and a custom video collection interactive designed for AMP that's powered via query string parameters from our AMP micro-service: could this be an issue with the Cache-Control response header? Is this something going on with the Google Cache causing our global article path query string parameter to be cached and referencing incorrect videos from the amp-iframe?).

Extra: Via the Chrome Dev Tools, you can filter for 'pal' in the networking tab and see that 'for some articles' this API call for dynamic videos is not the same as the path for the article in the URL (this is dynamically set via our request controller in the micro service and works properly during local testing) and waiting and refreshing the page will eventually load in the correct path and video as a result and you can see this incorrect path is also in the query string parameter in the 'src' attribute for the amp-iframe from the frame source of the document.

What browsers are affected?

Chrome, Safari and Firefox on mobile phone devices.

Which AMP version is affected?

1490899213109

ericlindley-g commented 7 years ago

Thanks for calling this out, @ryanpoplin. I can't reproduce the issue, on my device so far — could you include the URL of a couple examples so we can check it out in more detail?

/cc @vitaliybl on cache side /to @dvoytenko for input on where the issue might be and for help triaging who might be best to take a look

dvoytenko commented 7 years ago

@ryanpoplin Hmm. The Cache does not touch <amp-iframe> elements and cannot affect its response. The only remove possibility is that <amp-iframe>'s src attribute would have a wrong value - not the same as the local article has. However, there's very little possibility that the Cache would affect src.

A somewhat likelier scenario, the remote host serving the frame or a script running in it may use referrer value and something is off there?

ryanpoplin commented 7 years ago

"However, there's very little possibility that the Cache would affect src.": Although unlikely, are you saying that it 'could' happen? We will continue to investigate on the loaded in video interactive on our end, however.

vitaliybl commented 7 years ago

I was able to capture one of the affected pages served from Google AMP Cache. Here's the bad iframe:

`

`

Note that the src appears to contain fragments from two distinct CNN URLs: 2017/01/06/believer-trailer.cnn-promos and 2017/01/16/health/baby-acupuncture-colic-study/. While it is theoretically possible that we have a memory corruption bug in Google AMP Cache that resulted in that, we have no other evidence of such bug (no crashes, no reports from other publishers). It would also be extremely, extremely unlikely that two CNN urls would get combined as a result of such a bug, because we don't segment our servers by publisher.

Much more plausible explanation is that the bug was on CNN's side, and AMP Cache faithfully cached it.

ryanpoplin commented 7 years ago

When making a request to our Google AMP Service, we get the article path that's associated with the URL: https://localhost:5007/cnn/2017/04/06/politics/donald-trump-xi-jinping-china-mar-a-lago/index.html. The path: 2017/04/06/politics/donald-trump-xi-jinping-china-mar-a-lago/index.html is always dynamic based on the request and the 'pal' query string parameter is always going to be that value (which is the value used to make a query for the video data from our API loaded in the amp-iframe).

Here's that article on the carousel now with a bad query string parameter:

`

`

This is/was at: view-source:https://amp-cnn-com.cdn.ampproject.org/v/s/amp.cnn.com/cnn/2017/04/06/politics/donald-trump-xi-jinping-china-mar-a-lago/index.html?amp_js_v=9#origin=https%3A%2F%2Fwww.google.com&exp=a4a%3A1&channelid=0&cid=1&dialog=0&prerenderSize=1&visibilityState=prerender&paddingTop=54&history=1&p2r=0&horizontalScrolling=0&csi=0&storage=1&viewerUrl=https%3A%2F%2Fwww.google.com%2Famp%2Fs%2Famp.cnn.com%2Fcnn%2F2017%2F04%2F06%2Fpolitics%2Fdonald-trump-xi-jinping-china-mar-a-lago%2Findex.html&cap=swipe

One question for your team is that if you keep refreshing that article; it'll show the correct video based on the path provided in the URL, eventually; any idea why? The path is provided to the application via the URL, not any separate API in the handler in our service.

As I just mentioned above, it's now showing correctly in the carousel here: view-source:https://amp-cnn-com.cdn.ampproject.org/v/s/amp.cnn.com/cnn/2017/04/06/politics/donald-trump-xi-jinping-china-mar-a-lago/index.html?amp_js_v=9#origin=https%3A%2F%2Fwww.google.com&exp=a4a%3A0&channelid=0&cid=1&dialog=0&prerenderSize=1&visibilityState=visible&paddingTop=54&history=1&p2r=0&horizontalScrolling=0&csi=0&referrer=https%3A%2F%2Fwww.google.com&storage=1&viewerUrl=https%3A%2F%2Fwww.google.com%2Famp%2Fs%2Famp.cnn.com%2Fcnn%2F2017%2F04%2F06%2Fpolitics%2Fdonald-trump-xi-jinping-china-mar-a-lago%2Findex.html&cap=swipe

`

`

Seems odd and these sorts of debugging issues are why we really wanted to know about possibly getting this ticket resolved for our local testing requirements for the AMP community for confidence reasons before a production release: https://github.com/ampproject/amphtml/issues/7121

Thank you all for helping us on this issue!

vitaliybl commented 7 years ago

I was able to get a bad response directly from amp.cnn.com just now on https://amp.cnn.com/cnn/2017/04/06/politics/who-is-mike-conaway-house-russia-investigation/index.html The issue is showing up intermittently on your site, and therefore intermittently on the cache. See attached screenshot.

screenshot 2017-04-06 at 9 03 35 pm

ryanpoplin commented 7 years ago

https://amp.cnn.com/cnn/2017/04/06/politics/who-is-mike-conaway-house-russia-investigation/index.html is working fine on my end right now. No errors in the console and the video is correct.

amp-issue

As mentioned above, the PATH in the URL will determine the video played by the amp-iframe interactive. That path is the only value that is causing issue, interesting...

dvoytenko commented 7 years ago

@ryanpoplin The same issue on my side. Just tried to pull this page directly from cnn.com and it looks like I'm getting a wrong video. This seems to be rather intermittent.

vitaliybl commented 7 years ago

I'm now seeing it on https://amp.cnn.com/cnn/2017/04/06/politics/us-china-trade-deficit-4-things-to-know/index.html screenshot 2017-04-06 at 10 14 48 pm

ryanpoplin commented 7 years ago

Thanks for helping us consolidate on the Google end guys! We had a parallel network race condition on our micro service affecting the proper path under certain circumstances under load. We plan on making a build tomorrow and can confirm no cache issues on the Google side once it's deployed and I'll update this ticket again for a close.