ampproject / amphtml

The AMP web component framework.
https://amp.dev
Apache License 2.0
14.89k stars 3.88k forks source link

Bing.com AMP cache sets incorrect __amp_source_origin value on CORS requests #21703

Closed src-code closed 5 years ago

src-code commented 5 years ago

What's the issue?

CORS requests made from AMP documents hosted by Bing.com's AMP cache are failing due to incorrect domain being set in the __amp_source_origin query param. The value of this param is being set to the AMP cache domain rather than the publisher's domain. This causes CORS validation to fail on the source origin's server.

How do we reproduce the issue?

Yahoo Sports document hosted by Bing's AMP cache and served via bing.com:

https://www.bing.com/amp/s/sports.yahoo.com/amphtml/fantasy-basketball-rookies-who-will-provide-the-most-future-value-211618398.html

This results in a CORS validation failure, as seen in the console:

Access to fetch at 'https://www.yahoo.com/caas/sidekick/sidekick?appid=amp&lang=en-US&region=US&site=sports&__amp_source_origin=https%3A%2F%2Fsports-yahoo-com.bing-amp.com' from origin 'https://sports-yahoo-com.bing-amp.com' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.

__amp_source_origin is being set to https://sports-yahoo-com.bing-amp.com instead of https://sports.yahoo.com

There's some suspicion that this is related the configuration of the cache, specifically the cdnProxyRegex property, which defaults to the Google AMP CDN domain if not set. See this bit of code, where cdnProxyRegex is configured, otherwise it falls back to Google's CDN:

https://github.com/ampproject/amphtml/blob/2755c10b95d54b98a587b7b091db7d5697684903/src/config.js#L41-L42

Worth asking - if this value is required to be set by a cache to a regex for its own domain in order for AMP documents to function properly, why offer a fallback value at all?

cc @choumx @ssantosms

What browsers are affected?

All

Which AMP version is affected?

Noticed on 1901081935550, suspect it's always been an issue however

jridgewell commented 5 years ago

We can change this to be a required setting in the config. /to @rsimha

rsimha commented 5 years ago

This doesn't look like a tooling issue. Moving to runtime to triage.

jridgewell commented 5 years ago

It's part of the build process where you specify the URL config changes. Runtime just takes what the config object build creates, and uses it to generate URLs. Definitely a build tooling issue. Ping me to discuss.

rsimha commented 5 years ago

Assigning to @erwinmombay who should be familiar with this.

ssantosms commented 5 years ago

In our build system, we are replacing the default config to bing. However, I am not sure why it's not working properly.

@erwinmombay Can you check the below is correct?

From Bing v0.js: var Wc=self.AMPCONFIG||{},Xc={thirdParty:Wc.thirdPartyUrl||"https://3p.bing-amp.net",thirdPartyFrameHost:Wc.thirdPartyFrameHost||"bing-amp.net",thirdPartyFrameRegex:("string"==typeof Wc.thirdPartyFrameRegex?new RegExp(Wc.thirdPartyFrameRegex):Wc.thirdPartyFrameRegex)||/^d-\d+.bing-amp.net$/,cdn:Wc.cdnUrl||"https://www.bing-amp.com",cdnProxyRegex:("string"==typeof Wc.cdnProxyRegex?new RegExp(Wc.cdnProxyRegex):Wc.cdnProxyRegex)||/^https:\/\/([a-zA-Z0-9-]+.)?.bing-amp.com$/,localhostRegex:/^https?:\/\/localhost(:\d+)?$/,

jridgewell commented 5 years ago

Everything appears correct in the config. I'm not able to load the actual https://sports-yahoo-com.bing-amp.com page (I'm assuming it's https://sports-yahoo-com.bing-amp.com/c/s/sports.yahoo.com/amphtml/fantasy-basketball-rookies-who-will-provide-the-most-future-value-211618398.html?) to test.

src-code commented 5 years ago

I believe they’ve taken down Yahoo AMP pages while this issue is open. @ssantosms I’d be okay bringing the pages back while debugging if needed, the broken content is secondary in nature

src-code commented 5 years ago

@jridgewell @ssantosms It looks to me like there is an extra . in the regex that shouldn't be there, before 'bing-amp.com':

/^https://([a-zA-Z0-9_-]+.)?.bing-amp.com$/

There's another dot inside the capture group, so it's expecting two dots in the string it's matching. Also, I assume these should be literal periods, and thus escaped?

erwinmombay commented 5 years ago

@jridgewell do you mean we would require cdnProxyRegex to be explicitly set through AMP_CONFIG?

For me the main question, and this hasn't been answered is, Is AMP_CONFIG actually a required Object for v0.js (or any other main js binary) to function correctly?

originally we had treated it as optional and that v0.js should work without it.

jridgewell commented 5 years ago

Oh, I thought we treated it as a requirement. Does it makes sense to make it required in this case?

rsimha commented 5 years ago

Today, we don't write the AMP_CONFIG to v0.js unless...

  1. --fortesting is passed to gulp dist
  2. gulp prepend-global is explicitly run

https://github.com/ampproject/amphtml/blob/2b8b439203444545acd5c8780ae5ea79ac972b0a/gulpfile.js#L860-L876

I'm not against making the config a requirement. However, which one should we default to? (Tests use prod-config.json as the default.)

erwinmombay commented 5 years ago

yeah, at this point i think it should be a requirement. The runtime won't really run correctly on cache without it.

prod makes more sense to me as default as it is the majority case.

ssantosms commented 5 years ago

@jridgewell @ssantosms It looks to me like there is an extra . in the regex that shouldn't be there, before 'bing-amp.com':

/^https://([a-zA-Z0-9_-]+.)?.bing-amp.com$/

There's another dot inside the capture group, so it's expecting two dots in the string it's matching. Also, I assume these should be literal periods, and thus escaped?

That's a good point. I will validate and fix if needed.

ssantosms commented 5 years ago

@src-code we brought it back to you can take a look. The CORS error is still there, but the regex has been fixed.

Also, I see the same error in the console when I open your page from Google CDN

Access to fetch at 'https://www.yahoo.com/caas/sidekick/sidekick?appid=amp&lang=en-US&region=US&site=sports&__amp_source_origin=https%3A%2F%2Fsports.yahoo.com' from origin 'https://sports-yahoo-com.cdn.ampproject.org' has been blocked by CORS policy: The 'Access-Control-Allow-Origin' header has a value 'https://sports.yahoo.com' that is not equal to the supplied origin. Have the server send the header with a valid value, or, if an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.

src-code commented 5 years ago

@ssantosms It seems to still be working properly for me from Google CDN, can you verify the URL you were visiting was:

https://sports-yahoo-com.cdn.ampproject.org/c/s/sports.yahoo.com/amphtml/fantasy-basketball-rookies-who-will-provide-the-most-future-value-211618398.html

I can't seem to fetch the doc from bing.com still though:

https://www.bing.com/amp/s/sports.yahoo.com/amphtml/fantasy-basketball-rookies-who-will-provide-the-most-future-value-211618398.html

This Bing url seems to still redirect to our AMP origin:

https://sports-yahoo-com.bing-amp.com/c/s/sports.yahoo.com/amphtml/fantasy-basketball-rookies-who-will-provide-the-most-future-value-211618398.html

ssantosms commented 5 years ago

@src-code Please try this one https://www.bing.com/amp/s/sports.yahoo.com/amphtml/how-russell-wilsons-monster-deal-impacts-patrick-mahomes-232723776.html

src-code commented 5 years ago

Thanks @ssantosms, the CORS request seems to be working properly now with your fix!

src-code commented 5 years ago

@ssantosms Unrelated, but I see that the ads are failing to render on the above example, seems that the host bing-amp.net can't be found. www.bing-amp.net does resolve however.

https://bing-amp.net/1904021746450/f.js vs. https://www.bing-amp.net/1904021746450/f.js

ssantosms commented 5 years ago

@src-code I noticed that yesterday. Team is looking into it with priority. I will close this bug

ssantosms commented 5 years ago

Actually @rsimha I can't close this bug. Would you mind closing it? It has been fixed on our side.