ampproject / amphtml

The AMP web component framework.
https://amp.dev
Apache License 2.0

Remove Google domain from AMP pages on the web. #6210

Closed: retornam closed this issue 7 years ago

retornam commented 7 years ago

How do we reproduce the issue?

See this https://twitter.com/retornam/status/798662738180354048

STR: A friend sent me this link through WhatsApp, https://www.google.com/amp/m.disclose.tv/amp/news/are_you_suffering_from_trump_acceptance_resistance_disorder_tard/136500?client=safari

This website is clearly a fake news website, but because AMP pages are hidden under the Google.com domain, many people (especially those on mobile browsers) who do not understand AMP and how it works will assume the site is legitimate.

This is a serious bug in my opinion.

Thanks for all the good work you are doing, I hope you give this issue the attention it deserves.

What browsers are affected?

All browsers

Which AMP version is affected?

Every version.

gloddy commented 7 years ago

In addition, when you send a story, fake or not, it ends up with Google's branding and domain in the message.

[Screenshot]

jpettitt commented 7 years ago

The easy solution would be for Google to redirect any external request for a page in an AMP viewer to the raw AMP page. This would also solve an issue with legit content being shared and appearing to be on the wrong domain.
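To make the suggestion concrete, here is a minimal Python sketch of such a redirect rule. It assumes the viewer URL shape seen in this thread (`https://www.google.com/amp/<publisher-host>/<path>`), that the publisher serves over HTTPS, and that the query string (e.g. `client=safari`) belongs to the viewer. The `framed_by_viewer` flag is hypothetical and stands in for however a viewer would tell in-viewer navigation apart from an external request such as a shared link or bookmark.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit

# Assumed viewer URL shape, taken from the links quoted in this thread:
#   https://www.google.com/amp/<publisher-host>/<path>?<viewer params>
VIEWER_HOST = "www.google.com"
VIEWER_PREFIX = "/amp/"


def raw_amp_url(viewer_url: str) -> Optional[str]:
    """Map a viewer URL to the publisher's own AMP URL, or None if it isn't one."""
    parts = urlsplit(viewer_url)
    if parts.hostname != VIEWER_HOST or not parts.path.startswith(VIEWER_PREFIX):
        return None
    host, _, path = parts.path[len(VIEWER_PREFIX):].partition("/")
    if not host:
        return None
    # Assumptions: the publisher serves over HTTPS, and the query string
    # belongs to the viewer, so it is dropped.
    return urlunsplit(("https", host, "/" + path, "", ""))


def handle_request(url: str, framed_by_viewer: bool) -> Tuple[int, Optional[str]]:
    """Hypothetical rule: keep the viewer for in-viewer navigation and
    send everything else (shared links, bookmarks) to the raw AMP page."""
    target = raw_amp_url(url)
    if target is None or framed_by_viewer:
        return (200, None)  # serve the viewer as usual
    return (302, target)    # redirect, with the publisher URL as Location
```

The redirect target is the publisher's AMP page rather than its canonical page, matching the suggestion; a viewer could just as well pick the canonical URL if it knew it.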

src-code commented 7 years ago

This really is a Google Search problem, not an AMP issue. Like @jpettitt suggests, Google Search's AMP carousel should be smart enough to redirect to the content page if the carousel only contains a single item, which is the case when sharing a url manually or via the native mobile browser share buttons.

But still, that won't solve the problem of domain masking when sharing urls manually or via the native share buttons from the Google AMP carousel, which seems to be the greater issue here.

src-code commented 7 years ago

It'd be interesting to know if the AMP caches plan to reject certain fake news publishers, much the same way Google Search will bury them in the ranking, or if they will cache all content regardless of publisher quality. Since AMP by nature masks the publisher's domain because of the cache, it becomes harder for apps that want to build on top of the AMP cache (like Google Search's AMP carousel) or share AMP links to know whether the content is trustworthy without some other external ranking signal. And even then, it's necessary to deconstruct the AMP CDN or viewer URL to know who the publisher is in the first place. (E.g., how will Facebook or other sites block fake news that's cached on the AMP CDN or proxied through the Google Search AMP carousel viewer unless they understand specifically how to recognize these URLs?)
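For illustration, deconstructing a cache URL to find the publisher might look like the sketch below. It assumes the `https://cdn.ampproject.org/c/[s/]<host>/<path>` form, where our understanding is that `/c/` marks a document and the extra `s/` marks an HTTPS publisher origin. `publisher_origin` is a hypothetical helper; the viewer URL form would need its own rule.

```python
from typing import Optional
from urllib.parse import urlsplit

CACHE_HOST = "cdn.ampproject.org"


def publisher_origin(cache_url: str) -> Optional[str]:
    """Recover the publisher origin from an assumed cache URL of the form
    https://cdn.ampproject.org/c/[s/]<host>/<path>; other forms return None."""
    parts = urlsplit(cache_url)
    if parts.hostname != CACHE_HOST:
        return None
    segments = parts.path.split("/")  # ["", "c", ("s",) host, ...]
    if len(segments) < 3 or segments[1] != "c":
        return None
    scheme, idx = "http", 2
    if segments[idx] == "s":          # assumed marker for an HTTPS origin
        scheme, idx = "https", idx + 1
    if idx >= len(segments) or not segments[idx]:
        return None
    return f"{scheme}://{segments[idx]}"
```

A platform could then check the returned origin against its own list of known publishers before treating a cached link as trustworthy.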

jridgewell commented 7 years ago

/to @rudygalfi and @ericlindley-g

wycats commented 7 years ago

"This really is a Google Search problem, not an AMP issue."

AMP is a Google search technology. The only way to understand AMP is to understand its relationship to Google Search.

jpettitt commented 7 years ago

It's actually an amp-viewer best practices problem. As more people implement viewers in other contexts this issue will rapidly become messy. If Google were to promote a best practice of "unframing" bookmarked amp pages and it were added to the mythical viewer documentation it would help.

cramforce commented 7 years ago

@wycats AMP is much larger than Google Search.

It is launched in

While Google Search is an important part of the AMP ecosystem, we need to address the issues across the board.

There are a few issues here that will need to be addressed separately:

cramforce commented 7 years ago

I will leave this issue open for a bit, since it is important to have a discussion about the recommendations for the wider AMP ecosystem. But let's keep the discussion focused on that. There is a specific place to talk about Google-specific AMP product features over here: https://goo.gl/utQ1KZ

jcanizales commented 7 years ago

Naive question: How hard would it be to use a domain other than www.google.com for the Google AMP Cache? A name that wouldn't make anybody think that the content is produced or endorsed by Google.

If instead of:

https://www.google.com/amp/m.disclose.tv/amp/news/are_you_suffering_from_trump_acceptance_resistance_disorder_tard/136500?client=safari

the URL were:

https://www.ampwebcache.com/m.disclose.tv/amp/news/are_you_suffering_from_trump_acceptance_resistance_disorder_tard/136500?client=safari

this would be a non-issue.

cramforce commented 7 years ago

@jcanizales AMP pre-rendering works by loading content in an iframe (in the background) and using history.pushState to navigate there. That means the origin cannot be changed.

The content itself is not hosted on google.com anyway; it is only the limitations of pushState that prohibit changing the URL. Note that this is only the case on the web. Google can definitely just act like the current URL is on the publisher origin in its native apps on Android and iOS.
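The constraint can be modelled in a few lines. This Python sketch mirrors the browser rule that `history.pushState()` only accepts a URL with the same origin (scheme, host, port) as the current document; browsers throw a `SecurityError` otherwise. It is a simplified model, not browser code, and it shows why a viewer page on www.google.com can put `/amp/...` paths in the address bar but never the publisher's own URL.

```python
from urllib.parse import urljoin, urlsplit


def origin(url: str) -> tuple:
    """Return (scheme, host, port), filling in the default port for http/https."""
    parts = urlsplit(url)
    default_port = {"http": 80, "https": 443}.get(parts.scheme)
    return (parts.scheme, parts.hostname, parts.port or default_port)


def push_state_allowed(document_url: str, new_url: str) -> bool:
    """Simplified model of the same-origin check behind history.pushState()."""
    resolved = urljoin(document_url, new_url)  # relative URLs resolve against the page
    return origin(resolved) == origin(document_url)
```

For example, `push_state_allowed("https://www.google.com/search?q=amp", "/amp/m.disclose.tv/amp/news/story")` is true, while the same call with `"https://m.disclose.tv/amp/news/story"` is false, even though the iframe content itself comes from cdn.ampproject.org.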

jcanizales commented 7 years ago

Thanks for the explanation! All the comments above make much more sense now :)

gloddy commented 7 years ago

@cramforce "The Google Search viewer clearly attributes the original domain at the top." The most common advice to avoid phishing and scams is “check the domain in the address bar.” Not at the text that might be below the address bar. (And in the smallest type on the page, I might add.)

Even moving the hosted content to a domain that doesn't say "google.com" would be helpful. As would some clearer indicator of where you really are at the top.

cramforce commented 7 years ago

See above for why the domain displayed cannot change. The actual served content is from cdn.ampproject.org. I've so far not heard of a case where a user was confused about the origin of content.

jcanizales commented 7 years ago

I'm not sure if you're discounting the very tweet that started this thread or not:

"I can't convince person who sent this that the site is fake. Why? The person replies it is on Google, it can't be fake"

tlrobinson commented 7 years ago

"Note that this is only the case on the web. Google can definitely just act like the current URL is on the publisher origin in its native apps on Android and iOS."

This is the main issue I have with AMP. Why not work with browsers to make this work in web browsers too? Hijacking the domain is pretty obnoxious even if there are technical reasons for it.

cramforce commented 7 years ago

Oh, absolutely! We are working with browsers on alternatives. Doesn't look good right now. Chrome is removing prerender support which would have been an alternative and Safari never supported it.

gloddy commented 7 years ago

@cramforce When looking at fake news via AMP, the URL in the address bar is google.com. As I stated before, the most common advice to avoid phishing and scams is “check the domain in the address bar.” What should we tell people now?

cramforce commented 7 years ago

There needs to be application-specific advice aligned with how apps like Apple News or Facebook are solving this.

jcanizales commented 7 years ago

Another naive question: What's the performance penalty of not using history.pushState, assuming the content has already been preloaded in the hidden iframe so it's in the browser cache?

cramforce commented 7 years ago

Such a mechanism also doesn't support the swiping UI. The penalty is hundreds of milliseconds.

jcanizales commented 7 years ago

Yeah the swiping is cool.

What's the browser doing in those 100s of ms? You'd think that at worst it only has to render again, and it could in theory reuse the rendering work it did in the iframe. Unless there's something really fundamental that can't be done before the user's click, optimizing the browser in this respect could be a solution, no?

jridgewell commented 7 years ago

"You'd think that at worst it only has to render again"

Don't discount how long this takes. The main feature of AMP is instant page loading.

"it could in theory reuse the rendering work it did in the iframe"

See https://github.com/ampproject/amphtml/issues/6210#issuecomment-263648416.

cramforce commented 7 years ago

We're definitely looking at getting render time down, but nothing ever beats 0 except 0 :)

gloddy commented 7 years ago

@cramforce "I've so far not heard of a case where a user was confused about the origin of content." You don't think this is contributing to the "fake news" problem?

jcanizales commented 7 years ago

@gloddy, I think @cramforce just hadn't read the tweet reply to the linked tweet (the one I quoted in https://github.com/ampproject/amphtml/issues/6210#issuecomment-263647293).

I think we all agree that the sentiment "it is on Google, it can't be fake" is a good thing to preserve :) Apart from the obvious problems of fake news, this also threatens that trust.

cramforce commented 7 years ago

@gloddy I don't think it is contributing in any significant way, no. But I do think it can be part of the solution. Users confused by the (bad) display in iMessage would be just as confused by https://goo.gl/73ybWd

jrf0110 commented 7 years ago

"The easy solution would be for google to redirect any external request for a page in an amp viewer to the raw amp page. This would also solve an issue with legit content being shared and appearing to be on the wrong domain."

For what it's worth, iMessage doesn't seem to follow redirects at all. And even if it were redirected, I think the risk of the content being attributed to Google is still there (since they clicked a link to the Google domain). If anything, I think the risk of misattribution is lower when not redirecting, since there is an extra piece of chrome that states the host of the AMP document.

Ideologically, I like the redirect. It gets the user to where they actually wanted to go. But it doesn't really address the reason this issue was opened. I think this GitHub issue fundamentally disagrees with the trade-offs the viewer makes in the name of UX, namely super-fast loading and carousel swiping.

gloddy commented 7 years ago

@cramforce It's not about the redirect. It's about Google's domain sitting on top of content like this:

[Screenshot]

"Check the domain in the address bar" has been solid and much shared advice for years. That is until I saw this.

cramforce commented 7 years ago

@gloddy I follow your argument. But I disagree on the impact. I would, of course, prefer if the Google.com was not there. (Coming with https://github.com/bokand/NonDocumentRootScroller)

If there is an issue here, things could be done, such as having a larger initial banner at the top that animates away after a few seconds. CC @rannazhou

jpettitt commented 7 years ago

I think this issue is going to bite Google when least expected and in a very public and negative way. Google's credibility is on the line. Maybe iMessage doesn't follow redirects, but that's a spurious argument (some people still die of lung cancer so we shouldn't cure any cancer). If it works in most environments, particularly web browsers, then we can open a bug with Apple to fix their $%^& (not that they will).

cramforce commented 7 years ago

@jpettitt iMessage currently follows redirects to get the title of the destination, but displays the origin of the initial URL.
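The behaviour described here can be captured in a toy model: follow the redirect chain to fetch the title, but display the host of the URL that was originally shared. This Python sketch is not Apple's implementation; the `redirects` and `titles` tables are hypothetical stand-ins for real network requests.

```python
from typing import Dict, Tuple
from urllib.parse import urlsplit


def link_preview(shared_url: str,
                 redirects: Dict[str, str],
                 titles: Dict[str, str],
                 max_hops: int = 10) -> Tuple[str, str]:
    """Return (host shown to the user, title shown to the user)."""
    final_url = shared_url
    for _ in range(max_hops):  # bounded, in case of a redirect loop
        if final_url not in redirects:
            break
        final_url = redirects[final_url]
    # The title comes from the end of the chain, the host from the start.
    return urlsplit(shared_url).hostname, titles.get(final_url, "")
```

Under this model, redirecting the google.com/amp/ URL changes which page supplies the title but not the domain shown in the preview.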

gloddy commented 7 years ago

@cramforce I'll post here what I wrote to the search team and leave it at that. It's a legitimate issue and I hope you'll consider it some more. I rarely involve myself in such debates on github, but I genuinely believe there has to be a better solution here.

Thanks for reaching out. Everyone at Google has always been responsive and helpful, Malte included. I appreciate it.

I’d like to say upfront that I don’t believe anyone’s acting in bad faith. Simply that well-thought plans have had unintended consequences. That’s the nature of our work I suppose.

AMP’s primary goal, to make the web fast, is one I share. Pages need to load quickly. A writer who sweated out an article should reach the people who want to read it as quickly as possible. Any delay in delivery is a disservice to both. Hosting those pages on your own CDN removes a real piece of uncertainty in that delivery. But that choice has unintended consequences and leaves Google lending its reputation to anyone who can publish in the AMP format, including “fake news.”

The most common advice to avoid phishing and scams is “check the domain in the address bar.” Google’s implementation of AMP makes the illegitimate (and often dangerous) look real. The headlines on these “fake news” sites are awful. I won’t bother to repeat them here as I’m sure you’re well aware. Google's domain accompanied by a very visible green security lock lends a clear sign of credibility to this misleading and often violent information.

I recognize that you’ve included the original domain in the bar beneath, but not only is it usually the smallest text on most pages, it is quite simply not where we’ve told people to check for validity over many, many years.

iMessage sharing is simply a side effect. You may be able to get Apple to alter iMessage to follow the canonical URL, but you certainly won’t be able to change every piece of software that shares links. And this won’t solve the problem of people arriving at these pages via a nearly endless list of other paths.

Malte at one point said: “@gloddy taking this super seriously. Fake news is the key problem of this emerging era.”

I would say that it's emerged. It's here. Or more accurately, its effect was made clear on November 8th. Google's verified domain lending validity to these tactics is a painful thing to see. Please reconsider this particular aspect of your implementation. It would be better for us all, Google included.

Thanks for your time, and I genuinely appreciate the dialogue.

Christian

tlrobinson commented 7 years ago

That's a good point, is Google opening themselves up to phishing attacks?

simevidas commented 7 years ago

Google has announced that cached AMP pages will soon (Q1 2017) be served from subdomains that resemble the publisher’s own domain (and Google Search will start using these new URLs as well):

// before
https://cdn.ampproject.org/c/nytimes.com/…/alzheimers-photos-into-oblivion.amp.html

// after
https://nytimes-com.cdn.ampproject.org/…/alzheimers-photos-into-oblivion.amp.html

Will that help with this issue?
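As a rough sketch of how a publisher host could map to such a subdomain, assume (per our reading of the published AMP Cache URL format) that each `-` is doubled and each `.` then becomes `-`. As we understand it, the published scheme also has extra rules for internationalized names, labels with dashes in the 3rd and 4th positions, and over-long labels; this ASCII-only version omits them, and `cache_subdomain`/`cache_host` are hypothetical helper names.

```python
def cache_subdomain(publisher_host: str) -> str:
    """Simplified, ASCII-only AMP Cache subdomain for a publisher host."""
    # Double existing dashes first so that "a-b.com" and "a.b.com" cannot collide.
    label = publisher_host.lower().replace("-", "--").replace(".", "-")
    if len(label) > 63:  # DNS labels are limited to 63 characters
        raise ValueError("label too long; the published scheme uses a hash-based fallback instead")
    return label


def cache_host(publisher_host: str) -> str:
    return cache_subdomain(publisher_host) + ".cdn.ampproject.org"
```

With these rules, `cache_host("nytimes.com")` gives `nytimes-com.cdn.ampproject.org`, matching the example above. Putting the publisher in the host name gives each publisher its own origin on the cache, which is what makes the new form look like the publisher's domain.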

jcanizales commented 7 years ago

I guess it protects against phishing. It doesn't seem to affect URLs like this https://www.google.com/amp/m.disclose.tv/amp/news/are_you_suffering_from_trump_acceptance_resistance_disorder_tard/136500?client=safari , but this one is already being redirected. So I guess the only remaining thing would be for iMessage to follow the redirect for the URL too?

simevidas commented 7 years ago

@jcanizales google.com/amp/ URLs are used for articles displayed inside the AMP news carousel on Google Search; in any other environment, they’re redirected. (I think that’s how it works.) Anyway, the new URLs are planned for Q1 2017 and Google Search will use them too, so google.com/amp/ is (most likely) going away.

jrf0110 commented 7 years ago

"Google Search will use them too, so google.com/amp/ is (most likely) going away."

I don't thiiiink the google.com/amp/ scheme will go away. The changes are to AMP cache, which the Google search AMP document viewer uses to display documents embedded into search results. They're not going to be able to pushState a totally different hostname. The cache changes will just affect the src attribute of the iframe the viewer uses.

jpettitt commented 7 years ago

and the blowback starts http://www.theverge.com/2016/12/6/13850230/fake-news-sites-google-search-facebook-instant-articles

callionica commented 7 years ago

I think the argument that the banner is a reasonable way for users to identify the source of the page is bizarre. Imagine that you actually manage to change user expectations enough that they actually start to rely on the banner instead of the address bar: congratulations you've trained users to be phished, reversing years of security training. That would be a bad result. If I create a page at my-phishing-site.com with a lookalike banner saying google.com, do you really want people to believe the content comes from Google?

callionica commented 7 years ago

Also URL shorteners are terrible for security, a bad experience for users, and make the web environment worse for publishers. I think Google can do better than that with AMP.

cacarr-pdxweb commented 7 years ago

"Also URL shorteners are terrible for security, a bad experience for users, and make the web environment worse for publishers. I think Google can do better than that with AMP."

And, I think many fairly unsophisticated users still know what a URL shortener is. This: https://goo.gl/NI7S9e doesn't cause the same sort of confusion as this: google.com

ghost commented 7 years ago

AMP link remover in progress!

[Screenshot, 2017-01-26]

epheterson commented 7 years ago

Got duped here (see https://github.com/ampproject/amphtml/issues/7251); can't wait for the fix. I like the feature, but sharing AMP links isn't the best method. I want to share the original, and that's very difficult to do with the current implementation.

jridgewell commented 7 years ago

@epheterson: We'll be providing a link with the canonical URL, which is what people would hopefully share.

jpettitt commented 7 years ago

@jridgewell Do you really mean the canonical URL, or the source URL that pointed to the AMP doc? They are often different, per the discussion in #7058.

cramforce commented 7 years ago

The Search team is going with canonical. This has been what desktop and bot requests to the URL have been redirecting to for a while.

Override with <link rel=sharelink> is definitely something I'd support, though.
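A client choosing which URL to share could follow the order described here. This Python sketch prefers a `<link rel=sharelink>` (only a proposal in this thread, not an established link relation), then `<link rel=canonical>`, then falls back to the page URL. `share_url` and `LinkRelParser` are hypothetical helpers built on the standard-library `html.parser`.

```python
from html.parser import HTMLParser
from typing import Dict
from urllib.parse import urljoin


class LinkRelParser(HTMLParser):
    """Record the first href seen for each rel value on <link> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel may hold several space-separated values, e.g. rel="canonical alternate"
        for rel in (attr.get("rel") or "").lower().split():
            self.links.setdefault(rel, href)


def share_url(html: str, page_url: str) -> str:
    """Prefer rel=sharelink (proposed), then rel=canonical, then the page URL."""
    parser = LinkRelParser()
    parser.feed(html)
    parser.close()
    for rel in ("sharelink", "canonical"):
        if rel in parser.links:
            return urljoin(page_url, parser.links[rel])  # resolve relative hrefs
    return page_url
```

Falling back to the page URL keeps sharing working for documents that declare neither link, at the cost of sharing the cache or viewer URL in that case.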

ghost commented 7 years ago

AMP link remover finished. It's a temporary workaround.

https://evildog1.github.io/ampremover.html

[Screenshot]

cramforce commented 7 years ago

See https://developers.googleblog.com/2017/02/whats-in-amp-url.html for how access to the canonical URL is provided now.

I'm also talking to browser folks about how we could get the origin to be as desired.

ithinkihaveacat commented 7 years ago

The iOS Google Search App seems to be sharing the origin's AMP URL, not the canonical:

[Screenshot]