ampproject / amphtml

The AMP web component framework.
https://amp.dev
Apache License 2.0

Specify how AMP pages are supposed to be referenced/linked from their canonical pages #498

Closed jamesreggio closed 8 years ago

jamesreggio commented 9 years ago

It's possible that I've just missed this detail in the sea of documentation, but it seems to me that there's no clear way to signal the existence of an AMP page from a canonical/ordinary webpage.

The <link rel="canonical"> tag exists for an AMP page to reference its canonical page. How should the canonical page reference its AMP page?

This seems like an important feature for search-engine discovery and optimistic redirection to AMP pages on mobile browsers. I would expect a tag like <link rel="alternate" type="application/amp+html"> or similar to do the trick (though I'm not sure what the AMP MIME type is, if one has been established).

jmadler commented 9 years ago

Good point!

FWIW, the MIME type for AMP HTML documents is the same as all other HTML documents: text/html

Gregable commented 9 years ago

There is a mechanism: <link rel="amphtml" href="{amp version}">

I don't see it documented though.

dvoytenko commented 9 years ago

Here's how it appears in many examples, but I don't see an MD doc for it. We should add one.

        The canonical document for this article should be linked, as above.

        The canonical document should also have a corresponding <link> tag
        within pointing at this AMP HTML file: 

          <link rel="amphtml" href="http://example.ampproject.org/article-metadata.amp.html" /> 

        It is possible that this AMP HTML document is the canonical document
        for this article, in which case, the canonical URL should point to this
        document, and no "amphtml" link is required.
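The pairing described in the quoted text can be checked mechanically. What follows is a minimal sketch, not part of the AMP tooling: the names `LinkRelCollector`, `link_rels`, and `is_properly_paired` are hypothetical, and it assumes both documents use absolute URLs that can be compared as plain strings.

```python
from html.parser import HTMLParser


class LinkRelCollector(HTMLParser):
    """Records the first href seen for each rel token on <link> elements."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if not href:
            return
        # rel holds whitespace-separated tokens; compare them case-insensitively.
        for token in (attr_map.get("rel") or "").lower().split():
            self.links.setdefault(token, href)


def link_rels(html):
    """Return a {rel token: href} mapping for the <link> elements in html."""
    collector = LinkRelCollector()
    collector.feed(html)
    collector.close()
    return collector.links


def is_properly_paired(canonical_url, canonical_html, amp_url, amp_html):
    """True if the two documents reference each other as the quoted text describes."""
    amp_links = link_rels(amp_html)
    if amp_links.get("canonical") == amp_url:
        # The AMP document is itself canonical, so no "amphtml" link is required.
        return True
    return (
        amp_links.get("canonical") == canonical_url
        and link_rels(canonical_html).get("amphtml") == amp_url
    )
```

Real pages often use relative hrefs, so a fuller version would resolve each href against its document URL before comparing.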

jamesreggio commented 9 years ago

Ah, excellent. That answers my question.

I'll leave this issue open as a reminder that the documentation can/should be improved.

dvoytenko commented 9 years ago

Yes, please keep open. We will fix.

Meggin commented 9 years ago

Jordan, do you want to re-assign this one to me? I can make the docs better.

cramforce commented 9 years ago

Let's fix the spec first. Then the docs.

kevinmarks commented 9 years ago

rel="alternate" is more accurate; rel="alternate" type="text/html" media="handheld" is a noted example

http://microformats.org/wiki/rel-alternate#With_media

if you do want to create rel="amphtml" please use the registry here:

http://microformats.org/wiki/existing-rel-values#HTML5_link_type_extensions

cramforce commented 9 years ago

I'm kind of on the fence here. We'd been using amphtml but it is not impossible to change.

What would the rel=alternate example look like?

type="text/html+amp" ?

Don't really want to introduce a new mime type :)

adactio commented 9 years ago

Rel values can be combined (space separated) so how about allowing both:

link rel="amphtml" href="/path/to/amp.html"

and

link rel="alternate amphtml" href="/path/to/amp.html"

veganstraightedge commented 9 years ago

:+1: @adactio's suggestion.

jamesreggio commented 9 years ago

I'm actually somewhat opposed to the multiple rels approach recommended by @adactio, if only because I'd imagine a lot of parsers are not written to handle that case well. (I've never seen multiple rels used before in the wild.)

cramforce commented 9 years ago

I wonder if there is any concrete benefit of rel=alternate? It seems any application that would prefer AMP HTML would need to have a clear notion of AMP anyway.

adactio commented 9 years ago

@cramforce It’s more for the other way around: aggregators gathering multiple alternates e.g. a JSON version, an RDF version, an AMP version.

@jamesreggio You have led a very sheltered existence. :-) Space separated rel values are very much the norm, and anyone writing a parser is aware of that.
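To make the parsing concern concrete: a consumer that compares the whole rel attribute against "amphtml" misses the combined form, while one that splits it into tokens handles both. This is a minimal sketch; the function names `rel_tokens` and `links_to_amp` are hypothetical and not taken from any AMP tool.

```python
def rel_tokens(rel_value):
    """Split a rel attribute into its lowercase, whitespace-separated tokens."""
    return set(rel_value.lower().split())


def links_to_amp(rel_value):
    """True if "amphtml" is one of the rel tokens, whatever else is present."""
    return "amphtml" in rel_tokens(rel_value)


# A naive equality check only recognizes the single-token form.
naive = "alternate amphtml" == "amphtml"          # False
tokenized = links_to_amp("alternate amphtml")     # True
```

Lowercasing first reflects that HTML link types are compared case-insensitively.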

cramforce commented 9 years ago

Filed internal bug at Google to check on whether we support the space separated version for discovering AMP documents already.

Tagged this bug "discovery" for everyone else following along on spec changes related to finding AMP documents.

julien51 commented 9 years ago

@jamesreggio Agreed with @adactio, this is actually pretty common in the RSS world...

julien51 commented 9 years ago

And talking about RSS, it would be nice if one of the recommendations was to also include this <atom:link rel="amphtml" ... /> in the feed's entries (RSS or Atom) themselves, to avoid fetching both the HTML and the AMP when polling resources from a feed.

Something like this:

<item>
    <title>AMPed up</title>
    <link>https://adactio.com/journal/9646</link>
    <atom:link href="https://adactio.com/journal/9646/amp" rel="amphtml" />
    <description>
        <![CDATA[
        ...
        ]]>
    </description>
    <pubDate>Sat, 10 Oct 2015 15:02:39 GMT</pubDate>
    <guid>https://adactio.com/journal/9646</guid>
</item>
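A feed consumer could read that element without fetching any HTML. This is a minimal sketch of the proposal above, not an established convention: `amp_links_from_feed` is a hypothetical name, and the `<rss>` wrapper with its `xmlns:atom` declaration is assumed, since the snippet shows only the `<item>`.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# Assumed wrapper around the <item> from the comment above.
SAMPLE_FEED = """<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <item>
      <title>AMPed up</title>
      <link>https://adactio.com/journal/9646</link>
      <atom:link href="https://adactio.com/journal/9646/amp" rel="amphtml" />
      <guid>https://adactio.com/journal/9646</guid>
    </item>
  </channel>
</rss>"""


def amp_links_from_feed(feed_xml):
    """Map each item's <link> URL to its atom:link rel="amphtml" href, if present."""
    root = ET.fromstring(feed_xml)
    result = {}
    for item in root.iter("item"):
        page_url = item.findtext("link")
        # atom:link lives in the Atom namespace, so it never clashes with <link>.
        for atom_link in item.findall(f"{{{ATOM_NS}}}link"):
            if "amphtml" in (atom_link.get("rel") or "").lower().split():
                result[page_url] = atom_link.get("href")
    return result


amp_by_page = amp_links_from_feed(SAMPLE_FEED)
```

Items without such an element are simply absent from the mapping, so a consumer can fall back to fetching the canonical page.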

Gregable commented 9 years ago

I don't know if we will accept the space separated version on a non-amp page referring to an amp page, but we don't validate space separated rel values in an AMP document currently, and we should.

joshcp commented 9 years ago

From the perspective of parsing the canonical page to find an AMP version, would a <link> inserted via JavaScript be acceptable?

cramforce commented 9 years ago

@joshcp Very good question! Pinging @amplesample

julien51 commented 9 years ago

Hum. My hunch would be that it would be quite ambitious to expect everyone consuming HTML pages to be able to execute JavaScript to identify an AMP version of the document. After all, even today, Google isn't 100% able to execute JavaScript from all pages it crawls.

joshcp commented 9 years ago

@julien51 my expectation wouldn't be that everyone consuming HTML pages would execute JavaScript. The HTML page is canonical, so if a visitor doesn't execute JS they still get served the content. I see AMP as an enhancement of user experience, not core to the user experience.

The question is more with regard to Google's parser in particular, and generally speaking how future implementations of AMP consuming apps are expected to work.

It seems like there's an expectation that apps displaying AMP pages will be able to execute JavaScript. Also, Google is able to detect JavaScript-adaptive mobile website configurations, so I would hope a JS-inserted AMP <link> would be detected by Google and other future AMP parsers.

julien51 commented 9 years ago

Well, for what it's worth I run an API which aims at being able to consume AMP pages but is completely unable to execute JavaScript. Putting JS as a way to detect/identify AMP pages would completely block us from doing so.

I'm not 100% sure what the benefit is to allow JS to specify AMP resources, compared to other commonly used methods such as <link> elements and Link headers for example.

cramforce commented 9 years ago

Pretty much agree with everyone here. Let's just decide that JS injection is not OK. We want AMP to be usable by lots of platforms, and while Google might be able to handle JS, not everyone can.

joshcp commented 9 years ago

@julien51 according to the link you shared, your app would still be able to consume the content via HTML. If a <link> is inserted into the HTML via JavaScript, that is an enhancement to the experience made available to parsers that execute JavaScript.

@cramforce For what it's worth, I run a JavaScript-adaptive mobile platform that would like to be able to inject a <link> to an external AMP page on the canonical page. Why exclude us from being able to generate AMP versions for those parsers that can handle it? And for parsers that can't, there's still the HTML fallback.

cramforce commented 9 years ago

@joshcp I would not say that crawlers should ignore JS based injection of the meta tag, but I would not mandate support for it. Do you use fragment URLs or actual URLs (based on pushState) for your permalinks?

joshcp commented 9 years ago

@cramforce our platform generates mobile views of existing websites, using a small JS snippet to reconfigure the page for mobile (for example, load only mobile-specific images and CSS). We're not generating URLs with JS, just reconfiguring the content at any given URL. The platform works with both actual and fragment URLs.

joshcp commented 9 years ago

@cramforce to clarify, are you saying that JS injection of the meta tag is OK, and crawlers that support JS should recognize that there's a <link>? Acknowledging that not all crawlers/parsers support JS?

cramforce commented 9 years ago

@joshcp We're happy for AMP pages to be discovered any way they can be. But I would not expect the JS solution to work with most AMP platforms.

Gregable commented 8 years ago

The validator now supports whitespace-separated values in the link tag's rel attribute.

adactio commented 8 years ago

:+1:

niutech commented 7 years ago

What I have recently suggested on the AMP HTML Discuss forum is to add the Link: <....>; rel="amphtml" response header alongside the existing <link href="..." rel="amphtml"> tag, so that a user agent does not have to download the whole response body in order to redirect to the AMP version of a web page. Please consider adding it to the spec.
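A client could then find the AMP URL from the headers alone. This is a minimal sketch of that suggestion, which the thread does not confirm as part of the spec: `amp_url_from_link_header` is a hypothetical helper, the example.com URLs are placeholders, and the parsing is deliberately simplified (it assumes parameter values contain no commas or semicolons).

```python
import re

# One link-value: <target> followed by its ";param" list (simplified grammar).
_LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
# The rel parameter, either quoted or as a bare token.
_REL_PARAM = re.compile(r';\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,]+))', re.IGNORECASE)


def amp_url_from_link_header(header_value):
    """Return the first target whose rel tokens include "amphtml", else None."""
    for target, params in _LINK_VALUE.findall(header_value):
        match = _REL_PARAM.search(params)
        if not match:
            continue
        rel = match.group(1) if match.group(1) is not None else match.group(2)
        if "amphtml" in rel.lower().split():
            return target
    return None


# Hypothetical header value carrying two links.
header = (
    '<https://example.com/>; rel="canonical", '
    '<https://example.com/amp>; rel="alternate amphtml"'
)
amp_url = amp_url_from_link_header(header)
```

Because the header arrives before the body, this pairs naturally with a HEAD request, which is the saving the comment describes.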