Closed adactio closed 8 years ago
@powdercloud I think we should allow arbitrary rel values on a
. Did we only whitelist some so far or allow none? nofollow
and noreferrer
are also needed there.
We whitelisted rel= values. We had a discussion about this offline. The general issue is that some rel= values trigger client behavior that we don't want AMP pages to have. The following values were of particular concern: import, components, prerender, subresource, preconnect, prefetch, preload. Since there isn't a clear standardized list of these and different clients may have different supported values or may add some in the future, we went with a whitelist approach. I manually went through and tried to be as exhaustive as possible with whitelisting things though. Here's what we have for link:
"accessibility|"
"alternate|"
"archives?|"
"attachment|"
"appendix|"
"author|"
"bibliography|"
"category|"
"chapter|"
"cite|"
"comment|"
"contents|"
"contribution|"
"copyright|"
"directory|"
"discussion|"
"DCTERMS\\..*|" # Dublin Core Metadata
"EditURI|"
"endorsed|"
"fan|"
"feed|"
"footnote|"
"glossary|"
"group|"
"help|"
"home|"
"hub|"
"in-reply-to|"
"index|"
"indieauth|"
"issues|"
"its-rules|"
"jslicense|"
"license|"
"made|"
"map|"
"me|"
"meta|"
"micropub|"
"microsummary|"
"next|"
"nofollow|"
"noreferrer|"
"openid2?\\..*|" # OpenID Authentication
"p3pv1|"
"payment|"
"pingback|"
"pgpkey|"
"privacy|"
"pronounciation|"
"profile|"
"publisher|"
"prev|"
"previous|"
"referral|"
"related|"
"replies|"
"reply-to|"
"search|"
"section|"
"service|"
"shortlink|"
"source|"
"sidebar|"
"sitemap|"
"sponsor|"
"start|"
"status|"
"syndication|"
"subsection|"
"timesheet|"
"toc|"
"token_endpoint|"
"trackback|"
"transformation|"
"unendorsed|"
"user|"
"vcalendar-parent|"
"vcalendar-sibling|"
"webmention|"
"yandex-tableau-widget"
Along with special rules for stylesheet, amphtml, and canonical.
The a
tag is similar with support for:
"accessibility|"
"author|"
"alternate|"
"appendix|"
"alternate|"
"archived|"
"archives?|"
"attachment|"
"appendix|"
"author|"
"bookmark|"
"bibliography|"
"category|"
"chapter|"
"cite|"
"colleague|"
"comment|"
"contact|"
"contents|"
"contribution|"
"copyright|"
"directory|"
"discussion|"
"EditURI|"
"endorsed|"
"fan|"
"feed|"
"footnote|"
"glossary|"
"group|"
"help|"
"home|"
"hub|"
"in-reply-to|"
"index|"
"indieauth|"
"issues|"
"its-rules|"
"jslicense|"
"license|"
"made|"
"map|"
"me|"
"meta|"
"micropub|"
"microsummary|"
"next|"
"nofollow|"
"noreferrer|"
"openid2?\\..*|" # OpenID Authentication
"p3pv1|"
"payment|"
"pingback|"
"pgpkey|"
"privacy|"
"pronounciation|"
"profile|"
"publisher|"
"prev|"
"previous|"
"referral|"
"related|"
"replies|"
"reply-to|"
"search|"
"section|"
"service|"
"shortlink|"
"source|"
"sidebar|"
"sitemap|"
"sponsor|"
"start|"
"status|"
"syndication|"
"subsection|"
"tag|"
"timesheet|"
"toc|"
"token_endpoint|"
"trackback|"
"transformation|"
"unendorsed|"
"user|"
"vcalendar-parent|"
"vcalendar-sibling|"
"webmention|"
"yandex-tableau-widget"
These lists were generated by looking at http://microformats.org/wiki/existing-rel-values
Canonical isn't allowed for a
since that resource mentions that it isn't allowed for a
.
This is kind of an ugly part of the validator because it's a value set that's not well standardized but also has potential behavioral affects in clients. I'd strongly prefer not to go the blacklist route as a result.
Also, as I mentioned in https://github.com/ampproject/amphtml/issues/498, we don't currently allow whitespace separation, which is something we need to fix, especially for a
tags.
Canonical isn't allowed for a since that resource mentions that it isn't allowed for a.
That's definitely an error in the registry documentation. Sorry about that. I'll get that changed.
Ah, cool. @cramforce , this might need some more thought for the amphtml spec. Currently the spec requires a <link rel=canonical>
tag in the <head>
somewhere. Would we accept an a
tag instead? Do the hrefs need to be the same, etc.
Out of curiosity, what does it mean for an a
tag to have rel=canonical? Is it something like:
yadda yadda <a href='/foo.html' rel='canonical'>canonical page here</a> yadda
I'm unsure if search engines support this.
Unless we get good signal from Bing, Yandex, Google, etc. that <a href='/foo.html' rel='canonical'>
is supported well I think keeping the link tag requirement is important.
@cramforce You can totally keep the link
element requirement, but all I'm asking that is that the rel
value of "canonical" be allowed on a
elements.
Right, yeah, we will!
On Tue, Oct 13, 2015 at 1:32 PM Jeremy Keith notifications@github.com wrote:
@cramforce https://github.com/cramforce You can totally keep the link element requirement, but all I'm asking that is that the rel value of "canonical" be allowed on a elements.
— Reply to this email directly or view it on GitHub https://github.com/ampproject/amphtml/issues/596#issuecomment-147843795.
The rel
registry has been updated:
http://microformats.org/wiki/existing-rel-values#HTML5_link_type_extensions
Apologies once again for the confusion.
Search engines and browsers tend to only act on <link>
elements in the <head>
because one needs a reasonable amount of control the website to put anything there. Otherwise user-generated content can get up to a lot of mischief: canonical, prerender, a different RSS feed/OpenSearch, etc.
I know of exactly one crawler that bothers to look at rel
values on <a>
, and that's Piperka. The rest didn't want to start screwing over websites that didn't implement any rel
filtering before it became important.
rel=canonical is now allowed on a
as well as link
tags. I don't have rights to close this, so, @cramforce to do the honors.
I'd like to re-open this issue please. I don't know if it's a regression or if this slipped through the cracks, but the AMP validator is currently flagging up rel="canonical" on A elements as errors.
The attribute 'rel' in tag 'a' is set to the invalid value 'canonical'.
Example: https://adactio.com/journal/9646/amp#development=1
So this is the experience on Android: https://twitter.com/m_ott/status/749649802036666368
It's like XHTML2 all over again.
/to @Gregable to check for regression
Lets fix.
@adewale Could you talk to Medium that they keep their links up-to-date. For their type of linking it might be better if they have the cache emit a 302 when the doc is invalid. I think we should look at building that to avoid the XHTML2 mess.
Done.
Looks like this regressed in March when we switched rel= values from whitelists to blacklists. Will fix and add tests.
Fixed, will roll out to cdn.ammproject.org soon.
Should be live on cdn.ammproject.org.
Currently the AMP documentation requires a
link rel="canonical"
in the head of the document. But there are two linking mechanisms that can acceptrel
values:link
anda
.How about allowing
rel="canonical"
to sit ona
elements as well?Here's an example of an AMP HTML document with both: https://adactio.com/journal/9646/amp
That page currently fails validation because it has
rel="canonical"
on ana
element that links back to the regular HTML version.