commonsearch / cosr-back

Backend of Common Search. Analyses webpages and sends them to the index.
https://about.commonsearch.org
Apache License 2.0

How should canonical URLs be validated? #14

Open sylvinus opened 8 years ago

sylvinus commented 8 years ago

Currently we force canonical URLs declared in the meta tags to be on the same domain as the base document: https://github.com/commonsearch/cosr-back/blob/master/cosrlib/document/html/htmldocument.py#L346

Is this requirement too strict? If we relax it (same root domain? same DNS owner? any domain?), would some abuse/impersonation be possible?
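To make the options in this question concrete, here is a minimal sketch of three possible policies. It is not the code in `htmldocument.py`: the function names and policy labels are hypothetical, and `naive_root_domain` simply keeps the last two host labels, which is an assumption that breaks on public suffixes such as `co.uk`. A "same DNS owner" check is left out because it would need external data.

```python
from urllib.parse import urljoin, urlsplit


def naive_root_domain(host):
    """Return the last two labels of a host name.

    Assumption for illustration only: this ignores multi-label public
    suffixes (e.g. "co.uk"), which a real check would need to handle.
    """
    return ".".join(host.lower().rstrip(".").split(".")[-2:])


def accept_canonical(base_url, canonical_href, policy="same_host"):
    """Decide whether a declared canonical URL is accepted under a policy.

    Hypothetical policies: "same_host", "same_root_domain", "any".
    """
    # Resolve relative hrefs against the document URL first.
    canonical = urljoin(base_url, canonical_href)
    parts = urlsplit(canonical)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False

    base_host = urlsplit(base_url).hostname or ""
    canonical_host = parts.hostname

    if policy == "same_host":
        return base_host == canonical_host
    if policy == "same_root_domain":
        return naive_root_domain(base_host) == naive_root_domain(canonical_host)
    if policy == "any":
        return True
    raise ValueError("unknown policy: %r" % policy)
```

Under this sketch, a page on `blog.example.com` declaring a canonical on `www.example.com` is rejected by `same_host` but accepted by `same_root_domain`, and only `any` accepts a canonical on an unrelated domain, which is where the impersonation concern applies.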

indolering commented 8 years ago

You can't restrict based on domain because rel="canonical" is used for attribution across sites. For example, I regularly publish the same posts on my own personal blog and on my employer's blog. They don't pay me for blog posts, so I have them insert a rel="canonical" link.
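The cross-posting case can be sketched as follows. This is only an illustration using the standard library: the `CanonicalFinder` class, the `find_canonical` helper, and the `employer.example` / `personal.example` URLs are all hypothetical, and the sketch assumes the cross-posted copy is the one carrying the canonical link. It shows that a strict same-host comparison would discard this kind of attribution link.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class CanonicalFinder(HTMLParser):
    """Collect the href of the first <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attrs = dict(attrs)
        # rel is a space-separated token list, compared case-insensitively.
        rels = (attrs.get("rel") or "").lower().split()
        if "canonical" in rels and attrs.get("href"):
            self.href = attrs["href"]


def find_canonical(html, base_url):
    """Return the absolute canonical URL declared in html, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return urljoin(base_url, finder.href) if finder.href else None


# Hypothetical cross-posted page: the copy on the employer's blog
# points to the original on the author's personal site.
page = (
    '<html><head>'
    '<link rel="canonical" href="https://personal.example/posts/canonical-urls">'
    '</head><body>...</body></html>'
)
base = "https://employer.example/blog/canonical-urls"

canonical = find_canonical(page, base)
same_host = urlsplit(canonical).hostname == urlsplit(base).hostname
```

Here `same_host` is false, so a validator that only accepts same-host canonicals would ignore the link and treat both copies as independent documents.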