commonsearch / cosr-back

Backend of Common Search. Analyses webpages and sends them to the index.
https://about.commonsearch.org
Apache License 2.0

Support the Robots meta tag #18

Open sylvinus opened 8 years ago

sylvinus commented 8 years ago

http://www.robotstxt.org/meta.html

Not sure if Common Crawl already filters out pages that opt out of indexing this way, but we should do it on our side too anyway.
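For illustration, here is a minimal sketch of how the meta tag could be read with Python's standard library. The names `RobotsMetaParser`, `robots_meta_directives` and `is_indexable` are hypothetical and not part of cosr-back. Treating `none` as equivalent to `noindex, nofollow` follows Google's documentation rather than the robotstxt.org page.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags.

    Hypothetical helper, not an existing cosr-back class.
    """

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attrs = {name: (value or "") for name, value in attrs}
        if attrs.get("name", "").strip().lower() != "robots":
            return
        for token in attrs.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_meta_directives(html):
    """Return the set of lowercased robots directives found in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def is_indexable(directives):
    """A page is skipped if it carries "noindex" (or "none", an assumption
    taken from Google's documentation)."""
    return not ({"noindex", "none"} & directives)


page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head></html>'
print(is_indexable(robots_meta_directives(page)))
```

Matching is case-insensitive because the tag name, attribute name and directive values all appear in mixed case in the wild.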

Some pointers:

sylvinus commented 8 years ago

We should also support the X-Robots-Tag HTTP header: https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag#using-the-robots-meta-tag
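A rough sketch of how the header values could be turned into directives, assuming the caller has already collected every `X-Robots-Tag` value from the response. The function name `x_robots_directives` and the crawler name `cosrbot` are hypothetical. The sketch only keeps directive names, drops their values, and does not handle commas inside an `unavailable_after` date.

```python
# Directives that are written as "name: value", so a colon after them
# does not introduce a user-agent prefix.
VALUED_DIRECTIVES = {
    "unavailable_after",
    "max-snippet",
    "max-image-preview",
    "max-video-preview",
}


def x_robots_directives(header_values, user_agent="cosrbot"):
    """Return the directive names that apply to `user_agent`.

    `header_values` is a list of raw X-Robots-Tag header values.
    `user_agent` is a hypothetical crawler name used for illustration.
    """
    directives = set()
    for value in header_values:
        head, sep, rest = value.partition(":")
        prefix = head.strip().lower()
        if sep and "," not in head and prefix not in VALUED_DIRECTIVES:
            # Form "googlebot: noindex": the rule targets one crawler.
            if prefix != user_agent.lower():
                continue
            value = rest
        for token in value.split(","):
            # Keep only the directive name, dropping any ": value" part.
            name = token.split(":", 1)[0].strip().lower()
            if name:
                directives.add(name)
    return directives


print(x_robots_directives(["noindex, nofollow", "googlebot: noarchive"]))
```

The resulting set can be merged with the directives found in the meta tag before deciding whether to index the page.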

sylvinus commented 7 years ago

@jhildreth started a branch with some good work that should be merged: https://github.com/commonsearch/cosr-back/compare/master...jhildreth:feature/robots-tag