commonsearch / cosr-back

Backend of Common Search. Analyses webpages and sends them to the index.
https://about.commonsearch.org
Apache License 2.0

Parse and use DMOZ titles/summaries #12

Open sylvinus opened 8 years ago

sylvinus commented 8 years ago

We currently download the DMOZ data, but we only store a boolean signal for the presence of URLs or domains in their dumps.

We should start storing titles and descriptions, and then use them as fallbacks in the search results. An example where this would help is commonsearch/cosr-results#3
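A rough sketch of what the fallback could look like, assuming the page's own title/description and the DMOZ title/description are already available as plain strings. The function name `choose_title_and_summary` and its parameters are hypothetical and not part of cosr-back:

```python
from typing import Optional, Tuple


def _clean(value: Optional[str]) -> Optional[str]:
    """Return the stripped string, or None if it is missing or blank."""
    value = (value or "").strip()
    return value or None


def choose_title_and_summary(
    page_title: Optional[str],
    page_description: Optional[str],
    dmoz_title: Optional[str],
    dmoz_description: Optional[str],
) -> Tuple[Optional[str], Optional[str]]:
    """Prefer what the page says about itself; fall back to DMOZ otherwise.

    Hypothetical helper: title and summary fall back independently, so a
    page with a good title but no description still gets a DMOZ summary.
    """
    title = _clean(page_title) or _clean(dmoz_title)
    summary = _clean(page_description) or _clean(dmoz_description)
    return title, summary
```

For example, `choose_title_and_summary("", None, "Example", "An example site")` falls back to the DMOZ values for both fields.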

We should also add support for <META NAME="ROBOTS" CONTENT="NOODP"> as explained here: http://sitemaps.blogspot.com/2006/07/more-control-over-page-snippets.html
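One possible way to detect the directive, using only Python's standard-library `html.parser`. This is a sketch rather than how cosr-back parses pages; the names `RobotsMetaParser` and `has_noodp` are made up for illustration:

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attrs = {name: (value or "") for name, value in attrs}
        if attrs.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for directive in attrs.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noodp(html):
    """Return True if the page asks not to use its DMOZ (ODP) title/summary."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noodp" in parser.directives
```

When `has_noodp(html)` is true, the DMOZ title and description should not be used as fallbacks for that page.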

A few pointers:

sylvinus commented 8 years ago

Since #26 we store title + description in the URLServer so this got much easier!