commonsearch / cosr-back

Backend of Common Search. Analyses webpages and sends them to the index.
https://about.commonsearch.org
Apache License 2.0

Index link text #7

Open sylvinus opened 8 years ago

sylvinus commented 8 years ago

Link text is a powerful signal for relevance.

The current code can already extract link text. The main issue is that link text is external to the page it describes: it lives on the pages that link to it, so it has to be collected and inverted (grouped by target URL) before we index the target page if we want to keep a single indexing pass.

A couple options I see:

The good news is that, unlike PageRank, it doesn't need to be a graph operation. We should be fine for now (or ever) with one level of transmission, meaning a page only receives the text of links that point directly at it.
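To make the inversion step concrete, here is a minimal sketch of a one-level inversion done ahead of the indexing pass. It is only an illustration under assumptions: `invert_link_text`, `build_document`, the `inbound_link_text` field and the input shape (a list of `(source_url, [(target_url, text), ...])` tuples) are all hypothetical names and not part of the cosr-back code.

```python
from collections import defaultdict


def invert_link_text(pages):
    """Group link text by target URL.

    `pages` is an iterable of (source_url, links) tuples, where `links`
    is a list of (target_url, text) tuples. This input shape is an
    assumption made for the sketch. Only one level is handled: text is
    passed from a linking page to the page it links to, nothing more.
    """
    inbound = defaultdict(list)
    for source_url, links in pages:
        for target_url, text in links:
            text = " ".join(text.split())  # collapse runs of whitespace
            # Skip empty text and links from a page to itself.
            if text and target_url != source_url:
                inbound[target_url].append(text)
    return dict(inbound)


def build_document(url, body, inbound_text):
    """Assemble what a single indexing pass would see for one page.

    The field names are hypothetical.
    """
    return {
        "url": url,
        "body": body,
        "inbound_link_text": inbound_text.get(url, []),
    }


# Step 1: invert link text over the whole crawl, before indexing.
crawl = [
    ("http://a.example/", [("http://c.example/", "Common Search"),
                           ("http://a.example/", "home")]),
    ("http://b.example/", [("http://c.example/", "open  search engine")]),
]
inbound_text = invert_link_text(crawl)

# Step 2: the indexing pass looks up the precomputed text for each page.
doc = build_document("http://c.example/", "page body", inbound_text)
```

Because the lookup table is built first, each page is still indexed exactly once. At crawl scale the in-memory dictionary would presumably have to become a distributed group-by or a key-value store, which this sketch does not address.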