dnmilne / wikipediaminer

An open source toolkit for mining Wikipedia

Cache labels that occur as links many times, regardless of link probability #13

Open dnmilne opened 10 years ago

dnmilne commented 10 years ago

Currently we only cache labels that have a reasonable probability of being a link.

Uncached labels are not found by the search, compare, and annotate services.

This results in some odd omissions, such as "Learning".

We should change the caching process so that it also caches labels that occur as links more than a certain number of times, regardless of their link probability.
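
To make the proposed rule concrete, here is a minimal sketch in Python. It is not the toolkit's actual code or API: `LabelStats`, `should_cache`, both threshold parameters, and their default values are hypothetical names and numbers chosen for illustration. It also assumes that link probability means the number of times a label occurs as a link divided by the number of times it occurs at all.

```python
from dataclasses import dataclass


@dataclass
class LabelStats:
    """Hypothetical per-label counts; not the toolkit's real data structure."""
    text: str
    link_occ_count: int  # times the label occurs as link anchor text
    text_occ_count: int  # times the label occurs at all, linked or not

    @property
    def link_probability(self) -> float:
        # Assumed definition: fraction of occurrences that are links.
        if self.text_occ_count == 0:
            return 0.0
        return self.link_occ_count / self.text_occ_count


def should_cache(label: LabelStats,
                 min_link_probability: float = 0.01,
                 min_link_count: int = 1000) -> bool:
    """Return True if the label should be cached.

    Both thresholds are illustrative assumptions, not values from the project.
    """
    # Existing behaviour: cache labels with a reasonable link probability.
    if label.link_probability >= min_link_probability:
        return True
    # Proposed addition: also cache labels that occur as links more than
    # a certain number of times, regardless of link probability.
    return label.link_occ_count > min_link_count
```

With made-up counts, a label like "Learning" that is linked 5,000 times out of 2,000,000 occurrences has a link probability of 0.0025. The probability test alone would reject it, but the count test would keep it in the cache.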