TeamHG-Memex / undercrawler

A generic crawler

DupePredictor should assign more weight for recent samples #42

Closed kmike closed 8 years ago

kmike commented 8 years ago

It makes sense to assign more weight to recent samples in DupePredictor. Together with https://github.com/TeamHG-Memex/undercrawler/issues/41, this should let us handle the case where the crawler first visits a large part A of a website and learns a pattern, then moves on to another part B of the website where this pattern is no longer valid.
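
One possible way to weight recent samples more heavily is exponential decay of the duplicate counts. The sketch below is only an illustration of that idea, not DupePredictor's actual code: the class name `DecayedDupeRate`, its methods, and the `decay` parameter are all hypothetical.

```python
class DecayedDupeRate:
    """Hypothetical helper: exponentially weighted duplicate rate for one URL pattern."""

    def __init__(self, decay=0.9):
        # decay=1.0 weights all samples equally; smaller values forget faster.
        if not 0.0 < decay <= 1.0:
            raise ValueError("decay must be in (0, 1]")
        self.decay = decay
        self.dupes = 0.0  # decayed count of samples that were duplicates
        self.total = 0.0  # decayed count of all samples

    def update(self, is_dupe):
        # Shrink the old evidence first, then add the new sample with weight 1.
        self.dupes = self.decay * self.dupes + (1.0 if is_dupe else 0.0)
        self.total = self.decay * self.total + 1.0

    def rate(self):
        # Estimated probability that the pattern yields a duplicate.
        return self.dupes / self.total if self.total else 0.0


def simulate(decay):
    """Part A: 100 duplicates for a pattern; part B: 20 non-duplicates."""
    estimator = DecayedDupeRate(decay=decay)
    for _ in range(100):
        estimator.update(True)
    for _ in range(20):
        estimator.update(False)
    return estimator.rate()


unweighted = simulate(decay=1.0)  # plain ratio, 100 / 120
weighted = simulate(decay=0.9)    # recent non-duplicates dominate
print(unweighted, weighted)
```

With equal weights the estimate stays high after the crawler moves to part B, while the decayed estimate drops quickly, so the pattern learned in part A stops dominating. The right decay value would depend on the site and would need tuning.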

lopuhin commented 8 years ago

Moved to https://github.com/TeamHG-Memex/MaybeDont/issues/1