It makes sense to assign more weight to recent samples in DupePredictor. Together with https://github.com/TeamHG-Memex/undercrawler/issues/41, this should let us handle the case where the crawler first visits a large part A of a website and learns a pattern, then moves on to another part B of the website where this pattern is no longer valid.
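One possible way to weight recent samples is exponential decay. Below is a minimal sketch, not DupePredictor's actual code or API: the class `DecayedDupeStats`, its `decay` parameter and the `update` / `dupe_probability` methods are hypothetical, and exponential decay is just one candidate weighting scheme. It assumes duplicate statistics are tracked per URL pattern (a plain string here).

```python
from typing import Dict, Tuple


class DecayedDupeStats:
    """Recency-weighted duplicate statistics per URL pattern (hypothetical).

    Every new sample makes all earlier samples `decay` times lighter, so the
    estimate follows the part of the site the crawler is visiting now.
    """

    def __init__(self, decay: float = 0.99) -> None:
        if not 0.0 < decay <= 1.0:
            raise ValueError("decay must be in (0, 1]")
        self.decay = decay
        self.step = 0  # global count of samples seen so far
        # pattern -> (weighted dupe count, weighted total count, step of last update)
        self._stats: Dict[str, Tuple[float, float, int]] = {}

    def _decayed(self, pattern: str) -> Tuple[float, float]:
        """Return the pattern's weights, lazily decayed to the current step."""
        dupes, total, last = self._stats.get(pattern, (0.0, 0.0, self.step))
        factor = self.decay ** (self.step - last)
        return dupes * factor, total * factor

    def update(self, pattern: str, is_dupe: bool) -> None:
        """Record one sample with weight 1; older samples shrink by `decay` per step."""
        self.step += 1
        dupes, total = self._decayed(pattern)
        self._stats[pattern] = (dupes + float(is_dupe), total + 1.0, self.step)

    def dupe_probability(self, pattern: str) -> float:
        """Recency-weighted fraction of the pattern's samples that were duplicates."""
        dupes, total = self._decayed(pattern)
        return dupes / total if total > 0.0 else 0.0
```

With `decay=1.0` every sample counts equally. In the part A / part B scenario (say 100 duplicate pages for a pattern in part A, then 20 non-duplicates in part B), plain counting still reports about 0.83, while `decay=0.9` drops to roughly 0.12. A smaller `decay` adapts faster to a new part of the site but makes the estimate noisier.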