DKarap / web-driver

A crawler that uses WebDriver with GhostDriver/PhantomJS

Add stop-anchor lists in order to filter out useless links #19

Closed. DKarap closed this issue 10 years ago.

DKarap commented 10 years ago

e.g.: http://lemurproject.org/clueweb09/anchortext-querylog/TF-Build-Anchor-Log/data/clue-stop-anchor-contain.txt

http://lemurproject.org/clueweb09/anchortext-querylog/TF-Build-Anchor-Log/data/clue-stop-anchor-whole.txt
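A minimal Python sketch of how the two lists might be applied to the (href, anchor text) pairs a crawler extracts. It assumes each file holds one anchor phrase per line. It also assumes that entries in clue-stop-anchor-whole.txt must match the entire lower-cased, whitespace-normalized anchor text, while entries in clue-stop-anchor-contain.txt may appear anywhere inside it as a whole-word phrase. These semantics are inferred from the file names, not confirmed. `StopAnchorFilter`, its methods, and the sample anchors are hypothetical and not part of web-driver.

```python
from collections.abc import Iterable


def normalize(text: str) -> str:
    """Lower-case the text and collapse runs of whitespace to single spaces."""
    return " ".join(text.lower().split())


def load_stop_anchors(lines: Iterable[str]) -> set[str]:
    """Assumed file format: one stop anchor per line; blank lines are skipped."""
    return {normalize(line) for line in lines if line.strip()}


class StopAnchorFilter:
    """Hypothetical filter that drops links whose anchor text is a stop anchor.

    whole:   the normalized anchor must equal an entry exactly (assumption).
    contain: the normalized anchor must contain an entry as a whole-word
             phrase (assumption).
    """

    def __init__(self, whole: Iterable[str], contain: Iterable[str]) -> None:
        self.whole = load_stop_anchors(whole)
        self.contain = load_stop_anchors(contain)

    def is_stop_anchor(self, anchor_text: str) -> bool:
        anchor = normalize(anchor_text)
        if anchor in self.whole:
            return True
        # Pad with spaces so a phrase only matches on word boundaries,
        # e.g. "click here" matches "please click here" but not "click hereafter".
        padded = f" {anchor} "
        return any(f" {phrase} " in padded for phrase in self.contain)

    def filter_links(self, links: Iterable[tuple[str, str]]) -> list[tuple[str, str]]:
        """Keep only (href, anchor_text) pairs whose anchor is not a stop anchor."""
        return [(href, text) for href, text in links if not self.is_stop_anchor(text)]


if __name__ == "__main__":
    # Hypothetical list entries; real ones would be read from the two files, e.g.
    # with open("clue-stop-anchor-whole.txt") as w, open("clue-stop-anchor-contain.txt") as c:
    #     stop_filter = StopAnchorFilter(w, c)
    stop_filter = StopAnchorFilter(whole=["home", "next"], contain=["click here"])
    links = [
        ("http://a.example/", "Home"),
        ("http://b.example/", "Please click here to continue"),
        ("http://c.example/", "Homework solutions"),
        ("http://d.example/", "ClueWeb09 dataset"),
    ]
    print(stop_filter.filter_links(links))
    # prints [('http://c.example/', 'Homework solutions'), ('http://d.example/', 'ClueWeb09 dataset')]
```

Matching the "contain" entries on word boundaries rather than as raw substrings is a design choice in this sketch: a plain substring test with "home" as an entry would also drop an anchor such as "Homework solutions".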

DKarap commented 10 years ago

This can be done on top of web-driver, for example in the web-crawler.