BruceDone / awesome-crawler

A collection of awesome web crawlers and spiders in different languages
MIT License
6.33k stars, 685 forks

Add Mercator URL Frontier implementation #61

Closed: moredure closed this 3 years ago

moredure commented 3 years ago

@BruceDone please check out https://github.com/BruceDone/awesome-crawler/pull/62; it adds a reference to https://dev.to/spacenomad/mercator-url-frontier-in-golang-15o2

BruceDone commented 3 years ago

Is this a framework? @moredure

moredure commented 3 years ago

Hello @BruceDone, I have some ideas for wrapping it into a framework, but for now it is just a reference implementation of the Mercator crawler URL frontier in Golang. It enables smart, polite crawling (for example, the wait between requests to the same host can be calculated from the duration of the previous request to that host, or taken from static values) and removes memory limits on the URL queues, both for the seed URLs and for the URLs collected during crawling.

moredure commented 3 years ago

Should it be a complete framework to pass your moderation, @BruceDone, or would a useful reference implementation be enough?

BruceDone commented 3 years ago

Thanks for your contribution, but this repo only collects crawler frameworks.

moredure commented 3 years ago

Understood, thanks for your reply.