dakrone / itsy

A threaded web-spider written in Clojure

Obeying robots.txt? #9

Open: giorgio79 opened this issue 10 years ago

giorgio79 commented 10 years ago

Any plans for robots.txt compliance?

dakrone commented 10 years ago

Itsy already checks robots.txt; see: https://github.com/dakrone/itsy/blob/master/src/itsy/core.clj#L133

giorgio79 commented 10 years ago

Nice. Do you plan to implement the Crawl-delay directive too (http://en.wikipedia.org/wiki/Robots_exclusion_standard#Crawl-delay_directive), or just a general delay setting?

dakrone commented 10 years ago

I think supporting the Crawl-delay directive would be best; I'll add it to the to-do list.
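Itsy is written in Clojure, and its robots.txt handling lives in the core.clj file linked earlier in the thread. As a language-neutral illustration of what honoring Crawl-delay could look like, here is a minimal Python sketch built on the standard library's `urllib.robotparser` (this is not itsy's API). The robots.txt content, the `itsy-example` user-agent string, the 1-second fallback delay, and the `crawl` helper are all hypothetical.

```python
import time
from typing import Callable, Iterable, List
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "itsy-example"  # hypothetical user-agent string
DEFAULT_DELAY = 1.0  # assumed fallback delay (seconds) when no Crawl-delay is given


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_delay(parser: RobotFileParser, agent: str) -> float:
    """Use the site's Crawl-delay if present, otherwise the general fallback."""
    delay = parser.crawl_delay(agent)
    return float(delay) if delay is not None else DEFAULT_DELAY


def crawl(urls: Iterable[str], parser: RobotFileParser, agent: str,
          fetch: Callable[[str], None],
          sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Fetch allowed URLs from one host, pausing between consecutive fetches."""
    delay = polite_delay(parser, agent)
    fetched: List[str] = []
    for url in urls:
        if not parser.can_fetch(agent, url):
            continue  # skip paths disallowed by robots.txt
        if fetched:
            sleep(delay)  # wait before every fetch except the first
        fetch(url)
        fetched.append(url)
    return fetched
```

This covers both points raised in the thread: Crawl-delay is used when the site declares it, and `DEFAULT_DELAY` acts as the general delay setting otherwise. Note that CPython's parser only accepts whole-number Crawl-delay values, so a line such as `Crawl-delay: 0.5` falls back to the default in this sketch.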