confact / Spider.cr

Spider.cr is a web crawler (spider) written in Crystal. It handles collecting, scraping, and parsing, so you can spend your time gathering the data you want at large scale.
MIT License

Robots.txt support #1

Open confact opened 2 years ago

confact commented 2 years ago

We need to respect each website's robots.txt file.

The crawler should skip any URL that the robots.txt rules disallow.
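As a rough illustration of the behavior requested above, here is a minimal sketch in Python (not Crystal, and not Spider.cr's actual API) that filters a list of URLs against robots.txt rules using the standard library's `urllib.robotparser`. The function name `allowed_urls`, the user agent `example-spider`, and the sample rules are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it from the
# site's /robots.txt before visiting any other page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed_urls(urls, robots_lines, user_agent="example-spider"):
    """Return only the URLs that the given robots.txt lines permit."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules from memory, no network access
    return [url for url in urls if parser.can_fetch(user_agent, url)]


urls = [
    "https://example.com/",
    "https://example.com/private/data",
    "https://example.com/blog/post",
]
kept = allowed_urls(urls, ROBOTS_TXT.splitlines())
for url in kept:
    print(url)
```

With these sample rules the URL under `/private/` is dropped from the queue and the other two are kept.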

grkek commented 2 years ago

Only when there is a need to. Enforcing robots.txt unconditionally would be a bummer if the user wants to access those URLs anyway.

confact commented 2 years ago

@grkek It will be possible to turn it off, but it should probably be enabled by default. We want to encourage good behavior but still let people do what they want :)
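One way the "on by default, but can be turned off" idea could look, sketched in Python rather than Crystal. This is only an assumption about the shape of such an option: `CrawlerConfig`, `respect_robots_txt`, and `should_visit` are hypothetical names, not part of Spider.cr.

```python
from dataclasses import dataclass
from urllib.robotparser import RobotFileParser


@dataclass
class CrawlerConfig:
    # Hypothetical option: robots.txt is honored unless the user opts out.
    respect_robots_txt: bool = True
    user_agent: str = "example-spider"


def should_visit(url, parser, config):
    """Decide whether to crawl `url` under the given config."""
    if not config.respect_robots_txt:
        return True  # user explicitly disabled the check
    return parser.can_fetch(config.user_agent, url)


# Sample rules parsed from memory; a real crawler would fetch them per host.
rules = RobotFileParser()
rules.parse(["User-agent: *", "Disallow: /admin/"])

default_config = CrawlerConfig()
opt_out_config = CrawlerConfig(respect_robots_txt=False)
blocked = "https://example.com/admin/panel"

print(should_visit(blocked, rules, default_config))
print(should_visit(blocked, rules, opt_out_config))
```

Keeping the check behind a single flag means the polite behavior needs no setup, while opting out stays a deliberate, visible choice in the user's configuration.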