indrajithi closed this 2 months ago
@indrajithi I started working on this already, assign me when you can :)
respect_robots_txt (default should be True, because respecting robots.txt is a legal obligation in some jurisdictions)
urllib.robotparser (will help in parsing robots.txt)
crawl-delay (honor it if present; check the rules before crawling a path)
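The three points above could be sketched roughly as follows. This is only an illustration, not the project's actual implementation: the helper names (`build_parser`, `may_crawl`, `delay_for`), the sample robots.txt content, and the example.com URLs are all assumptions made for the example. Only `respect_robots_txt`, `urllib.robotparser`, and `crawl-delay` come from the issue itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed in memory so the sketch needs no network.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, url: str,
              user_agent: str = "*", respect_robots_txt: bool = True) -> bool:
    """Check the rules before crawling a path; the flag defaults to True."""
    if not respect_robots_txt:
        return True
    return parser.can_fetch(user_agent, url)


def delay_for(parser: RobotFileParser, user_agent: str = "*",
              default: float = 0.0) -> float:
    """Return the crawl-delay in seconds if present, else a fallback."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


parser = build_parser(ROBOTS_TXT)
for url in ("https://example.com/public/page", "https://example.com/private/page"):
    print(url, may_crawl(parser, url), delay_for(parser))
```

In a real crawler the parser would more likely be populated with `set_url()` and `read()` against the target site's robots.txt, and the crawl loop would sleep for the returned delay between requests.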