- Remove the Google block. (Two people have asked me what recently happened to ropewiki's Google results, since they've gone to crap while DuckDuckGo's are still good; this explains why.)
- Move the Yandex and Semrush blocks from the IP block to the more robust user-agent block.
- Seekport is a nice bot and has followed robots.txt since it was added there.
- Amazon is already blocked by user agent, so there's no need for an IP block.
- The MS/Bing bots have switched to different IPs, have already bypassed the IP block, and are crawling the site. I also think allowing Bing to crawl is an OK idea; weirdly, people do actually use it.
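The Bing case shows why a user-agent block holds up better than an IP block: a bot that moves to new addresses is still caught as long as it keeps announcing itself. Below is a minimal Python sketch of that matching logic. The `BLOCKED_UA_TOKENS` list and the `is_blocked` helper are hypothetical. The real block lives in the server config, which isn't shown here.

```python
import re
from typing import Optional

# Hypothetical list of User-Agent substrings to block; the real block list
# lives in the server config and may differ.
BLOCKED_UA_TOKENS = ("YandexBot", "SemrushBot", "Amazonbot")

# One case-insensitive pattern that matches any blocked token anywhere in the header.
_BLOCKED_UA_RE = re.compile(
    "|".join(re.escape(token) for token in BLOCKED_UA_TOKENS), re.IGNORECASE
)


def is_blocked(user_agent: Optional[str]) -> bool:
    """Return True if the request's User-Agent header matches a blocked crawler."""
    if not user_agent:
        # Missing or empty header: nothing to match against.
        return False
    return _BLOCKED_UA_RE.search(user_agent) is not None


if __name__ == "__main__":
    yandex = "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
    bing = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
    print(is_blocked(yandex), is_blocked(bing))
```

The trade-off is that a user agent can be spoofed. This approach only stops bots that identify themselves honestly, but it doesn't break when they change IPs.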
Also adds the `crawl-delay` option to tell well-behaved bots to back off a bit (one request per 5 seconds).
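A well-behaved crawler reads `Crawl-delay` from robots.txt before fetching. Python's standard `urllib.robotparser.RobotFileParser` exposes it through `crawl_delay()`. The sketch below parses a hypothetical robots.txt (not necessarily the one in this PR) and reports the delay a bot would apply. Note that `Crawl-delay` is a non-standard extension and not part of RFC 9309, so only bots that choose to honor it will slow down.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration; the real file may differ.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
"""


def crawl_delay_for(agent: str, robots_txt: str = ROBOTS_TXT) -> Optional[int]:
    """Return the Crawl-delay (in seconds) that applies to `agent`, or None if unset."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and records the parse time,
    # which crawl_delay() requires before it returns a value.
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(agent)


if __name__ == "__main__":
    delay = crawl_delay_for("SeekportBot")
    print(f"wait {delay}s between requests")
```

A compliant crawler would then sleep that many seconds between requests. With the file above, that works out to one request per 5 seconds.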
Deployed locally and confirmed all services start up correctly.
Overall, this cleans up a few bot-related things.