I recently found this project while reading server logs. Someone is scraping one of the sites I help administer, apparently using AHC/2.1, and they are not obeying the robots.txt file. There should be several seconds of delay between requests, but it appears to be going at 1 request/second. Is this normal behavior for AHC, or is this a user misconfiguration of some kind? If this is normal, could robots.txt support for Crawl-delay values be added by default?

> The user must have configured it to crawl your web server every second. AHC is an HTTP client library, and it is entirely up to the user how they intend to use it. There are also no plans to support robots.txt at the moment.

Closed by TechnologyClassroom 2 weeks ago
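For anyone hitting the same problem from the client side: independently of AHC, a scraper can honor Crawl-delay with very little code. This is a minimal sketch using Python's standard-library `urllib.robotparser` (not AHC itself, and not anything this project ships); the robots.txt content and user-agent string are made up for illustration.

```python
from urllib import robotparser

# Hypothetical robots.txt content, inlined for illustration.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
"""

def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay (in seconds) robots.txt requests for
    user_agent, or None if no delay is specified."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)

# The "*" group applies to any agent without its own group,
# so a client identifying as AHC/2.1 should wait 5 seconds.
print(crawl_delay_for(ROBOTS_TXT, "AHC/2.1"))
```

A client would fetch `/robots.txt` once, call `crawl_delay_for` with its own User-Agent, and sleep that long between requests. Note that Crawl-delay is a de facto convention, not part of the robots.txt RFC, so support varies between parsers.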