In this case, the organization behind Bytespider is well known for not respecting robots.txt, but we need to update the file anyway.
The `Disallow` path should start with `/`:

```
User-agent: *
Disallow: /
```
Some other comments: https://www.feitsui.com/en/article/32
> It seems Bytespider and Sogou spiders are not fully compatible with the robots exclusion standard. These crawlers magically disappeared one week after I created a separate block for each user agent in robots.txt.
Other sources indicate they do not respect the robots.txt file at all.
That said, it's worth adding explicit entries to the robots.txt file to see whether they have any effect.
For example, we're seeing the API being scanned by Bytespider. A robots.txt file is defined, but it does not specify any rules.
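A sketch of what those explicit entries could look like, following the per-user-agent approach from the article above. The Bytespider and Sogou user-agent tokens and the `/api/` path are assumptions for illustration, not verified values:

```
# Explicitly disallow crawlers known to scan aggressively
# (user-agent tokens assumed, not verified)
User-agent: Bytespider
Disallow: /

User-agent: Sogou web spider
Disallow: /

# Keep well-behaved crawlers away from the API
# (path assumed for illustration)
User-agent: *
Disallow: /api/
```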
Are there other settings or files we can use to deter legitimate services from scanning the API unnecessarily?
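For crawlers that ignore robots.txt entirely, one common fallback is rejecting requests by `User-Agent` at the web server or reverse proxy. A minimal sketch, assuming nginx fronts the API (the host and upstream address are hypothetical):

```nginx
server {
    listen 80;
    server_name example.com;  # hypothetical host

    location / {
        # Reject requests whose User-Agent mentions Bytespider or Sogou.
        # This only deters clients that send an honest User-Agent header.
        if ($http_user_agent ~* (bytespider|sogou)) {
            return 403;
        }
        proxy_pass http://127.0.0.1:8080;  # hypothetical upstream API
    }
}
```

Note that this only helps against crawlers that identify themselves honestly; anything spoofing its User-Agent would need rate limiting or IP-based blocking instead.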