Open glyn opened 2 hours ago
I think robots.json should be ordered alphanumerically by user agent, as should the list of user agents in robots.txt. This reduces the risk of duplication as the list grows, and it would also avoid slips like the one I made above (I didn't see an entry because I assumed the list was already in alphanumeric order).
A slight nit here is that the list may need something other than a strict alphanumeric ordering, so that uppercase and lowercase user agents are placed alongside each other.
For example, rather than the order:
...
FacebookBot
FriendlyCrawler
GPTBot
...
facebookexternalhit
...
it may be better to ignore case and order these as:
...
FacebookBot
facebookexternalhit
FriendlyCrawler
GPTBot
...
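To make the ordering above concrete, here is a minimal Python sketch of a case-insensitive sort. It assumes robots.json is a JSON object keyed by user agent name; the `sort_agents` helper and the sample entries are hypothetical, for illustration only.

```python
def sort_agents(agents: dict) -> dict:
    """Return a copy of the mapping ordered case-insensitively by key."""
    # casefold() makes the comparison ignore case; the original name is a
    # tie-breaker so names differing only by case sort deterministically.
    return dict(sorted(agents.items(), key=lambda kv: (kv[0].casefold(), kv[0])))


# Hypothetical sample; the real file holds metadata for each user agent.
sample = {
    "FacebookBot": {},
    "FriendlyCrawler": {},
    "GPTBot": {},
    "facebookexternalhit": {},
}

ordered = list(sort_agents(sample))
print(ordered)
# ['FacebookBot', 'facebookexternalhit', 'FriendlyCrawler', 'GPTBot']
```

A plain `sorted()` without the key would instead place every uppercase name before every lowercase one, which is the first ordering shown above.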
Agreed re case-insensitive sorting.
It's probably worth alphabetising them; as the list grows, duplicates are more likely.
Could be a GitHub pre-commit command that sorts / uniques the list? 🤷🏻‍♂️
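As a rough sketch of what such a check might do (not an existing hook in the repo), the Python below reports duplicates and ordering problems in a list of user agent names. The `check_agents` name is hypothetical, and treating names that differ only by case as duplicates is an assumption.

```python
def check_agents(names):
    """Return a list of problems found in a list of user agent names."""
    problems = []
    seen = {}
    for name in names:
        key = name.casefold()  # assumption: case-only variants count as duplicates
        if key in seen:
            problems.append(f"duplicate: {name!r} (already listed as {seen[key]!r})")
        else:
            seen[key] = name
    # Compare against the case-insensitive ordering of the same names.
    expected = sorted(names, key=lambda n: (n.casefold(), n))
    if list(names) != expected:
        problems.append("list is not in case-insensitive order")
    return problems


print(check_agents(["GPTBot", "FacebookBot", "GPTBot"]))
```

A pre-commit hook or CI job could run something like this and fail when the returned list is non-empty.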
EDIT: I realise this is massively off topic, though.
Back to the thread point... isn't recommending blocking this bot a risky move, as it could cause websites to lose rich social media embedding (e.g. image and other OpenGraph data)?
Originally posted by @njt1982 in https://github.com/ai-robots-txt/ai.robots.txt/issues/40#issuecomment-2417342818