Closed rugk closed 4 years ago
This is already implemented and some servers have already opted out. If the server responds to the check with anything other than 200 or 3xx, we won't add it. Some servers work in the browser, but respond with 403 Forbidden as soon as they see the string "bot" in the user agent. And I purposefully chose to use:
I have already planned a task to create an info page that documents this and is linked in the above user agent header. This is common practice for bots: admins may stumble on the bot's odd access pattern in their logs and can then easily follow the provided link, as the user agent string is usually part of the logs.
Edit: To clarify - such a page should include example configurations showing how to opt out of the directory for Apache and nginx - something like:
# nginx example
if ($http_user_agent ~ PrivateBinDirectoryBot ) {
    return 403;
}
# apache example
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} PrivateBinDirectoryBot [NC]
RewriteRule . - [R=403,L]
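Seen from the bot's side, the acceptance rule described earlier (only 200 or 3xx responses get listed, anything else counts as an opt-out) could look roughly like the sketch below. This is not the directory's actual code: the function names, the `USER_AGENT` placeholder and the decision to treat unreachable hosts as "not listed" are all assumptions made for illustration.

```python
import http.client

# Hypothetical placeholder; the bot's real user agent string is not reproduced here.
USER_AGENT = "PrivateBinDirectoryBot (placeholder)"


def is_listable(status: int) -> bool:
    """Accept only 200 or any 3xx status; everything else is treated as an opt-out."""
    return status == 200 or 300 <= status <= 399


def check_instance(host: str, path: str = "/", timeout: float = 10.0) -> bool:
    """Fetch one page over HTTPS (redirects are not followed) and apply the rule."""
    conn = http.client.HTTPSConnection(host, timeout=timeout)
    try:
        conn.request("GET", path, headers={"User-Agent": USER_AGENT})
        return is_listable(conn.getresponse().status)
    except (OSError, http.client.HTTPException):
        # Assumption: an instance that cannot be reached is not listed.
        return False
    finally:
        conn.close()
```

With either of the web server rules above in place, the bot would receive a 403, `is_listable(403)` would be false, and the instance would stay out of the directory.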
Okay, good idea. Then best practice would be to also check for and adhere to robots.txt.
… Okay, that prevents anything by default, but maybe just check whether there is an explicit "Disallow" for PrivateBinDirectoryBot?
Good idea: that is easy to add or edit even on older versions and requires no new API. The web server mechanism above will remain an option; alternatively, admins can add a section to robots.txt:
User-agent: PrivateBinDirectoryBot
Disallow: /
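A check for such an explicit section might be sketched as follows. It is only an illustration under stated assumptions: the function name is hypothetical, groups addressed to `*` are deliberately ignored (as suggested above), and `Allow` precedence and wildcard patterns are not handled. Python's standard `urllib.robotparser` is not used here because its `can_fetch` falls back to the `*` group, which is exactly what this check wants to avoid.

```python
BOT_NAME = "PrivateBinDirectoryBot"


def explicitly_disallowed(robots_txt: str, bot: str = BOT_NAME, path: str = "/") -> bool:
    """Return True only if a group naming `bot` itself disallows `path`."""
    agents = []       # user agents of the group currently being read
    in_rules = False  # True once the current group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:
                # A User-agent line after rule lines starts a new group.
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if (field == "disallow" and value
                    and bot.lower() in agents
                    and path.startswith(value)):
                return True
    return False
```

An empty `Disallow:` line disallows nothing, so it is skipped, and a blanket `User-agent: *` section alone does not count as an opt-out.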
I know that currently anyone can add any site to the wiki, but for this new project, should we require some authentication/permission from the site owner before an instance appears in the directory?
I can imagine that some people may not want their instance to be listed publicly (to avoid the extra load or use, because it is a private instance, etc.). As this site now makes it very easy to add instances, maybe we do not want that?
Implementation
false
there by default. Or maybe not, in case we fear that admins will not explicitly opt in and will just use the default settings (although they have nothing against providing a public service). Of course, that feature could then only be supported for new instances.