Closed Crocmagnon closed 3 years ago
Hi, the full user agent string starts with that name, but contains additional information after it. Here is an example of a request you would need to reject:
curl -I -H "User-Agent: PrivateBinDirectoryBot/1.2.3 (+https://privatebin.info/directory/about)" https://paste.example.com
HTTP/2 200
[...]
The full user agent string that is sent can be found in models.rs: https://github.com/PrivateBin/Directory/blob/57db60ac1ed8d832ac660c397bcc6f0f7dcf0ae7/src/models.rs#L102
The about page does hint at this, although it may not be 100% clear. I'm happy for any rewording suggestions that make this more obvious. Current wording is:
If you don't want to rely on this service following your sites robots.txt, you can configure your webserver to block any access that matches this services user agent, which starts with the string PrivateBinDirectoryBot.
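To illustrate the difference, here is a minimal Python sketch, not the Directory's or any webserver's actual code. It assumes the blocking rule compared the whole header against the bare name; the function names `should_block_exact` and `should_block_prefix` are made up for this example. Only the prefix check rejects the user agent from the curl example above:

```python
# Hypothetical helpers for illustration only; not part of PrivateBin or the Directory.
BOT_PREFIX = "PrivateBinDirectoryBot"


def should_block_exact(user_agent: str) -> bool:
    """Assumed faulty rule: block only when the header equals the bare bot name."""
    return user_agent == BOT_PREFIX


def should_block_prefix(user_agent: str) -> bool:
    """Rule matching the about page wording: block anything starting with the bot name."""
    return user_agent.startswith(BOT_PREFIX)


# User agent taken from the curl example in this comment.
sent = "PrivateBinDirectoryBot/1.2.3 (+https://privatebin.info/directory/about)"

print(should_block_exact(sent))   # False: the exact match lets the bot through
print(should_block_prefix(sent))  # True: the prefix match rejects it
```

In a real webserver configuration the same idea would usually be written as a pattern anchored at the start of the User-Agent header instead of a comparison against the full value.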
We could maybe also add the curl command example so folks can look up how to test their own instance?
Edit: I've manually removed the above instance from the database, cache got refreshed.
I've manually removed the above instance from the database
Thanks! Would you mind removing it from the code sample in your comment in this issue too please? I've already edited my original post 🙂
I'll take a fresh look tomorrow to find a nice way of phrasing it!
Done and thanks for reporting it.
I configured my webserver to forbid the bot from indexing it using a 403 response when the User-Agent is PrivateBinDirectoryBot. I can confirm the 403 works fine:
But when I tried to add my instance to the directory, it was accepted.
I'd like to get it removed, as the initial goal was to prevent it from being added, and I'd also like to understand why it was accepted.