Closed: mauriciopasquier closed this 2 months ago
I read the Wikipedia article on robots.txt and watched some videos on YouTube. I think I understand the basics of crawler restrictions, but I haven't learned enough to figure out how I'd use them in this particular project. I don't think I'd restrict access to any specific file, but restricting AI crawlers from the whole website does sound rather tempting. Other than that, I don't know what else I'd do.
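For example, blocking a few known AI crawlers from the whole site could look something like this (GPTBot, CCBot and Google-Extended are real AI-crawler user-agent tokens, but this list is just a sketch, not a complete one):

```
# Block some well-known AI crawlers from the whole site
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Everyone else is still allowed everywhere
User-agent: *
Disallow:
```

From what I read, robots.txt is purely advisory though: well-behaved crawlers honor it, but nothing actually enforces it.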
I also explored the robots.txt of some websites like Wikipedia, Google, YouTube, and a certain local pirate wiki (which doesn't have said file). The Wikipedia file actually has comments for each set of restrictions explaining what they're for, or notes not to delete certain lines, etc. (I guess the restriction management is also collaborative?) <3
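For what it's worth, the comment syntax is just `#` to the end of the line. A made-up example in that style (not copied from the actual Wikipedia file, and SomeBadBot is a hypothetical name):

```
# Misbehaving crawler that ignores our rate limits -- please keep this block!
User-agent: SomeBadBot
Disallow: /

# Keep dynamic search pages out of indexes
User-agent: *
Disallow: /search
```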
On the other hand, I'm guessing there are ways to check for crawler behavior on your webpage, or to detect unwanted interactions already happening? I might research more about that later.
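From what I've seen so far, the usual starting point seems to be the web server's access log. A rough sketch of the idea in Python (the log path and the user-agent keywords are assumptions on my part; a real combined-format log parser would be more careful):

```python
import re
from collections import Counter

# Keywords that often show up in crawler user-agent strings (illustrative list)
BOT_HINTS = ("bot", "crawl", "spider", "GPTBot", "CCBot")

# In the common "combined" log format, the user agent is the last quoted field
UA_RE = re.compile(r'"([^"]*)"\s*$')

counts = Counter()
with open("/var/log/nginx/access.log") as log:  # hypothetical path
    for line in log:
        match = UA_RE.search(line)
        if not match:
            continue
        ua = match.group(1)
        if any(hint.lower() in ua.lower() for hint in BOT_HINTS):
            counts[ua] += 1

# Most frequent crawler user agents first
for ua, n in counts.most_common(10):
    print(f"{n:6d}  {ua}")
```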
> restricting AI crawlers from the whole website does sound rather tempting. Other than that, I don't know what else I'd do.
That's more than enough for me! The idea was to do something very deployment-related while also learning a bit about web history ^^
> The Wikipedia file actually has comments for each set of restrictions explaining what they're for, or notes not to delete certain lines, etc. (I guess the restriction management is also collaborative?) <3
Awesome, I never thought of checking the Wikipedia file! I assume the comments are for other sysadmins, though; I don't think regular users have access to edit it.
> On the other hand, I'm guessing there are ways to check for crawler behavior on your webpage, or to detect unwanted interactions already happening? I might research more about that later.
Yes, but that would fall to a sysadmin role, and I think it's out of scope for this project :P
And add a basic robots.txt file.
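Something minimal like this would do as a starting point (just a sketch; we can still add the AI-crawler blocks discussed above later):

```
# Allow all crawlers everywhere (placeholder until we decide on restrictions)
User-agent: *
Disallow:
```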