Virtual-Coffee / virtualcoffee.io

Public site for Virtual Coffee
https://virtualcoffee.io

Do we need a robots.txt file to stop AI crawlers? #1032

Open ClJarvis opened 11 months ago

ClJarvis commented 11 months ago

Is there an existing issue for this?

Type of Change

Brand new page

URL of existing page

No response

Context for content change

Do we need a robots.txt to stop AI crawlers from training on VC content? We have a lot of content here and are adding more constantly.
We also have a list of members' names and socials, plus now approximate locations. Do we need to disallow OpenAI and Bard from training on what our members are writing?

Proposed solution

I could write a robots.txt file that tells the crawlers not to read our files.

Resources that can help

No response

Collaborators

No response

Code of Conduct

danieltott commented 11 months ago

@ClJarvis in general, we do want robots to crawl the site so that we show up on Google etc. However, I hadn't really thought about AI. If you can find some documentation on how to do that (tell OpenAI and Bard not to crawl while allowing other bots), I'd definitely consider this.

ClJarvis commented 11 months ago

It's my understanding that we can allow Google while blocking Bard/AI bots. I will find the docs I used a while ago.
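
Roughly, since robots.txt rules are scoped per user agent, something like the sketch below should work, assuming Google-Extended is the agent name Google documents for Bard/AI training. Agents not matched by a blocking rule, like Googlebot for Search, would still be allowed to crawl:

User-agent: Google-Extended
Disallow: /

User-agent: *
Disallow: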

paceaux commented 4 months ago

I have updated robots.txt on my own sites to forbid the AI crawlers. I would recommend it because content on VC properties is copyrighted. The VC code is licensed under Creative Commons, so I personally would not recommend contributing to any AI unless we intentionally want to.

This is what I added to my sites:

User-agent: GPTBot
Disallow: /

danieltott commented 4 months ago

@paceaux that sounds reasonable. Are there any others aside from GPTBot?

Do you think you could make a PR for us?

paceaux commented 4 months ago

@danieltott yes, there are a few others, and sure, I'll do a PR.
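
As a rough sketch of what the expanded file could look like, these are some of the commonly documented AI-training user agents; the exact list would need to be confirmed against each vendor's current documentation before the PR:

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /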