Opt Out - Githubissues

Yeah seconding this. Fences make good neighbors, and ward off lawsuits.

While I do think that in some respects this is functionality covered by robots.txt, much of the AI industry seems to think robots.txt doesn't apply to them, so a more explicit permissions clause in llms.txt would help.

Something like:

## Permissions
Precedence:  trainonly, referenceonly, allow, disallow
disallow: / 
trainonly: /blog/archives
referenceonly: /current-data
allow: /blog

What this is saying is: the precedence, from highest to lowest, is trainonly, referenceonly, allow, disallow. In plain terms: "The default here is stay away. However, you may train on the archives but not reference them in answers, and you may reference current-data but not train on it. For the /blog directory you may do both, but since trainonly has higher precedence you must exclude the archives from referencing."
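To make the precedence idea concrete, here's a rough Python sketch of how a crawler might resolve a path against a block like the one above. None of this is part of any llms.txt spec: the function names (`parse_permissions`, `resolve`), the segment-based prefix matching, and the "no matching rule means disallow" fallback are all my own assumptions for illustration.

```python
from typing import NamedTuple


class Permission(NamedTuple):
    train: bool
    reference: bool


# What each proposed directive grants (my reading of the proposal above).
DIRECTIVES = {
    "trainonly": Permission(train=True, reference=False),
    "referenceonly": Permission(train=False, reference=True),
    "allow": Permission(train=True, reference=True),
    "disallow": Permission(train=False, reference=False),
}

EXAMPLE = """\
## Permissions
Precedence: trainonly, referenceonly, allow, disallow
disallow: /
trainonly: /blog/archives
referenceonly: /current-data
allow: /blog
"""


def parse_permissions(text):
    """Return (precedence list, [(directive, path prefix), ...])."""
    precedence, rules = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blanks and the heading
            continue
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "precedence":
            precedence = [name.strip().lower() for name in value.split(",")]
        elif key in DIRECTIVES:
            rules.append((key, value))
    return precedence, rules


def matches(prefix, path):
    """Assumed matching: whole path segments, so /blog does not match /blogroll."""
    if prefix == "/":
        return True
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def resolve(path, precedence, rules):
    """Pick the highest-precedence directive among all rules matching path."""
    matching = {directive for directive, prefix in rules if matches(prefix, path)}
    for directive in precedence:
        if directive in matching:
            return DIRECTIVES[directive]
    # Assumption: nothing matched (or no precedence given), so stay away.
    return DIRECTIVES["disallow"]


precedence, rules = parse_permissions(EXAMPLE)
for p in ["/blog/archives/2019", "/current-data", "/blog/new-post", "/about"]:
    print(p, resolve(p, precedence, rules))
```

Note that precedence here beats specificity: `/blog/archives/2019` matches `disallow: /`, `trainonly: /blog/archives`, and `allow: /blog`, and `trainonly` wins only because it comes first in the Precedence line, not because its path is longest.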

This would provide a way for websites to choose what can be referenced, what can be trained on, and what must be excluded.

Possible complication: Maybe this would be extended to let people set separate permissions for different media types. (I.e. "Yes, you can train on the text, but please don't download all the videos for training.")

Possible argument against: If llms.txt is intended for inference, maybe an extension to robots.txt that lets website owners specify LLM permissions is a better move.

AnswerDotAI / llms-txt

Opt Out #3