Remember, an issue is not the place to ask questions. You can use our Slack channel for that, or you may want to start a discussion on the Discussion Board.
When reporting an issue, please be sure to include the following:
[x] Before you open an issue, please check if a similar issue already exists or has been closed before.
[x] A descriptive title, with the specific LLM-0-10 label relevant to the entry applied. See our available labels.
[x] A description of the problem you're trying to solve, including why you think this is a problem
[x] If the enhancement changes current behavior, reasons why your solution is better
[x] What artifact and version of the project you're referencing, and its location (e.g., the OWASP site, llmtop10.com, the repo)
[x] The behavior you expect to see, and the actual behavior
Steps to Reproduce
…
…
…
What happens?
Not included
What were you expecting to happen?
Include the use of canary tokens as a recommended mitigation strategy for prompt injection. Canary tokens can be used to detect system prompt leakage and to verify system prompt effectiveness and adherence.
Proposed language
6. Dynamically add a synthetic canary token to the model's system prompt and scan the model's output for that token. If the token appears in the generated output, this is a strong indicator of a prompt injection that aims to dump the system prompt itself.
7. Dynamically add a second synthetic canary token (it must not be the same token as in 6.) to the model's system prompt and explicitly instruct the model to include this token at the end of every response. Check whether the token is present in the model's output and remove it before sending the response to the front-end component. If the token is absent, the model's system prompt was most likely overridden through prompt injection.
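The two checks proposed above can be sketched as follows. This is a minimal illustration, not part of the proposed language: the helper names, prompt wording, and canary format are all assumptions.

```python
import secrets


def make_canary() -> str:
    # Generate an unguessable marker that will not occur in normal text.
    return f"CANARY-{secrets.token_hex(8)}"


def build_system_prompt(base_prompt: str):
    leak_canary = make_canary()  # item 6: must never appear in the output
    echo_canary = make_canary()  # item 7: must always end the output
    system_prompt = (
        f"{leak_canary}\n"
        f"{base_prompt}\n"
        f"Always end every response with the exact string {echo_canary}."
    )
    return system_prompt, leak_canary, echo_canary


def check_response(output: str, leak_canary: str, echo_canary: str) -> str:
    # Item 6: the leak canary appearing in output suggests a prompt dump.
    if leak_canary in output:
        raise ValueError("System prompt leaked: likely prompt injection")
    # Item 7: a missing echo canary suggests the system prompt was overridden.
    if not output.rstrip().endswith(echo_canary):
        raise ValueError("Echo canary missing: system prompt may be overridden")
    # Strip the echo canary before returning the response to the front end.
    return output.rstrip()[: -len(echo_canary)].rstrip()
```

In practice, `check_response` would run on every generation before the response leaves the back end, so the echo canary never reaches the user.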
Any logs, error output, etc?
Any other comments?
…