OWASP / www-project-top-10-for-large-language-model-applications

OWASP Foundation Web Respository
Other
518 stars 133 forks source link

LLM-01 - Add canary tokens to Prevention and Mitigation Strategies #288

Open mhupfauer opened 5 months ago

mhupfauer commented 5 months ago

Remember, an issue is not the place to ask questions. You can use our Slack channel for that, or you may want to start a discussion on the Discussion Board.

When reporting an issue, please be sure to include the following:

Steps to Reproduce


What happens?


Not included

What were you expecting to happen?


Include usage of canary tokens as a recommended mitigation strategy for prompt injection. Canary tokens can be used to detect system prompt leakage and to verify system prompt effectiveness / adherence.

Proposed language

6. Dynamically add a synthetic canary token to the model's system prompt and scan the models output for the canary token. If the token is leaked in the generation output this is a strong indicator for prompt injection that aims to dump the system prompt itself.
7. Dynamically add a synthetic canary token to the model's system prompt and specifically instruct the model to include this token (must not be the same token as in 6.) at the end of every response. Check if the token is included in the models output message and remove the token before sending the response to the front-end component. If the token is not present, the model's system prompt was most likely overridden through means of prompt injection.

Any logs, error output, etc?


Any other comments?


GangGreenTemperTatum commented 5 months ago

@leondz can you ptal?