NVIDIA / garak

the LLM vulnerability scanner
https://discord.gg/uVch4puUCs
Apache License 2.0
2.93k stars 249 forks source link

Add impersonation probe #560

Open brandon-dubbs opened 8 months ago

brandon-dubbs commented 8 months ago

There are a few examples of impersonation prompts that have been used to make LLMs say whatever the user wants. This is often toxic. An example is included here: ChatGPT-Unleashed. This probably falls under the broader DAN attack umbrella.

leondz commented 8 months ago

oh, thanks. this could fit into the DANs, agree