OWASP / www-project-top-10-for-large-language-model-applications

OWASP Foundation Web Repository

Extend LLM-04: RAG poisoning with glitch tokens causes DoS #283

Open mhupfauer opened 7 months ago

mhupfauer commented 7 months ago

Steps to Reproduce


  1. NA

What happens?


2_0_vulns/LLM04_ModelDoS.md does not cover DoS through glitch tokens injected into the prompt by a RAG solution. If a malicious actor can add broadly relevant content to the RAG database, and that content includes glitch tokens for the model in use, they can effectively run a denial-of-service attack against all users of the LLM.

The main issue is that there is no clear indication of which token caused the model to glitch, so there is no obvious way to remediate such issues automatically. Enterprises build large RAG databases from (mostly) user-generated content (e.g. Confluence, SharePoint, ...) and update this content frequently. A malicious actor could therefore easily introduce new content to the RAG database, including but not limited to glitch tokens, and so cause a denial of service for all end users of the application.

What were you expecting to happen?


Any logs, error output, etc?


Any other comments?


A "normal" glitch token attack doesn't pose a significant threat, as it only renders the current user session or context unusable. Through a poisoned RAG database, however, a malicious user can inject these tokens into many, if not all, conversations and thus cause a broad service outage.

Sources talking about the issue


  1. https://www.youtube.com/watch?v=WO2X3oZEJOA
  2. https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldmagikarp-plus-prompt-generation
  3. https://www.lesswrong.com/posts/8viQEp8KBg2QSW4Yc/solidgoldmagikarp-iii-glitch-token-archaeology

GangGreenTemperTatum commented 7 months ago

@kenhuangus , you able to take this? tyia!

kenhuangus commented 7 months ago

Yes, I will investigate and, if needed, incorporate this attack vector using glitch tokens for RAG-based LLM apps.

kenhuangus commented 7 months ago

@mhupfauer Thanks, Markus Hupfauer, for the contribution. I will incorporate the following text from Mark.

Description

[...] An additional Denial of Service method involves glitch tokens: unique, problematic strings of characters that disrupt model processing, resulting in partial or complete failure to produce coherent responses. This vulnerability is magnified as RAGs increasingly source data from dynamic internal resources like collaboration tools and document management systems. Attackers can exploit this by inserting glitch tokens into these sources, thus triggering a Denial of Service by compromising the model's functionality.

Common Examples of Vulnerability

[...]

  1. Glitch token RAG poisoning: The attacker introduces glitch tokens to the data sources of the RAG's vector database, thereby introducing these malicious tokens into the model's context window through the RAG process and causing the model to produce (partially) incoherent results.

Prevention and Mitigation Strategies

[...]

  2. Build lists of known glitch tokens and scan RAG output before adding it to the model's context window.

Example Attack Scenarios

[...]

  3. An attacker adds glitch tokens to existing documents or creates new documents with such tokens in a collaboration or document management tool. If the RAG's vector database is automatically updated, these malicious tokens are added to its information store. Upon retrieval through the LLM, these tokens glitch the inference process, potentially causing the LLM to generate incoherent output.

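
The proposed mitigation (scan RAG output against a list of known glitch tokens before it reaches the context window) could look roughly like the sketch below. This is only an illustration under stated assumptions, not part of the merged text: the names `KNOWN_GLITCH_TOKENS`, `find_glitch_tokens`, `scan_retrieved_chunks` and `ScanResult` are hypothetical, and the two denylist entries are taken from the LessWrong posts linked in this issue, where they are reported for GPT-2/GPT-3-era tokenizers. A real list would have to be built per model, and plain substring matching is only an approximation of a token-level check.

```python
from dataclasses import dataclass

# Illustrative denylist only (assumption): glitch tokens are specific to a
# model's tokenizer, so a real deployment needs a list built for its own model.
KNOWN_GLITCH_TOKENS = frozenset({"SolidGoldMagikarp", "petertodd"})


@dataclass
class ScanResult:
    clean: list        # chunks that are safe to place in the context window
    quarantined: list  # (chunk, matched_tokens) pairs held back for review


def find_glitch_tokens(text, denylist=KNOWN_GLITCH_TOKENS):
    """Return the denylisted strings that occur in text, sorted for stable logs."""
    return sorted(token for token in denylist if token in text)


def scan_retrieved_chunks(chunks, denylist=KNOWN_GLITCH_TOKENS):
    """Split retrieved chunks into clean ones and ones containing glitch tokens."""
    result = ScanResult(clean=[], quarantined=[])
    for chunk in chunks:
        hits = find_glitch_tokens(chunk, denylist)
        if hits:
            # Keep the chunk out of the prompt and record what matched.
            result.quarantined.append((chunk, hits))
        else:
            result.clean.append(chunk)
    return result


if __name__ == "__main__":
    retrieved = [
        "VPN setup guide for new employees.",
        "Meeting notes SolidGoldMagikarp added by an attacker.",
    ]
    scan = scan_retrieved_chunks(retrieved)
    print("clean:", scan.clean)
    print("quarantined:", scan.quarantined)
```

Recording the quarantined chunk together with the matched strings, rather than silently dropping it, is a design choice aimed at the concern raised in the issue: it leaves a trace of which document carried the token, so the poisoned source can be found and cleaned up.
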
mhupfauer commented 7 months ago

@kenhuangus Thanks for merging my proposal!

kenhuangus commented 7 months ago

Thank you as well.

mhupfauer commented 7 months ago

@kenhuangus: I think there was a slight copy-paste error. "Example Attack Scenario" now appears twice in the document. Not the entire chapter, just the headline :)