OWASP / www-project-top-10-for-large-language-model-applications

OWASP Foundation Web Repository

Sensitive Information Disclosure #140

Open Bobsimonoff opened 1 year ago

Bobsimonoff commented 1 year ago

In summary, this “vulnerability” is problematic because it mostly doesn’t represent a root cause, but a result or symptom. In the 2021 OWASP Top 10, they reoriented from symptoms toward root causes; they actually renamed Sensitive Data Exposure to Cryptographic Failures to focus on root causes.

There is, however, one aspect of this vulnerability that seems to be a true root cause. Regardless of the input it gets, large language model outputs are unpredictable. That unpredictability doesn’t necessarily come from training data poisoning, third-party data sources, or prompt injection. It comes from the stochastic nature and natural language processing capabilities of the large language model, as well as ambiguities and inaccuracies inherent in natural language. I would argue that this aspect is a key point in this section that should be carried forward; however, I believe it makes more sense under Overreliance.

Bobsimonoff commented 1 year ago

Description

The following part of the description is really related to Training Data Poisoning:

To mitigate this risk, LLM applications should perform adequate data sanitization to prevent user data from entering the training model data.
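To make that mitigation concrete, a minimal sketch of such pre-training data sanitization might look like the following (the patterns and function names here are illustrative assumptions, not taken from the entry):

```python
import re

# Hypothetical patterns - a real pipeline would use a dedicated PII/secret
# detection library and cover far more categories than these two.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scrub_record(text: str) -> str:
    """Replace detected PII with placeholder tokens before the text is
    allowed anywhere near a training or fine-tuning corpus."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label.upper()}]", text)
    return text

def build_training_corpus(user_conversations: list[str]) -> list[str]:
    # Sanitize every user-contributed record so user data never enters
    # the model's training data in the first place.
    return [scrub_record(t) for t in user_conversations]
```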

While the following are all very true, I am not sure they correspond to OWASP’s definition of a vulnerability (“A vulnerability is a hole or a weakness in the application, which can be a design flaw or an implementation bug, that allows an attacker to cause harm to the stakeholders of an application”):

LLM applications have the potential to reveal sensitive information, proprietary algorithms, or other confidential details through their output. This can result in unauthorized access to sensitive data, intellectual property, privacy violations, and other security breaches. It is important for consumers of LLM applications to be aware of how to safely interact with LLMs and identify the risks associated with unintentionally inputting sensitive data that may then be returned by the LLM in output elsewhere.

The consumer-LLM application interaction forms a two-way trust boundary, where we cannot inherently trust the client->LLM input or the LLM->client output. It is important to note that this vulnerability assumes that certain pre-requisites are out of scope, such as threat modeling exercises, securing infrastructure, and adequate sandboxing. Adding restrictions within the system prompt around the types of data the LLM should return can provide some mitigation against sensitive information disclosure, but the unpredictable nature of LLMs means such restrictions may not always be honoured and could be circumvented via prompt injection or other vectors.
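As the quoted text notes, system-prompt restrictions alone may not be honoured, which implies a defense-in-depth check on the output side. A minimal sketch of such a check, assuming a simple pattern-based blocklist (names and patterns are illustrative, not from the entry), could look like this:

```python
import re

# The prompt-level restriction: helpful, but not something to rely on alone.
SYSTEM_PROMPT = (
    "You are a support assistant. Never reveal account numbers, API keys, "
    "or any data belonging to other users."
)

# Things the application never wants to see in a response; illustrative only.
OUTPUT_BLOCKLIST = [
    re.compile(r"\b\d{16}\b"),              # card-number-like strings
    re.compile(r"(?i)api[_-]?key\s*[:=]"),  # key-looking assignments
]

def guard_output(model_response: str) -> str:
    """Defense in depth: the system prompt asks the model not to disclose
    sensitive data, but since that instruction may be ignored or bypassed
    via prompt injection, the response is checked again before it is
    returned to the client."""
    for pattern in OUTPUT_BLOCKLIST:
        if pattern.search(model_response):
            return "[Response withheld: possible sensitive content]"
    return model_response
```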

Examples of Vulnerability

The following example could be Training Data Poisoning depending on the actual root cause:

  1. Incomplete or improper filtering of sensitive information in the LLM’s responses.

This seems like a Training Data Poisoning problem.

  2. Overfitting or memorization of sensitive data in the LLM’s training process.

In this case, the example itself refers to the root cause, Insecure Output Handling (possibly Overreliance).

  3. Unintended disclosure of confidential information due to LLM misinterpretation, lack of data scrubbing methods, or errors.

Example Attack Scenarios

This example appears to be caused by Excessive Agency, where user A was allowed to see user B’s data because of excessive permissions being set, as per the description of Excessive Agency.

  1. Unsuspecting legitimate user A is exposed to certain other user data via the LLM when interacting with the LLM application in a non-malicious manner.

This example could be Training Data Poisoning if the PII is in the model itself, or Excessive Agency if the LLM has access to data in other systems due to permission issues (see the sketch after the scenario list below), and finally, it could represent Insecure Plugin Design if the problem is in the plugin’s security architecture. In this case, I’d argue that Prompt Injection is the attack vector, not the vulnerability.

  2. User A targets a well-crafted set of prompts to bypass input filters and sanitization from the LLM to cause it to reveal sensitive information (PII) about other users of the application.

This example is more correctly classified as Training Data Poisoning.

  3. Personal data such as PII is leaked into the model via training data due to either negligence from the user themselves or the LLM application. This case could increase the risk and probability of scenario 1 or 2 above.
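To make the permission point referenced above concrete, here is a minimal sketch (hypothetical names, assuming a retrieval-style application) in which retrieval is restricted to the requesting user's own documents, so the model never sees user B's data when serving user A:

```python
from dataclasses import dataclass

@dataclass
class Document:
    owner_id: str
    text: str

def retrieve_for_user(user_id: str, query: str, index: list[Document]) -> list[str]:
    """Apply the requesting user's permissions *before* anything reaches the
    model: if user A's context can only ever contain user A's documents, the
    LLM cannot disclose user B's data no matter how the prompt is phrased."""
    authorized = [d for d in index if d.owner_id == user_id]      # authorization first
    return [d.text for d in authorized if query.lower() in d.text.lower()]  # toy relevance match
```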

emmanuelgjr commented 1 year ago

I particularly agree, and I understand cryptography plays a big part in this type of vulnerability. We even brought homomorphic encryption to the table as one of the possible remediation actions. But there's still no proof of concept or practical example in LLMs that I have heard of to validate this theory.

Bobsimonoff commented 1 year ago

I guess my summary is:

Looking at it from a defense in depth perspective, Insecure Output Handling is distinct as a system design vulnerability. It should be in place whether or not the LLM is completely deterministic - this is just good system design.

Overreliance and Sensitive Information Disclosure fundamentally both stem from the unpredictable nature of LLM outputs. The distinction seems to be in the type of impact:

- Overreliance leads to general reliance on incorrect or inappropriate outputs, spanning impacts like misinformation, safety issues, legal/ethical concerns, etc.
- Sensitive Information Disclosure leads specifically to the exposure of confidential or sensitive data.

So they share the same root cause of unpredictable outputs, but manifest in different threat categories. Overreliance has a broader scope of impact, whereas Sensitive Information Disclosure is confined to confidentiality violations from sensitive data exposure.

I feel we should consolidate them into a single vulnerability. The unpredictability of outputs could be called out as the underlying risk factor, with Overreliance covering the breadth of impact and Sensitive Information Disclosure being a subset focused on confidentiality. This relates to the thinking @virtualsteve-star is considering about the name of Overreliance ... summarized here: https://owasp.slack.com/archives/C05EXP0LF8T/p1693904406885479

GangGreenTemperTatum commented 1 year ago

Overreliance leads to general reliance on incorrect or inappropriate outputs, spanning impacts like misinformation, safety issues, legal/ethical concerns, etc.

I personally think Overreliance is more scoped towards "as a consumer of an LLM, you should be aware of how it works at a certain level to understand why you shouldn't input source code". Sensitive Information Disclosure is a potential outcome of Overreliance, as well as of other vulnerabilities such as Training Data Poisoning, Prompt Injection, etc.

I understand your reasoning for the suggestion, but how would you fit the fact that Sensitive Information Disclosure can occur from indirect or direct Prompt Injection (lack of input sanitization, scrubbing, etc.) but is not related to Overreliance (let's call it this for now)?

Bobsimonoff commented 1 year ago

I guess I have to go back to defining our mission statement. If we are to define technical vulnerabilities in software systems, then I would say that Overreliance and Sensitive Information Disclosure are not technical vulnerabilities. As a developer, I can’t fix overreliance - that is a people, process, and training issue, not a technical issue. I can’t actually fix sensitive information disclosure, but I can fix prompt injection, insecure output handling, etc.

I took as impetus the change that the Top 10 made away from Sensitive Data Exposure to Cryptographic Failures… from something I can’t fix to something I can.

So if our mission statement aligns with the flagship Top 10, then we would concentrate on technical failures. The only out that I see is to salvage Overreliance and focus it on inadequate training and that kind of thing, but I am still having trouble seeing a home for Sensitive Information Disclosure.

GangGreenTemperTatum commented 1 year ago

Hey Bob!

Anything that does not meet this deadline will be incorporated into later versions. During this phase, we will encourage and welcome any suggestions for adding or removing vulnerability entries into or from the top 10, but this is out of scope for this minor version release.

I can't actually fix sensitive information disclosure, but I can fix prompt injection, insecure output handling, etc.

You could argue that you also cannot fix "Overreliance" or "Supply Chain", as examples, though? I guess if so, this is a wider contextual decision to specify or separate Risks vs Vulnerabilities as part of our Top 10? Since we have multiple floaters within entries?

As per the v1.1 Instructions for the Expert Group, to confirm: if we progress with this issue, then this is a v2 task (similar to a GitHub issue I raised, #119). Entry leads @jsotiro @rot169 @leondz @virtualsteve-star, can I also ask for your personal input as core team members of the project for this one please? Since the decision would need to be unanimous.

Bobsimonoff commented 1 year ago

Agree. It is a v2 issue as per the instructions, totally aligned. I just hope we make v1.1 as good as we can without making too big a shift from the current state. After some feedback comes in, we may find I am totally wrong too.

GangGreenTemperTatum commented 1 year ago

Sounds good, let's leave this open (I created and applied a v2 label) and gather some feedback from the team and community to make a unanimous decision.

leondz commented 1 year ago

I personally think Overreliance is more scoped towards "as a consumer of an LLM, you should be aware of how it works at a certain level to understand why you shouldn't input source code"

I second this - framing sensitive data leaks as overreliance means that one is already assuming the model won't repeat entries from its training/retrieval data, which isn't a universal expectation (e.g. RAG biases towards exactly this behaviour).

this “vulnerability” is problematic because it mostly doesn’t represent a root cause, but a result or symptom

Agree. OTOH I don't think there's firm knowledge on what the root cause of sensitive data leakage is, other than that the sensitive data was in (or implied by) the training data, where it perhaps shouldn't have been.

I guess accidental output of text that happens to express a correct assertion isn't covered by this - maybe that's something for the differential security folks.