OWASP / www-project-top-10-for-large-language-model-applications

OWASP Foundation Web Repository

insecure output handling ambiguities #125

Closed Bobsimonoff closed 1 year ago

Bobsimonoff commented 1 year ago

I feel like the description of insecure output handling makes it easy to confuse with insecure plug-in design. The beginning of insecure output handling says that it

arises when a downstream component blindly accepts large language model output without proper scrutiny such as passing LLM output directly to backend, privileged or client side functions.

It feels like this is easily confused with insecure plug-in design, which also talks about input validation.

Bobsimonoff commented 1 year ago

I will be superseding this with another in the next few days

kenhuangus commented 1 year ago

Insecure output handling essentially focuses on how the output from a generative model, such as an LLM, is handled or managed by downstream components, whether those are backend services, client-side applications, or privileged functions. The key problem arises when these downstream components blindly accept the output generated by the model without any form of validation, sanitization, or contextual analysis. This could potentially lead to a range of security vulnerabilities including but not limited to SQL injection, Cross-Site Scripting (XSS), or even privilege escalation if the output is improperly used in a security context. For instance, if an LLM output that contains malicious SQL code is passed directly to a database query without scrutiny, it could lead to unauthorized data access. Hence, insecure output handling is specifically concerned with the mishandling of the generative model's output post-generation.
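
As a minimal sketch of the SQL case described above (hypothetical `ask_llm` helper, not from the OWASP entry): the vulnerable path interpolates model output into query text, while the safer path binds the same output as data.

```python
import sqlite3

def ask_llm(prompt: str) -> str:
    # Stand-in for a real model call; assume the return value is
    # attacker-influenced, e.g. via prompt injection.
    return "alice' OR '1'='1"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (username TEXT, balance REAL)")
conn.execute("INSERT INTO accounts VALUES ('alice', 10.0), ('bob', 99999.0)")

extracted = ask_llm("Extract the username from this support ticket: ...")

# Vulnerable: LLM output is spliced into the query string, so output like
# "alice' OR '1'='1" rewrites the query and returns every row.
rows = conn.execute(
    f"SELECT * FROM accounts WHERE username = '{extracted}'"  # DON'T
).fetchall()
print("blindly accepted:", rows)  # leaks bob's row as well

# Safer: the same output is bound as a parameter, so the database treats it
# strictly as a value, never as SQL.
rows = conn.execute(
    "SELECT * FROM accounts WHERE username = ?", (extracted,)
).fetchall()
print("parameterized:", rows)  # no row matches the malicious string
```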

On the other hand, insecure plug-in design predominantly revolves around the design and architecture of the software components that interact with the LLM. This often involves issues with how input to the generative model is validated, or how the plug-in itself manages security aspects like authentication and authorization. An insecure plug-in may fail to validate the input it receives before passing it to the LLM, or it may not properly authenticate a user who is interacting with the model, thereby leading to a different set of security vulnerabilities.
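
To illustrate where those checks sit, here is a rough sketch (hypothetical `user` object, `may_use` method, and `ask_llm` helper): in plug-in design, authorization and input validation happen at the plug-in boundary, before anything reaches the model.

```python
def ask_llm(prompt: str) -> str:
    return f"summary of: {prompt[:40]}"  # stand-in for a real model call

def handle_plugin_request(user, raw_input: str) -> str:
    # Authorization is enforced at the plug-in boundary, before the model
    # is ever invoked (assumed user.may_use() method).
    if not user.may_use("summarizer"):
        raise PermissionError("user is not authorized for this plug-in")

    # Input validation also happens pre-model: bound the size and reject
    # obviously malformed input instead of forwarding free text blindly.
    if len(raw_input) > 4096 or "\x00" in raw_input:
        raise ValueError("input rejected by plug-in validation")

    return ask_llm(raw_input)  # only validated input reaches the LLM
```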

Now, while both insecure output handling and insecure plug-in design may involve a form of 'input validation,' the context vastly differs. In the case of insecure plug-in design, 'input validation' refers to the scrutiny of data entering the plug-in or model. Conversely, in insecure output handling, the 'input' is actually the 'output' from the generative model that becomes the 'input' for another downstream component. Therefore, even though the term 'input validation' appears in both, the stage of the data flow at which this validation occurs is distinct.

To mitigate risks associated with insecure output handling, rigorous output validation strategies should be employed. Techniques such as output encoding, contextual output validation, and using allow-lists can be effective. For example, you could employ Regular Expressions to match the output against a predefined pattern before passing it to backend services. Moreover, adopting a Zero Trust Architecture model, where every output, regardless of its source, is treated as potentially hazardous, can add an additional layer of security. Developers can also implement automated testing frameworks that specifically test the security integrity of the model's output in various simulated downstream scenarios.
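
A concrete sketch of two of those mitigations, assuming a hypothetical order-ID format as the expected output shape: an allow-list regex that accepts model output only when it matches what the downstream component expects, and contextual encoding applied before output reaches a browser.

```python
import html
import re

# Allow-list validation: accept model output only if it matches the exact
# shape the downstream component expects (hypothetical order-ID format).
ORDER_ID = re.compile(r"[A-Z]{2}-\d{6}")

def validated_order_id(llm_output: str) -> str:
    candidate = llm_output.strip()
    if not ORDER_ID.fullmatch(candidate):
        raise ValueError(f"output failed validation: {candidate!r}")
    return candidate

def render_summary(llm_output: str) -> str:
    # Contextual output encoding: HTML-escape before the text reaches a
    # browser, so model output cannot smuggle markup or scripts (XSS).
    return f"<p>{html.escape(llm_output)}</p>"

print(validated_order_id("AB-123456"))                  # passes the allow-list
print(render_summary('<script>alert("xss")</script>'))  # rendered inert
```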

Bobsimonoff commented 1 year ago

Thanks @kenhuangus, and that does make sense, though I think the pages on these vulnerabilities blur this a bit. Summarizing what I think you said: Insecure Output Handling is outbound from the LLM, while insecure plug-in design is, in essence, inbound to the LLM through the plug-in, whether the plug-in generated the data itself or got it from another data source.

If that is the case, then this example from insecure output handling is not correct, because it involves the flow back to the LLM.

A user utilizes a website summarizer tool powered by a LLM to generate a concise summary of an article. The website includes a prompt injection instructing the LLM to capture sensitive content from either the website or from the user’s conversation. From there the LLM can encode the sensitive data and send it out to an attacker-controlled server.

This section of Insecure Plugin Design is not really aligned with what I think you said above:

Furthermore, to deal with context-size limitations, plugins are likely to implement free-text inputs from the model with no validation or type checking. This allows a potential attacker to construct a malicious request to the plugin, which could result in a wide range of undesired behaviors, up to and including remote code execution.
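
For what it's worth, the fix that passage points at would look something like replacing the free-text parameter with typed, bounded fields. A sketch, with a hypothetical schema that is not from the entry:

```python
from dataclasses import dataclass

@dataclass
class SummarizeRequest:
    url: str
    max_sentences: int

def parse_plugin_args(args: dict) -> SummarizeRequest:
    # Type-check and bound each field instead of accepting one free-text
    # blob from the model with no validation.
    url = args.get("url")
    if not isinstance(url, str) or not url.startswith(("http://", "https://")):
        raise ValueError("url must be an http(s) URL")
    max_sentences = args.get("max_sentences")
    if not isinstance(max_sentences, int) or not 1 <= max_sentences <= 10:
        raise ValueError("max_sentences must be an int between 1 and 10")
    return SummarizeRequest(url=url, max_sentences=max_sentences)
```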

I also feel, if the above is the case, that the examples of vulnerability should mention where the data in a single text field, or the raw SQL, is coming from. Furthermore, example of vulnerability 5 sounds incorrect for this vulnerability given your explanation.

kenhuangus commented 1 year ago

@Bobsimonoff Thanks. For the insecure output handling example you mentioned above: "A user utilizes a website summarizer tool powered by a LLM to generate a concise summary of an article. The website includes a prompt injection instructing the LLM to capture sensitive content from either the website or from the user’s conversation. From there the LLM can encode the sensitive data and send it out to an attacker-controlled server."

We can change it to the following to make it more explicit that the final issue is insecure output handling, although prompt injection caused the insecure output. Please see the revised statement below and let me know if it makes sense now:

"A user employs a web-based summarizer tool, which is powered by a Large Language Model (LLM), to obtain a concise summary of an article. Unbeknownst to the user, the LLM has been compromised through a prompt injection attack. This malicious prompt instructs the LLM to capture and include sensitive information either from the website's own database or from the user's interactions with the service. Consequently, the summary generated by the LLM contains this sensitive data. This summary is then sent to a downstream application for display. The downstream application, assuming the summary to be benign, displays it, thereby inadvertently revealing the sensitive information. Meanwhile, the encoded sensitive data within the summary can also be exfiltrated to an attacker-controlled server."

kenhuangus commented 1 year ago

For concerns related to LLM07, Insecure Plug-in Design, it's important to note that the issues could potentially involve both the input and output aspects of the plug-in. However, given the specialized nature of plug-ins, this shouldn't create any confusion. The responsibility for assessing the need for any changes lies with the Entry Lead overseeing the plug-in entry. As for LLM02, to ensure clarity, I suggest that we focus solely on issues related to output handling.

Bobsimonoff commented 1 year ago

We can change it to the following to make it more explicit that the final issue is insecure output handling, although prompt injection caused the insecure output. Please see the revised statement below and let me know if it makes sense now:

"A user employs a web-based summarizer tool, which is powered by a Large Language Model (LLM), to obtain a concise summary of an article. Unbeknownst to the user, the LLM has been compromised through a prompt injection attack. This malicious prompt instructs the LLM to capture and include sensitive information either from the website's own database or from the user's interactions with the service. Consequently, the summary generated by the LLM contains this sensitive data. This summary is then sent to a downstream application for display. The downstream application, assuming the summary to be benign, displays it, thereby inadvertently revealing the sensitive information. Meanwhile, the encoded sensitive data within the summary can also be exfiltrated to an attacker-controlled server."

Why wouldn't the vulnerability reported be the root cause, Prompt Injection? The main Top 10 list is moving in the direction of not treating secondary effects/results/symptoms as the vulnerabilities and is instead concentrating on root causes. This is why Sensitive Data Exposure was renamed (replaced) with Cryptographic Failures.

Bobsimonoff commented 1 year ago

We need to make sure that we differentiate vulnerabilities from the back flow of data from a plug-in to the LLM from indirect prompt injection.

If we take the universe of insecure plug-in issues and subtract those where indirect prompt injection is the root cause, and further subtract those that are already in the OWASP Top 10 main project (because we do not need to repeat them here), shouldn't we concentrate on what is left?

kenhuangus commented 1 year ago

@Bobsimonoff Here is my thinking process, please let me know if this makes sense.

  1. Insecure output isn't an isolated phenomenon; it's frequently the symptomatic outcome of a host of underlying vulnerabilities such as indirect prompt injection, data poisoning, or even the nuanced manipulation of machine learning models. However, this raises the pivotal question: should we tackle the issue of insecure output as a standalone concern? My answer is an unequivocal yes.

Regardless of the intricacies of the root causes—from the technicalities of code vulnerabilities to the subtleties of machine learning model exploitation—the imperative to secure output remains non-negotiable. This is not merely a detail to be overlooked; it's a cornerstone of robust cybersecurity that demands explicit focus. Therefore, it's essential that we dedicate specialized attention to securing output within the framework of LLM02.

Addressing insecure output independently allows us to craft meticulous strategies aimed specifically at this critical issue. It affords us the ability to delve deep into its peculiarities without getting diluted by the complexities of other vulnerabilities. By isolating the problem, we can more effectively deploy targeted countermeasures, be they sophisticated data encryption techniques, robust validation algorithms, or advanced machine learning defenses.

So, let's not relegate the security of output to a byproduct of broader efforts. Let's elevate it to its rightful status as a pivotal component in the architecture of secure large language models and systems. And that begins with giving it the focus and discussion it deserves in LLM02.

  2. When considering plug-in security design, it's possible that some issues might overlap with other Top 10 concerns related to large language models (LLMs). Nevertheless, it's crucial to recognize that these overlapping issues are within the specific context of plug-in design and not inherent to the broader LLM landscape. Therefore, if we appropriately define the scope, these overlaps should not lead to confusion.

Bobsimonoff commented 1 year ago

@kenhuangus ok, I am on the same page, 100%, that insecure output handling is a core security concern that needs addressing in all services. I am on board. However, I believe that when the description of the vulnerability is all about OTHER documented vulnerabilities in the Top 10, we do a disservice to THIS vulnerability, because we only create confusion about when a vuln should be slotted in one or the other. So I would at least ask that we do a little to distinguish this one from the others. Otherwise, I think it is not useful.

kenhuangus commented 1 year ago

Thanks. @Bobsimonoff Please suggest actual changes inside LLM02 and do a PR so we can keep track of changes.

kenhuangus commented 1 year ago

If you are OK, I can close this issue so you can make changes to LLM02 with a PR directly.

Bobsimonoff commented 1 year ago

I submitted a pull request for the change; I will leave this open until we come to closure.

Bobsimonoff commented 1 year ago

@kenhuangus should this be closed since we have the other one open?

kenhuangus commented 1 year ago

Closing this issue, since issue #173 will address the same issue.