OWASP / www-project-top-10-for-large-language-model-applications

OWASP Foundation Web Repository

Disambiguate Supply Chain Vulnerability from Training Data Poisoning #123

Closed Bobsimonoff closed 1 year ago

Bobsimonoff commented 1 year ago

I'd like to see some verbiage in the definition of the vulnerability (and possibly the summary) that better separates supply chain vulnerability from training data poisoning. If the training data poisoning occurs with third-party models or data, supply chain vulnerability seems applicable. If the training data or model is built in-house, training data poisoning makes sense. I think we want to reduce the overlap in the descriptions of vulnerabilities and their examples; otherwise reporting, tracking, and maintaining a Top 10 will become difficult.

Some examples that are difficult to assign to a single category (this is a small subset):

LLM03.1 A malicious actor or a competitor brand intentionally creates inaccurate or malicious documents targeted at a model's training data. The victim model trains on the falsified information, which is then reflected in the outputs of generative AI prompts to its consumers. Vulnerability = Training data poisoning, supply chain vulnerability [if third-party model]

LLM03.2 A model is trained on data whose source, origin, or content has not been verified. Vulnerability = Training data poisoning, supply chain vulnerability [if third-party model]

LLM03A.1 The LLM generative AI prompt output can mislead users of the application, which can lead to biased opinions, followings, or, even worse, hate crimes. Vulnerability = Training data poisoning, supply chain vulnerability [if third-party model], Overreliance

Bobsimonoff commented 1 year ago

@GangGreenTemperTatum Adding a comment to this one to see if this helps you find the issue

jsotiro commented 1 year ago

LLM03.2 is clearly a supply chain vulnerability, not dissimilar to A06:2021 from the main OWASP Top 10. I have discussed the relationship between the two in the related issue #119. There may be some blurring for data, but for models, getting a model from someone else is like getting a component that someone has tampered with, whether through malware in a pickle file or through a poisoned model.
I agree we need to find an appropriate delineation, like the one LLM05 and LLM07 have. As per my comments in #119, I do not think the answer is to simply get rid of the supply chain vulnerabilities entry.

Bobsimonoff commented 1 year ago

@jsotiro It looks like that is the way the wind is blowing, and I am actually on board with it. We should probably have a discussion on the topic. I do see that the CI/CD and Kubernetes projects both have a supply chain vulnerability entry. However, it does not feel like every OWASP project should have a supply chain entry, since it is a cross-cutting concern.

We are not covering network security, cryptographic security, etc., so why supply chain, when everything LLM-specific can fit under poisoning?

Oddly, this leaves me with a question about prompt injection, since the Top 10 already has a general catch-all Injection vulnerability to which multiple vectors are already mapped (XML, JSON, SQL, ...).

jsotiro commented 1 year ago

Hi @Bobsimonoff, my view, as per #119, is actually NOT to remove the supply chain entry. You are right that supply chain doesn't cover everything related to creating an LLM app, e.g. networking. That would be too broad and bring no value. Instead, it highlights the relevant vulnerabilities that the supply chain brings in the context of LLMs. These are more than just model poisoning. But let's keep the discussion on this in #119.

Bobsimonoff commented 1 year ago

@GangGreenTemperTatum @jsotiro Can we close this one as a duplicate since the heavy lifting on this is happening elsewhere?

GangGreenTemperTatum commented 1 year ago

Closing in favor of the main discussion in #119.