OWASP / www-project-top-10-for-large-language-model-applications


Providing new information to LLM03: Training Data Poisoning #205

Closed ManuelSLemos closed 8 months ago

ManuelSLemos commented 8 months ago

Hi, this is my first issue in an OWASP repository, so I hope I'm not getting the style guide wrong. At my company we are studying different ways of pentesting AI, specifically LLMs, and I would like to share that knowledge. The additions are:

Common Examples of Vulnerability

  1. Using data provided by users, without any quality control, for training or fine-tuning purposes (a deliberately unvetted ingestion path is sketched below):
     a. A malicious actor, individually or collectively (e.g. via a botnet), adds poisonous information in the form of prompts.
     b. A malicious actor, individually or collectively (e.g. via a botnet), abuses the feedback (voting) system to push poisonous information into the model.
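
To make example 1 concrete, here is a minimal, deliberately vulnerable sketch of the ingestion path being described: user prompts and up-voted responses flow straight into a fine-tuning set with no vetting, so a botnet that mass-submits or mass-upvotes poisoned content ends up steering the next training run. All names here (`collect_finetune_examples`, the record fields) are hypothetical and only for illustration.

```python
# Deliberately *vulnerable* ingestion path (anti-pattern): every user prompt and
# every up-voted response is appended to the fine-tuning set with no vetting.
import json

def collect_finetune_examples(conversations, output_path="finetune.jsonl"):
    """conversations: iterable of dicts like
    {"prompt": str, "response": str, "upvotes": int, "downvotes": int}"""
    with open(output_path, "a", encoding="utf-8") as f:
        for turn in conversations:
            # No rate limiting, no per-user caps, no content screening:
            # a single actor (or botnet) can flood this file with poisoned pairs.
            if turn["upvotes"] > turn["downvotes"]:
                f.write(json.dumps({"prompt": turn["prompt"],
                                    "completion": turn["response"]}) + "\n")
```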

How to Prevent

  1. e. Elaborate control questions related to our business to ensure that the model has not been poisoned, and integrate them into the "MLSecOps" cycle (a minimal eval sketch follows this list).

  2. Use DVC (Data Version Control) to keep track of which parts of the dataset have been manipulated, deleted, or added.

  3. Use a vector database to store user-supplied information, to protect other users from poisoning and even to fix issues in production without having to re-train a new model (sketched after this list).

  4. Verify the training dataset used when choosing a public model to avoid using a previously poisoned model.
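
To make the "control questions" idea (point 1) concrete, here is a minimal sketch of a canary test suite that could gate an MLSecOps pipeline after every fine-tune. `ask_model` and the example questions are placeholders, not part of any OWASP or vendor API.

```python
# Sketch: business-specific "control questions" with expected answers, run after
# every training/fine-tuning cycle. A drop in the pass rate is a signal that the
# new checkpoint may have been poisoned and should be reviewed before release.

CONTROL_QUESTIONS = [
    # (prompt, substrings of which at least one must appear in the answer)
    ("What is our refund window for online orders?", ["30 days", "thirty days"]),
    ("Which email domain do official support messages come from?", ["@example.com"]),
]

def run_control_questions(ask_model):
    """ask_model: callable str -> str. Returns (pass_rate, failures)."""
    failures = []
    for prompt, expected in CONTROL_QUESTIONS:
        answer = ask_model(prompt).lower()
        if not any(e.lower() in answer for e in expected):
            failures.append((prompt, answer))
    pass_rate = 1 - len(failures) / len(CONTROL_QUESTIONS)
    return pass_rate, failures

# Example gate in a CI job after fine-tuning:
#   rate, failed = run_control_questions(ask_model=my_model_client)
#   assert rate == 1.0, f"Possible poisoning, failed control questions: {failed}"
```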
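
Point 3 can be sketched the same way: user-supplied facts go into a retrieval store rather than into the model weights, so a poisoned entry (or everything from a malicious user) can simply be deleted in production. The in-memory store below is a simplified stand-in, not the API of any particular vector database.

```python
# Simplified in-memory stand-in for a vector database used for retrieval.
# User-supplied content is stored here instead of being trained into the model,
# so poisoned entries can be removed without re-training.
import numpy as np

class TinyVectorStore:
    def __init__(self, embed):              # embed: callable str -> np.ndarray
        self.embed = embed
        self.items = {}                      # id -> (vector, text, user_id)

    def add(self, item_id, text, user_id):
        self.items[item_id] = (self.embed(text), text, user_id)

    def search(self, query, k=3):
        q = self.embed(query)
        scored = [(float(v @ q / (np.linalg.norm(v) * np.linalg.norm(q) + 1e-9)), text)
                  for v, text, _ in self.items.values()]
        return sorted(scored, reverse=True)[:k]

    def purge_user(self, user_id):
        # "Fix in production": drop everything a malicious user contributed.
        self.items = {i: rec for i, rec in self.items.items() if rec[2] != user_id}
```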

That's all. I look forward to your feedback, and if you like the contribution, I will open a PR adding the content. Thank you very much.

ManuelSLemos commented 8 months ago

For some reason, it won't let me add a label, sorry. @GangGreenTemperTatum

GangGreenTemperTatum commented 8 months ago

Hey @ManuelSLemos

Thanks for reaching out!

FYI - vulnerability leads for entries are involved in creating/merging PRs to prevent overlap between different submissions and research, and to let us keep code control through branch protection rules πŸ™‚ That being said, I'll certainly create a PR with your entries once we have a next-version implementation in our repo (since v1.1 is finished), and I'll cross-reference this GitHub issue in the PR. For now, I'll add my v2 label (TODO).

Common Examples of Vulnerability

Do you have any example resources that support this at all?

e. Elaborate control questions related to our business to ensure that the model has not been poisoned and integrate it into the "MLSecOps" cycle.
Use DVC (Data Version Control) to keep track of which part of the dataset has been manipulated, deleted or new data added.
Use Vector Database to add user-supplied information, to protect from poisoning other users and even fix in production without having to re-train a new model.

βœ… Willing to add these three bullets for sure

Verify the training dataset used when choosing a public model to avoid using a previously poisoned model.

I think we need to provide a different context here, but it is ultimately covered in the vulnerability entry already, i.e.:

Whether you are a developer, client, or general consumer of the LLM, it is important to understand the implications of how this vulnerability could reflect risks within your LLM application when interacting with a non-proprietary LLM, and to understand the legitimacy of model outputs based on its training procedures. Similarly, developers of the LLM may be at risk of both direct and indirect attacks on internal or third-party data used for fine-tuning and embedding (most common), which as a result creates a risk for all its consumers.

The problem with most large datasets is that they are not proprietary (OpenCrawl, for example), and as such it's almost impossible to ensure that your lineage of training data does not contain some form of indirect or direct poisoning. Take Split-View Data Poisoning or Frontrunning Data Poisoning as examples (hope you like my diagrams πŸ˜› ). I added these examples here in the vulnerability too.
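
As a hedged illustration of the split-view case: the commonly discussed mitigation is to pin a content hash for each document at index time and re-verify it at download time, so content that changed after a domain changed hands is dropped. The sketch below assumes a hypothetical index of (url, sha256) pairs and uses the `requests` library.

```python
# Sketch of an integrity check against split-view data poisoning: the dataset
# index is assumed to record a SHA-256 of each document at crawl time; anything
# whose current content no longer matches is excluded from training, since the
# domain may now serve attacker-controlled data.
import hashlib
import requests

def verify_index(index):
    """index: iterable of (url, expected_sha256). Yields only still-matching URLs."""
    for url, expected in index:
        try:
            body = requests.get(url, timeout=10).content
        except requests.RequestException:
            continue  # unreachable content is excluded rather than trusted
        if hashlib.sha256(body).hexdigest() == expected:
            yield url
```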

ManuelSLemos commented 8 months ago

Hi @GangGreenTemperTatum!

Thank you very much for the feedback, I'll get back to you:

Do you have any example resources that support this at all?

There are no articles that talk about these techniques directly; they are methods that I have read about indirectly and that we have built from scratch at my company.

As the focus is on closing version 1.1, feel free to talk to me on Slack if I can help you in any way. :)

GangGreenTemperTatum commented 8 months ago

Thanks @ManuelSLemos , I think I understand now.

Using data provided by users without any quality control for training or fine-tuning purposes:
a. A malicious actor individually or collectively (e.g. botnet) adds poisonous information in the form of prompts.
b. A malicious actor individually or collectively (e.g. botnet) uses the feedback systems (voting) against the model to add poisonous information.

Correct me if I am wrong, but your statement is that a malicious actor can independently poison a known open-source model which is used by the community? And the links to the tests and leaderboards are benchmarking and scoring of these models for users to potentially identify this exploit attempt?

Thanks for all your input on this! :)

ManuelSLemos commented 8 months ago

Hello, I will explain these points better.

In those points I am trying to say that if you use user prompts for future training, a malicious actor can take advantage of that and add poisoned data. Likewise, if you don't protect the feedback system that evaluates the responses, they can use it to confuse the model in the long run.

By mentioning the leaderboard, I meant that just as they have benchmark tests, I have created different tests for my LLM app to measure "poisoning", "hallucinations", or simply the quality of the answers. It is not that I use that leaderboard, but I was inspired by them to create my own tests.

I hope this explains it better and thank you for your patience. :)

GangGreenTemperTatum commented 8 months ago

NP and thanks for the patience too!

In those points I am trying to say that if you use user prompts for future training, a malicious actor can take advantage of that and add poisoned data. Likewise, if you don't protect the feedback system that evaluates the responses, they can use it to confuse the model in the long run.

Totally understand, I would say this is already covered here:

Use strict vetting or input filters for specific training data or categories of data sources to control volume of falsified data. Data sanitization, with techniques such as statistical outlier detection and anomaly detection methods to detect and remove adversarial data from potentially being fed into the fine-tuning process.

And:

By mentioning the leaderboard, I meant that just as they have benchmark tests, I have created different tests for my LLM app to measure "poisoning", "hallucinations", or simply the quality of the answers. It is not that I use that leaderboard, but I was inspired by them to create my own tests.

Here:

Testing and Detection, by measuring the loss during the training stage and analyzing trained models to detect signs of a poisoning attack by analyzing model behavior on specific test inputs.
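
For the data-sanitization and testing-and-detection points quoted above, a minimal sketch of statistical outlier detection over training-example embeddings could look like the following. scikit-learn's IsolationForest is just one reasonable choice here, and `embed` is a placeholder for whatever embedding model you use.

```python
# Sketch: flag candidate fine-tuning examples whose embeddings are statistical
# outliers relative to the rest of the corpus, for human review before training.
import numpy as np
from sklearn.ensemble import IsolationForest

def flag_outliers(texts, embed, contamination=0.01):
    """Return the subset of texts that look anomalous and should be reviewed."""
    X = np.vstack([embed(t) for t in texts])
    labels = IsolationForest(contamination=contamination, random_state=0).fit_predict(X)
    return [t for t, label in zip(texts, labels) if label == -1]  # -1 = outlier
```
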
ManuelSLemos commented 8 months ago

Okay, that's right. Then I leave it to your discretion whether it can be leveraged for a future version. Likewise, I will search for and write about these techniques so they can be added as supporting material.

Thank you very much!

GangGreenTemperTatum commented 8 months ago

On top of the πŸ€— Open LLM Leaderboard, I also found The Foundation Model Transparency Index, which is another awesome resource to mention alongside or superseding the original.