OWASP / www-project-top-10-for-large-language-model-applications

OWASP Foundation Web Repository

Diversity in building the model and training it LLM03 #114

Closed ManishYadu7 closed 1 year ago

ManishYadu7 commented 1 year ago

- Have a team with diverse backgrounds and solicit broad input. Diverse perspectives are needed to characterize and address how language models will operate in the diversity of the real world; left unchecked, they may reinforce biases or fail to work for some groups.
- Over-reliance on narrow training data could introduce bias based on skin color, gender, and physical appearance.

  1. https://hbr.org/2019/10/what-do-we-do-about-the-biases-in-ai
GangGreenTemperTatum commented 1 year ago

Hey Manish

Many thanks for reaching out and I appreciate the suggestion.

Whilst I understand your hypothesis, I do not feel this is a significant enough risk to explicitly call out as a vulnerability under Training Data Poisoning. As LLM application developers, we do care about safety and harms-related risks such as bias, judgement, etc. Ultimately, we should be catering for this in other avenues, such as the sources and supply chain of the foundation training data, fine-tuning, and benchmarking. In terms of these risks, the current LLM03: Training Data Poisoning entry already lists ways to mitigate against high-risk data sources:
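To illustrate the supply-chain angle mentioned above, here is a minimal sketch of one such mitigation: gating fine-tuning records on a provenance allowlist before they reach the training pipeline. All names here (`ALLOWED_SOURCES`, the `source` tag, the labels) are hypothetical, not part of the LLM03 entry itself.

```python
# Hypothetical sketch: filter fine-tuning records to vetted, allowlisted
# sources so high-risk data never enters the training pipeline.
ALLOWED_SOURCES = {"internal-curated", "vendor-signed"}  # assumed provenance labels

def filter_records(records):
    """Keep only records whose provenance tag is on the allowlist."""
    return [r for r in records if r.get("source") in ALLOWED_SOURCES]

records = [
    {"text": "reviewed sample", "source": "internal-curated"},
    {"text": "unvetted sample", "source": "web-scrape"},
]
print(filter_records(records))
```

In practice the provenance tag would come from dataset metadata or supply-chain attestation rather than a field set by hand, but the gating step is the same.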

[screenshot: mitigation list from the LLM03: Training Data Poisoning entry]

I will close this one out in ~ a week if I don't get a response on this.