Bias in Large Language Models (LLMs) Across Languages

This repository contains code and resources for investigating bias in Large Language Models (LLMs) across languages. The project aims to analyze and mitigate the biases that LLMs show in medical text classification.

Introduction

Bias in Large Language Models (LLMs) Across Languages is a research project dedicated to studying and addressing biases that arise in text generation by LLMs when dealing with different languages. This research explores how LLMs may produce biased or stereotypical content in multiple languages and seeks to develop methods to reduce such biases.

Setup

Prerequisites

Before running the code, ensure you have Git and Python (with pip) installed; the installation steps below use all three.

Installation

  1. Clone the repository:

    git clone https://github.com/dsrestrepo/MIT_LLMs_Language_bias.git
    cd MIT_LLMs_Language_bias

  2. Create a virtual environment (optional but recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate

  3. Install dependencies:

    pip install -r requirements.txt

  4. Set up your OpenAI API key (not required for Llama models):

Create a .env file in the root directory.

Add your OpenAI API key to the .env file:

    OPENAI_API_KEY=your_api_key_here

Make sure you have a valid OpenAI API key to access the language model.
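To check that the key in .env is picked up, a minimal standard-library sketch such as the following can help. The helper name load_env_file is hypothetical and not part of this repository; the project itself may load the file differently (for example with a package such as python-dotenv), so treat this only as an illustration of the KEY=VALUE format.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Read KEY=VALUE pairs from a .env file into os.environ.

    Hypothetical helper for illustration; not part of the repository.
    Returns a dict with the keys found and their effective values.
    """
    env_path = Path(path)
    if not env_path.exists():
        return {}

    loaded = {}
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop optional surrounding quotes
        # Variables already set in the environment take precedence.
        os.environ.setdefault(key, value)
        loaded[key] = os.environ[key]
    return loaded
```

After calling load_env_file(), os.environ.get("OPENAI_API_KEY") should return the key; if it returns None, the .env file is missing or does not contain the OPENAI_API_KEY line.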

Data

This project uses a dataset of medical test questions in different languages. Place the required dataset in the data/ directory.
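Before running the script, it can be useful to confirm that the CSV placed in data/ is readable. The sketch below uses only the standard library; preview_csv is a hypothetical helper, and it makes no assumption about the dataset's column names, which are not documented here.

```python
import csv
from itertools import islice


def preview_csv(path, n_rows=3):
    """Return the header row and the first n_rows data rows of a CSV file.

    Hypothetical helper for illustration; not part of the repository.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])            # empty list if the file is empty
        rows = list(islice(reader, n_rows))  # read at most n_rows data rows
    return header, rows
```

For example, preview_csv("data/YourMedicalTestQuestions.csv") returns the column names and a few rows, which is enough to spot encoding or delimiter problems early.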

Usage

Run main.py from the command line with the desired options. Here's an example command:

    python main.py --csv_file data/YourMedicalTestQuestions.csv --model gpt-3.5-turbo --temperature 0.5 --n_repetitions 3 --reasoning --languages english portuguese french

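The example above uses the arguments --csv_file (the questions CSV), --model, --temperature, --n_repetitions, --reasoning, and --languages (one or more languages).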

The script will process the questions, generate responses, and save the results in a CSV file.
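The flags in the example command could be declared with argparse roughly as follows. This is a sketch under stated assumptions, not the actual contents of main.py: the flag names come from the example command, while the types, defaults, and help strings are guesses added for illustration.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags shown in the example command.

    Types, defaults, and help texts are assumptions, not taken from main.py.
    """
    parser = argparse.ArgumentParser(
        description="Run medical test questions through an LLM in several languages."
    )
    parser.add_argument("--csv_file", required=True,
                        help="Path to the CSV file with the questions.")
    parser.add_argument("--model", default="gpt-3.5-turbo",
                        help="Name of the model to query.")
    parser.add_argument("--temperature", type=float, default=0.0,
                        help="Sampling temperature passed to the model.")
    parser.add_argument("--n_repetitions", type=int, default=1,
                        help="How many times each question is asked.")
    parser.add_argument("--reasoning", action="store_true",
                        help="Flag with no value; set when present.")
    parser.add_argument("--languages", nargs="+", default=["english"],
                        help="One or more languages to evaluate.")
    return parser
```

Calling build_parser().parse_args() inside a script would then expose the values as attributes such as args.csv_file and args.languages, the latter being a list because of nargs="+".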

Alternatively, you can run the code from the Jupyter notebook main.ipynb.

We also provide a more customizable option through the GPT and Llama classes. You can import either class and use it to generate responses from the model, change the prompt, and more. See customized_gpt.ipynb and customized_llama.ipynb for examples.

Analysis

The analysis results, including bias assessment and mitigation strategies, will be documented in the results/ directory. This is also where you can find the LLM test results across languages.
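One simple way to compare languages is to compute the share of correct answers per language. The sketch below assumes each result row has the fields language, response, and answer; these column names are hypothetical, since the layout of the results CSV is not described here, and the function is not the project's own bias assessment.

```python
from collections import defaultdict


def accuracy_by_language(rows):
    """Return {language: fraction of correct responses}.

    Each row is a dict with the hypothetical keys "language", "response",
    and "answer"; adapt them to the real columns of the results file.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in rows:
        language = row["language"]
        totals[language] += 1
        # Compare case-insensitively and ignore surrounding whitespace.
        if row["response"].strip().lower() == row["answer"].strip().lower():
            hits[language] += 1
    return {language: hits[language] / totals[language] for language in totals}
```

Rows read with csv.DictReader from a results file can be passed in directly; a large gap between languages for the same questions is the kind of disparity this project sets out to study.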

Contributing

Contributions to this research project are welcome. To contribute:

  1. Fork the repository.
  2. Create a new branch for your feature or research.
  3. Make your changes.
  4. Add tests for your changes.
  5. Submit a pull request.

We encourage the community to join our efforts to understand and mitigate bias in LLMs across languages.

License

This project is licensed under the MIT License.

Contact

For any inquiries or questions regarding this project, please feel free to reach out to the repository maintainers.