This repository contains code and resources for investigating bias in Large Language Models (LLMs) in medical text classification across multiple languages.

Bias in Large Language Models (LLMs) Across Languages is a research project dedicated to studying and addressing the biases that arise when LLMs generate text in different languages. It explores how LLMs may produce biased or stereotypical content across languages and develops methods to reduce such biases.
Before running the code, ensure you have the following installed:

- Python 3
- The packages listed in `requirements.txt` (installed in the steps below)
1. Clone the repository:

   ```bash
   git clone https://github.com/dsrestrepo/MIT_LLMs_Language_bias.git
   cd MIT_LLMs_Language_bias
   ```

2. Create a virtual environment (optional but recommended):

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install the dependencies:

   ```bash
   pip install -r requirements.txt
   ```
4. Create a `.env` file in the root directory and add your OpenAI API key to it:

   ```
   OPENAI_API_KEY=your_api_key_here
   ```
Make sure you have a valid OpenAI API key to access the language model.
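If you are extending the code, the key can be loaded from `.env` with python-dotenv, as in the minimal sketch below (an assumption about how the project reads the key, not a guarantee):

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

# Load variables from the .env file into the process environment.
load_dotenv()

# Fail early with a clear message if the key is missing.
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("OPENAI_API_KEY not found; add it to your .env file.")
```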
This project uses a dataset of medical test questions in different languages. Place the required dataset in the `data/` directory.
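The exact column layout is defined by the code, but a dataset file could look like the hypothetical example below (the column names here are illustrative assumptions, not the repository's required schema):

```
question,correct_answer
"Which vitamin deficiency causes scurvy?","Vitamin C"
"Which organ produces insulin?","Pancreas"
```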
Run `main.py` from the command line with the desired options. Here's an example command:

```bash
python main.py --csv_file data/YourMedicalTestQuestions.csv --model gpt-3.5-turbo --temperature 0.5 --n_repetitions 3 --reasoning --languages english portuguese french
```
The script accepts the following arguments (as used in the example above):

- `--csv_file`: path to the CSV file containing the medical test questions.
- `--model`: the model to query (e.g., `gpt-3.5-turbo`).
- `--temperature`: sampling temperature for generation.
- `--n_repetitions`: number of times each question is asked.
- `--reasoning`: if set, asks the model to include its reasoning with each answer.
- `--languages`: the languages in which to run the questions.
The script will process the questions, generate responses, and save the results in a CSV file.
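Conceptually, the script's core loop resembles the sketch below (a simplified illustration using the OpenAI Python client, not the repository's actual implementation; the `question` column name and the prompt wording are assumptions):

```python
import csv

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

languages = ["english", "portuguese", "french"]
n_repetitions = 3
rows = []

with open("data/YourMedicalTestQuestions.csv", newline="", encoding="utf-8") as f:
    questions = list(csv.DictReader(f))

for q in questions:
    for lang in languages:
        for rep in range(n_repetitions):
            # Ask the model to answer the question in the target language.
            response = client.chat.completions.create(
                model="gpt-3.5-turbo",
                temperature=0.5,
                messages=[
                    {"role": "user", "content": f"Answer in {lang}: {q['question']}"},
                ],
            )
            rows.append({
                "language": lang,
                "repetition": rep,
                "answer": response.choices[0].message.content,
            })

# Save every response so bias can be compared across languages later.
with open("results/responses.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["language", "repetition", "answer"])
    writer.writeheader()
    writer.writerows(rows)
```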
Alternatively, you can run the code from the Jupyter notebook `main.ipynb`.
We also provide a more customizable option through the `GPT` and `Llama` classes. You can import a class and use it to generate responses from the model, change the prompt, and more. See `customized_gpt.ipynb` and `customized_llama.ipynb` for examples.
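As a rough illustration, using one of these classes might look like the sketch below (the import path, constructor arguments, and method name are all hypothetical; the actual interface is shown in the notebooks):

```python
# Hypothetical import path; see customized_gpt.ipynb for the real one.
from gpt import GPT

# Hypothetical constructor arguments for illustration.
model = GPT(model_name="gpt-3.5-turbo", temperature=0.5)

# Hypothetical method: send a custom prompt and read back the answer.
answer = model.generate("You are a medical exam assistant. Question: ...")
print(answer)
```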
The analysis results, including the bias assessment and mitigation strategies, are saved in the `results/` directory, where you can find the outcomes of the tests run on the LLM across languages.
Contributions to this research project are welcome. To contribute, open an issue or submit a pull request.
We encourage the community to join our efforts to understand and mitigate bias in LLMs across languages.
This project is licensed under the MIT License.
For any questions regarding this project, please feel free to reach out: