
NeurIPS-llm-efficiency-challenge


Our model won 🏆 first prize 🏆 in the RTX 4090 track of the NeurIPS Large Language Model Efficiency Challenge (1 LLM + 1 GPU + 1 Day). We used Mistral-7B as the base model and fine-tuned it with QLoRA for 24 hours on a single RTX 4090 GPU.

| Model Name | Checkpoint | Dataset | License |
|---|---|---|---|
| Birbal-7B-V1 | 🤗 Birbal-7B-V1 | upaya07/NeurIPS-LLM-data | Apache License 2.0 |
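For orientation, here is a minimal QLoRA setup sketch using `transformers`, `bitsandbytes`, and `peft`. The rank, alpha, and target modules below are illustrative assumptions, not the competition settings; the actual hyperparameters live in the axolotl config (`nips_02.yml`) referenced under Model Training.

```python
# Minimal QLoRA sketch (illustrative values, not the competition config).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base = "mistralai/Mistral-7B-v0.1"

# 4-bit NF4 quantization of the frozen base model: the "Q" in QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Trainable low-rank adapters on the attention projections (r/alpha assumed).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapters train; the 4-bit base is frozen
```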

Results

| Task | Score |
|---|---|
| MMLU - EM | 0.629 |
| MMLU - EM (Robustness) | 0.591 |
| MMLU - EM (Fairness) | 0.596 |
| MMLU Mean Win Rate | 0.417 |
| TruthfulQA - EM | 0.59 |
| TruthfulQA - EM (Robustness) | 0.541 |
| TruthfulQA - EM (Fairness) | 0.492 |
| TruthfulQA Mean Win Rate | 0.75 |
| BIG-bench - EM | 0.330 |
| BIG-bench Mean Win Rate | 0.75 |
| GSM8K - EM | 0.443 |
| GSM8K Mean Win Rate | 0.625 |
| BBQ - EM | 0.738 |
| BBQ Mean Win Rate | 0.25 |
| sam_sum - ROUGE-2 | 0.127 |
| sam_sum - Stereotypes (race) | 0.667 |
| sam_sum - Stereotypes (gender) | 0.447 |
| sam_sum - Representation (race) | 0.458 |
| sam_sum - Representation (gender) | 0.013 |
| sam_sum Mean Win Rate | 0.383 |
| corr2cause - EM | 0.615 |
| corr2cause Mean Win Rate | 0.875 |
| MATH - Equivalent (chain-of-thought) | 0.121 |
| MATH Mean Win Rate | 0.75 |
| ethics_justice - EM | 0.68 |
| ethics_justice - EM (Robustness) | 0.645 |
| ethics_justice - EM (Fairness) | 0.62 |
| ethics_commonsense - EM | 0.41 |
| ethics_commonsense - EM (Robustness) | 0.33 |
| ethics_commonsense - EM (Fairness) | 0.345 |
| ethics_virtue - EM | 0.895 |
| ethics_virtue - EM (Robustness) | 0.865 |
| ethics_virtue - EM (Fairness) | 0.86 |
| ethics_deontology - EM | 0.63 |
| ethics_deontology - EM (Robustness) | 0.585 |
| ethics_deontology - EM (Fairness) | 0.595 |
| ethics_utilitarianism - EM | 0.72 |
| ethics_utilitarianism - EM (Robustness) | 0.6 |
| ethics_utilitarianism - EM (Fairness) | 0.645 |
| ethics Mean Win Rate | 0.55 |
| 🔥 Score_full | 0.579 |
| 🔥 Score_open | 0.516 |
| 🔥 Score_hidden | 0.61 |

Top-5 Teams

| Position | Score |
|---|---|
| 5th rank | 0.362 |
| 4th rank | 0.371 |
| 3rd rank | 0.381 |
| 2nd rank | 0.424 |
| 🔥 Ours (1st) | 0.579 |

Refer to the 4090_full_ranks.json file for the scores of the top teams that took part in the final stage of the competition.

Training Data Preparation

[Diagram: training data preparation pipeline]

Birbal Models and Datasets

| Model | Checkpoint | Dataset | License |
|---|---|---|---|
| Birbal-200k | 🤗 Birbal-200k | 200k | Apache License 2.0 |
| Birbal-400k | 🤗 Birbal-400k | 400k | Apache License 2.0 |
| Birbal-700k | 🤗 Birbal-700k | 700k | Apache License 2.0 |

Natural Instructions Dataset Preparation

The Natural Instructions dataset is a community effort to create a large collection of tasks and their natural-language definitions/instructions. As shown in the diagram above, we sample from the Natural Instructions dataset in a 4-step process.
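As a rough illustration of the sampling step (the real logic lives in the data preparation scripts below; the file layout and field names follow the public natural-instructions repository, and `n_per_task` is a hypothetical placeholder):

```python
# Hypothetical sketch: sample instances per task from Natural Instructions task files.
import json
import random
from pathlib import Path

def sample_task(task_file: Path, n_per_task: int = 100):
    """Load one Natural Instructions task file and sample up to n_per_task instances."""
    task = json.loads(task_file.read_text())
    definition = task["Definition"][0]  # natural-language instruction for the task
    instances = task["Instances"]       # list of {"input": ..., "output": [...]}
    sampled = random.sample(instances, min(n_per_task, len(instances)))
    return [
        {"instruction": definition, "input": ex["input"], "answer": ex["output"]}
        for ex in sampled
    ]

# Iterate over all task files in the natural-instructions "tasks/" directory.
records = []
for task_file in Path("natural-instructions/tasks").glob("task*.json"):
    records.extend(sample_task(task_file))
```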

Input and Output Schema for Mistral Inference

A record from a Natural Instructions task file is converted into the format below. The `orig_input` field is the actual input without few-shot examples. The `few_shot_prompt` field contains the few-shot prompt that is passed to the Mistral-7B model for prediction. `answer` is the ground truth, and `prediction` is the output generated by the Mistral-7B base model.

```json
{
  "orig_input": "Context: I sold my $90,000.00 Mercedes G500 and bought 3 Prius's, because I got tired of being pulled over by Police. #Adapt @chrisrock\u2014 Isaiah Washington (@IWashington) April 1, 2015 Question: how many prius's did they buy? Answer: three",
  "few_shot_prompt": "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nIn this task, you are given a context tweet, a question and corresponding answer of given question. Your task is to classify this question-answer pair into two categories: (1) \"yes\" if the given answer is right for question, and (2) \"no\" if the given answer is wrong for question.\n\n### Input:\nContext: Our prayers are with the students, educators & families at Independence High School & all the first responders on the scene. #PatriotPride\u2014 Doug Ducey (@dougducey) February 12, 2016 Question: at which school were first responders on the scene for? Answer: arizona high school\n\n### Response:\nno\n\n### Input:\nContext: @williebosshog huge love to you/your family huge respect for your business prosperities and the family values you still all behold. big fan\u2014 Liam Payne (@Real_Liam_Payne) January 18, 2014 Question: what was liam showing towards willy? Answer: huge respect\n\n### Response:\nyes\n\n### Input:\nContext: @williebosshog huge love to you/your family huge respect for your business prosperities and the family values you still all behold. big fan\u2014 Liam Payne (@Real_Liam_Payne) January 18, 2014 Question: what was liam showing towards willy? Answer: jealousy\n\n### Response:\nno\n\n### Input:\nContext: Our prayers are with the students, educators & families at Independence High School & all the first responders on the scene. #PatriotPride\u2014 Doug Ducey (@dougducey) February 12, 2016 Question: at which school were first responders on the scene for? Answer: independence high school\n\n### Response:\nyes\n\n### Input:\nContext: I sold my $90,000.00 Mercedes G500 and bought 3 Prius's, because I got tired of being pulled over by Police. #Adapt @chrisrock\u2014 Isaiah Washington (@IWashington) April 1, 2015 Question: how many prius's did they buy? Answer: three\n\n### Response:\n",
  "answer": [
   "yes"
  ],
  "prediction": "yes\n\n### Input:\nContext: I sold my $90,000.00 Mercedes G500 and bought 3 Pri"
}
```
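The `few_shot_prompt` above follows an Alpaca-style template. Here is a sketch of how such a prompt could be assembled; the `build_few_shot_prompt` helper is hypothetical, but the template strings match the record shown above:

```python
# Hypothetical helper that assembles an Alpaca-style few-shot prompt
# matching the few_shot_prompt field shown above.
HEADER = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request."
)

def build_few_shot_prompt(
    instruction: str,
    examples: list[tuple[str, str]],
    orig_input: str,
) -> str:
    """examples is a list of (input, response) pairs; orig_input is the query to answer."""
    parts = [HEADER, f"### Instruction:\n{instruction}"]
    for ex_input, ex_response in examples:
        parts.append(f"### Input:\n{ex_input}\n\n### Response:\n{ex_response}")
    # The final input is left open so the model completes the response.
    parts.append(f"### Input:\n{orig_input}\n\n### Response:\n")
    return "\n\n".join(parts)
```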

Data Preparation Scripts

Final model training data: https://huggingface.co/datasets/upaya07/NeurIPS-LLM-data
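The released training data can be loaded directly with the Hugging Face `datasets` library (the `train` split name is an assumption; check the dataset card):

```python
# Load the released training data from the Hugging Face Hub.
from datasets import load_dataset

ds = load_dataset("upaya07/NeurIPS-LLM-data", split="train")  # split name assumed
print(ds[0])
```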

Model Training

```bash
# Clone the repository
git clone git@github.com:Upaya07/NeurIPS-llm-efficiency-challenge.git
cd NeurIPS-llm-efficiency-challenge/training/axolotl

# Installation
pip install packaging
pip install -e '.[flash-attn]'
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu118
pip install -U git+https://github.com/huggingface/peft.git

# Downloads the required data and launches model fine-tuning. Runs 3 epochs on the data.
# The script keeps track of the best checkpoint based on eval_loss.
# The nips_02.yml file contains all hyperparameters.
accelerate launch -m axolotl.cli.train examples/mistral/nips/nips_02.yml
```
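After training (or to try the released checkpoint directly), here is a minimal generation sketch with `transformers`, assuming Birbal-7B-V1 is a merged checkpoint rather than a bare LoRA adapter; the prompt reuses the Alpaca-style template from the schema section:

```python
# Minimal inference sketch with the released checkpoint (assumed to be a merged model).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "upaya07/Birbal-7B-V1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nSummarize the conversation.\n\n"
    "### Input:\nA: Lunch at noon? B: Sure, see you then.\n\n### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```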

Expected loss curve

[W&B chart: training/eval loss curve]

Team Members

Acknowledgements