
Open-RAG: Enhanced Retrieval Augmented Reasoning with Open-Source Large Language Models

Official repository for the EMNLP Findings 2024 paper Open-RAG: Enhanced Retrieval Augmented Reasoning with Open-Source Large Language Models. Project page: https://openragmoe.github.io/

Model | Paper | Training Data | Evaluation Data

Training

OpenRAG-7B-8x135M

torchrun --nnodes=1 --nproc_per_node=4 --master_port=29506 \
  train_openrag_moe.py \
  --model_name_or_path meta-llama/Llama-2-7b-hf \
  --data_path shayekh/openrag_train_data --data_subset moe \
  --output_dir ./checkpoints/ \
  --bf16 True --tf32 True --fp16 False \
  --model_max_length 4096 \
  --num_train_epochs 2 --gradient_accumulation_steps 8 \
  --per_device_train_batch_size 4 \
  --evaluation_strategy "no" --save_strategy "epoch" \
  --logging_strategy "steps" --report_to tensorboard --logging_steps 1 \
  --learning_rate 2e-4 --adam_beta2 0.999 \
  --lr_scheduler_type constant_with_warmup \
  --max_grad_norm 0.3 --weight_decay 0.0 --warmup_steps 200 \
  --adapter_dim 512 --moe_scaling 0.25 --num_experts 8 --topk 2
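Before launching either run, it can help to confirm that the training data resolves as expected. The sketch below uses the Hugging Face datasets library; treating the moe subset named by --data_subset as a dataset configuration is an assumption here.

# sanity_check_data.py -- quick look at the MoE training subset (illustrative;
# the subset name "moe" mirrors the --data_subset flag and is an assumption)
from datasets import load_dataset

ds = load_dataset("shayekh/openrag_train_data", "moe", split="train")
print(ds)     # number of rows and column names
print(ds[0])  # one raw training example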

OpenRAG-13B-8x213M

torchrun --nnodes=1 --nproc_per_node=4 --master_port=29506 \
  train_openrag_moe.py \
  --model_name_or_path meta-llama/Llama-2-13b-hf \
  --data_path shayekh/openrag_train_data --data_subset moe \
  --output_dir ./checkpoints/ \
  --bf16 True --tf32 True --fp16 False \
  --model_max_length 4096 \
  --num_train_epochs 2 --gradient_accumulation_steps 8 \
  --per_device_train_batch_size 4 \
  --evaluation_strategy "no" --save_strategy "epoch" \
  --logging_strategy "steps" --report_to tensorboard --logging_steps 1 \
  --learning_rate 1e-4 --adam_beta2 0.999 \
  --lr_scheduler_type constant_with_warmup \
  --max_grad_norm 0.3 --weight_decay 0.0 --warmup_steps 200 \
  --adapter_dim 512 --moe_scaling 0.25 --num_experts 8 --topk 2
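The --num_experts 8 --topk 2 flags in both runs configure the parameter-efficient sparse MoE adapters built on top of the dense Llama-2 backbone. The snippet below is not the repository's implementation, only a minimal PyTorch sketch of top-2-of-8 routing over low-rank adapter experts, to make the --adapter_dim and --moe_scaling knobs concrete.

# illustrative top-k routed adapter layer (NOT the repo's exact module)
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKAdapterMoE(nn.Module):
    def __init__(self, hidden=4096, adapter_dim=512, num_experts=8, topk=2, scaling=0.25):
        super().__init__()
        self.router = nn.Linear(hidden, num_experts, bias=False)
        self.down = nn.ModuleList([nn.Linear(hidden, adapter_dim, bias=False) for _ in range(num_experts)])
        self.up = nn.ModuleList([nn.Linear(adapter_dim, hidden, bias=False) for _ in range(num_experts)])
        self.topk, self.scaling = topk, scaling

    def forward(self, x):                      # x: (batch, seq, hidden)
        logits = self.router(x)                # (batch, seq, num_experts)
        weights, idx = logits.topk(self.topk, dim=-1)
        weights = F.softmax(weights, dim=-1)   # renormalize over the chosen experts
        # dense-for-clarity: run every expert, then keep only the routed contributions
        expert_out = torch.stack([up(dn(x)) for dn, up in zip(self.down, self.up)], dim=-2)
        gate = torch.zeros_like(logits).scatter_(-1, idx, weights)
        return x + self.scaling * (gate.unsqueeze(-1) * expert_out).sum(dim=-2)

A real sparse implementation would dispatch only the routed tokens to each expert; the dense form above just keeps the routing arithmetic easy to follow.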

Evaluation

Merge Expert Weights into the Base Model

python merge_moe_lora.py --base_model "meta-llama/Llama-2-7b-hf" \
  --model_path "./checkpoints"
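Assuming merge_moe_lora.py writes the merged model to ./checkpoints/merged/ (the same path passed to --model_name in the evaluation commands below), a quick load check with transformers confirms the merge produced a standard causal LM checkpoint.

# verify the merged checkpoint loads as a plain Hugging Face causal LM
# (output path ./checkpoints/merged/ is assumed from the evaluation commands below)
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("./checkpoints/merged/")
model = AutoModelForCausalLM.from_pretrained("./checkpoints/merged/")
print(model.config.model_type, model.num_parameters())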

Multi-Hop QA

python run_short_form_moe_hotpot.py \
  --model_name ./checkpoints/merged/ \
  --world_size 1 --w_use 0.5 \
  --dataset shayekh/openrag_bench --task hotpotqa \
  --mode adaptive_retrieval --max_new_tokens 100 \
  --threshold 0.0 \
  --metric hotpotem --ndocs 3 --use_groundness --use_utility --use_seqscore \
  --output_file ./eval/hotpotqa.jsonl

Tasks: 2wikimultihopqa, hotpotqa, and musique
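Each run writes per-example predictions to the path given by --output_file. The sketch below scores such a JSONL with HotpotQA-style exact match; the pred and gold field names are hypothetical and should be adapted to the actual output schema.

# score_em.py -- HotpotQA-style exact match over an output JSONL
# NOTE: the "pred" / "gold" field names are hypothetical; adjust to the real schema
import json, re, string

def normalize(s: str) -> str:
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

with open("./eval/hotpotqa.jsonl") as f:
    rows = [json.loads(line) for line in f]

em = sum(normalize(r["pred"]) == normalize(r["gold"]) for r in rows) / max(len(rows), 1)
print(f"EM: {em:.3f} over {len(rows)} examples")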

Acknowledgement

We are grateful to the authors of Self-RAG, Parameter-Efficient Sparsity Crafting, and Beam Retrieval, especially for open-sourcing their artifacts.

Citation

@inproceedings{islam-etal-2024-open,
    title = "Open-{RAG}: Enhanced Retrieval Augmented Reasoning with Open-Source Large Language Models",
    author = "Islam, Shayekh Bin  and
      Rahman, Md Asib  and
      Hossain, K S M Tozammel  and
      Hoque, Enamul  and
      Joty, Shafiq  and
      Parvez, Md Rizwan",
    editor = "Al-Onaizan, Yaser  and
      Bansal, Mohit  and
      Chen, Yun-Nung",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-emnlp.831",
    pages = "14231--14244",
    abstract = "Retrieval Augmented Generation (RAG) has been shown to enhance the factual accuracy of Large Language Models (LLMs) by providing external evidence, but existing methods often suffer from limited reasoning capabilities (e.g., multi-hop complexities) in effectively using such evidence, particularly when using open-source LLMs. To mitigate this gap, in this paper, we introduce a novel framework, **Open-RAG**, designed to enhance reasoning capabilities in RAG with open-source LLMs. Our framework transforms an arbitrary dense LLM into a parameter-efficient sparse mixture of experts (MoE) model capable of handling complex reasoning tasks, including both single- and multi-hop queries. Open-RAG uniquely trains the model to navigate challenging distractors that appear relevant but are misleading. By combining the constructive learning and architectural transformation, Open-RAG leverages latent learning, dynamically selecting relevant experts and integrating external knowledge effectively for more accurate and contextually relevant responses. Additionally, we propose a hybrid adaptive retrieval method to determine retrieval necessity and balance the trade-off between performance gain and inference speed. Experimental results show that Open-RAG outperforms state-of-the-art LLMs and RAG models in various knowledge-intensive tasks. Our method based on Llama2-7B sets new benchmarks, surpassing ChatGPT-RAG and Self-RAG. For example, in multi-hop HotpotQA, it achieves an EM score of 63.3, compared to RAG 2.0{'}s 54 and Command R+{'}s 60.",
}