Official repository for the EMNLP Findings 2024 paper Open-RAG: Enhanced Retrieval Augmented Reasoning with Open-Source Large Language Models.
Model | Paper | Training Data | Evaluation Data
torchrun --nnodes=1 --nproc_per_node=4 --master_port=29506 \
train_openrag_moe.py \
--model_name_or_path meta-llama/Llama-2-7b-hf \
--data_path shayekh/openrag_train_data --data_subset moe \
--output_dir ./checkpoints/ \
--bf16 True --tf32 True --fp16 False \
--model_max_length 4096 \
--num_train_epochs 2 --gradient_accumulation_steps 8 \
--per_device_train_batch_size 4 \
--evaluation_strategy "no" --save_strategy "epoch" \
--logging_strategy "steps" --report_to tensorboard --logging_steps 1 \
--learning_rate 2e-4 --adam_beta2 0.999 \
--lr_scheduler_type constant_with_warmup \
--max_grad_norm 0.3 --weight_decay 0.0 --warmup_steps 200 \
--adapter_dim 512 --moe_scaling 0.25 --num_experts 8 --topk 2
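As a quick sanity check on the flags in the command above, the effective (global) batch size can be worked out from `--nproc_per_node`, `--per_device_train_batch_size`, and `--gradient_accumulation_steps`. This is a minimal sketch, assuming each process drives one GPU with ordinary data parallelism; the variable and function names are illustrative and do not come from the repository.

```python
# Values copied from the training command; names are illustrative only.
nnodes = 1                          # --nnodes
nproc_per_node = 4                  # --nproc_per_node
per_device_train_batch_size = 4     # --per_device_train_batch_size
gradient_accumulation_steps = 8     # --gradient_accumulation_steps
model_max_length = 4096             # --model_max_length


def effective_batch_size(num_processes, per_device, accumulation):
    """Sequences contributing to one optimizer step under plain data parallelism."""
    return num_processes * per_device * accumulation


world_size = nnodes * nproc_per_node
global_batch = effective_batch_size(
    world_size, per_device_train_batch_size, gradient_accumulation_steps
)
# Upper bound only: real batches are usually shorter than the max length.
max_tokens_per_step = global_batch * model_max_length

print(global_batch)         # 128
print(max_tokens_per_step)  # 524288
```

If the number of GPUs changes, adjusting `--gradient_accumulation_steps` so that this product stays the same is a common way to keep the optimization setup comparable.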
torchrun --nnodes=1 --nproc_per_node=4 --master_port=29506 \
train_openrag_moe.py \
--model_name_or_path meta-llama/Llama-2-13b-hf \
--data_path shayekh/openrag_train_data --data_subset moe \
--output_dir ./checkpoints/ \
--bf16 True --tf32 True --fp16 False \
--model_max_length 4096 \
--num_train_epochs 2 --gradient_accumulation_steps 8 \
--per_device_train_batch_size 4 \
--evaluation_strategy "no" --save_strategy "epoch" \
--logging_strategy "steps" --report_to tensorboard --logging_steps 1 \
--learning_rate 1e-4 --adam_beta2 0.999 \
--lr_scheduler_type constant_with_warmup \
--max_grad_norm 0.3 --weight_decay 0.0 --warmup_steps 200 \
--adapter_dim 512 --moe_scaling 0.25 --num_experts 8 --topk 2
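The flags `--num_experts 8 --topk 2` indicate sparse routing: for each token, only 2 of the 8 experts are active. The sketch below illustrates generic top-k gating in plain Python. It assumes the gate weights are a softmax over the selected experts' logits, which is one common convention and not necessarily what `train_openrag_moe.py` does; `--adapter_dim` and `--moe_scaling` are not modeled, and `top_k_route`, `moe_output`, and the toy experts are hypothetical names.

```python
import math


def top_k_route(router_logits, k):
    """Return [(expert_index, weight)] for the k highest-scoring experts.

    Weights are a softmax over the selected logits only (an assumption).
    """
    order = sorted(
        range(len(router_logits)), key=lambda i: router_logits[i], reverse=True
    )
    top = order[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_output(x, experts, router_logits, k):
    """Weighted sum of the outputs of the k selected experts."""
    out = [0.0] * len(x)
    for idx, weight in top_k_route(router_logits, k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


# Toy experts: expert i multiplies its input by (i + 1).
experts = [lambda x, s=i + 1: [s * v for v in x] for i in range(8)]
router_logits = [0.1, 2.0, -1.0, 0.3, 2.0, 0.0, -0.5, 1.0]

routes = top_k_route(router_logits, k=2)
mixed = moe_output([1.0, 2.0], experts, router_logits, k=2)
print(routes)  # [(1, 0.5), (4, 0.5)]
print(mixed)   # [3.5, 7.0]
```

Only the two selected experts are evaluated, so compute per token stays close to that of a much smaller model even though all 8 experts hold trainable parameters.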
python merge_moe_lora.py --base_model "meta-llama/Llama-2-7b-hf" \
--model_path "./checkpoints"
python run_short_form_moe_hotpot.py \
--model_name ./checkpoints/merged/ \
--world_size 1 --w_use 0.5 \
--dataset shayekh/openrag_bench --task hotpotqa \
--mode adaptive_retrieval --max_new_tokens 100 \
--threshold 0.0 \
--metric hotpotem --ndocs 3 --use_groundness --use_utility --use_seqscore \
--output_file ./eval/hotpotqa.jsonl
Tasks: 2wikimultihopqa, hotpotqa, and musique
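The evaluation command passes `--metric hotpotem`, and the paper reports exact-match (EM) scores on these tasks. For readers unfamiliar with EM, here is a minimal sketch of the widely used SQuAD-style variant: lowercase, strip punctuation and English articles, collapse whitespace, then compare. Whether `hotpotem` in this repository applies exactly this normalization is an assumption; `normalize_answer` and `exact_match` are illustrative names.

```python
import re
import string

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold_answers):
    """True if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(gold) for gold in gold_answers)


print(exact_match("The Eiffel Tower.", ["Eiffel Tower"]))  # True
print(exact_match("Paris, France", ["Paris"]))             # False
```

Averaging `exact_match` over all examples and multiplying by 100 gives a score on the same scale as the EM numbers quoted in the abstract.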
We are grateful to the authors of Self-RAG, Parameter-Efficient Sparsity Crafting, and Beam Retrieval for open-sourcing their artifacts.
@inproceedings{islam-etal-2024-open,
title = "Open-{RAG}: Enhanced Retrieval Augmented Reasoning with Open-Source Large Language Models",
author = "Islam, Shayekh Bin and
Rahman, Md Asib and
Hossain, K S M Tozammel and
Hoque, Enamul and
Joty, Shafiq and
Parvez, Md Rizwan",
editor = "Al-Onaizan, Yaser and
Bansal, Mohit and
Chen, Yun-Nung",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.findings-emnlp.831",
pages = "14231--14244",
abstract = "Retrieval Augmented Generation (RAG) has been shown to enhance the factual accuracy of Large Language Models (LLMs) by providing external evidence, but existing methods often suffer from limited reasoning capabilities (e.g., multi-hop complexities) in effectively using such evidence, particularly when using open-source LLMs. To mitigate this gap, in this paper, we introduce a novel framework, **Open-RAG**, designed to enhance reasoning capabilities in RAG with open-source LLMs. Our framework transforms an arbitrary dense LLM into a parameter-efficient sparse mixture of experts (MoE) model capable of handling complex reasoning tasks, including both single- and multi-hop queries. Open-RAG uniquely trains the model to navigate challenging distractors that appear relevant but are misleading. By combining the constructive learning and architectural transformation, Open-RAG leverages latent learning, dynamically selecting relevant experts and integrating external knowledge effectively for more accurate and contextually relevant responses. Additionally, we propose a hybrid adaptive retrieval method to determine retrieval necessity and balance the trade-off between performance gain and inference speed. Experimental results show that Open-RAG outperforms state-of-the-art LLMs and RAG models in various knowledge-intensive tasks. Our method based on Llama2-7B sets new benchmarks, surpassing ChatGPT-RAG and Self-RAG. For example, in multi-hop HotpotQA, it achieves an EM score of 63.3, compared to RAG 2.0{'}s 54 and Command R+{'}s 60.",
}