Conifer: Improving Complex Constrained Instruction-Following Ability of Large Language Models

This repository contains the source code for reproducing the Conifer dataset.

🤗 HF Repo    📄 Paper   

Data

conifer_data_samples/ contains a few sample instances of the Conifer dataset, including the intermediate instances produced at each step. The full Conifer dataset is released on HuggingFace 🤗: Conifer.
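If you only want to use the released data, it can be pulled straight from the Hub. A minimal sketch, assuming the repository id matches the HF link above (written here as ConiferLM/Conifer; adjust if the hosted id differs):

```python
# Minimal sketch: load the released Conifer dataset from the HuggingFace Hub.
# The repository id "ConiferLM/Conifer" is an assumption taken from the HF
# link above; adjust it if the hosted id differs.
from datasets import load_dataset

conifer = load_dataset("ConiferLM/Conifer")
print(conifer)              # available splits and their sizes
print(conifer["train"][0])  # inspect a single instance
```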

Model

We fine-tuned Mistral-7B and LLaMA-2-13B on a combination of the Conifer dataset and ShareGPT-53k data, using the alignment-handbook and FastChat.
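For reference, building the SFT mixture can be as simple as concatenating and shuffling the two sources. This is only a sketch: the file paths are hypothetical placeholders, and concatenation assumes both files share the same record schema.

```python
# Hedged sketch: build the SFT mixture of Conifer and ShareGPT-53k.
# Both file paths are hypothetical placeholders, and concatenation assumes
# the two files share the same record schema.
from datasets import concatenate_datasets, load_dataset

conifer = load_dataset("json", data_files="conifer_sft.json")["train"]
sharegpt = load_dataset("json", data_files="sharegpt_53k.json")["train"]

sft_mix = concatenate_datasets([conifer, sharegpt]).shuffle(seed=42)
sft_mix.to_json("sft_mix.json")
```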

Conifer Dataset Construction


0. Prepare the seed instructions.

1. Conifer Dataset Construction (without external process feedback)

2. Conifer with External Process Feedback

3. Conifer with Easy-to-Hard Progression

NOTE:

  1. We provide examples of each step under ./conifer_data_samples; the Conifer data you produce should match these examples.
  2. The target directory can be changed with the --dir option in each step; the default is conifer_data.
  3. Remember to check the intermediate results so that your API budget is not wasted; a quick sanity-check sketch follows this list.
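As a starting point for note 3, the sketch below counts the instances in each intermediate file and prints a truncated first record. It assumes the default conifer_data directory holds JSON-array files; the *.json glob is a guess about the actual file layout.

```python
# Hedged sanity check over intermediate outputs (see note 3 above).
# Assumes the default --dir of conifer_data and JSON-array files;
# the *.json glob is a guess about the actual file layout.
import json
from pathlib import Path

for path in sorted(Path("conifer_data").glob("*.json")):
    records = json.loads(path.read_text())
    print(f"{path.name}: {len(records)} instances")
    # Print a truncated first record so malformed generations surface early.
    print(json.dumps(records[0], ensure_ascii=False, indent=2)[:500])
```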

Training

We use the alignment-handbook to train our Mistral-7B-based models and FastChat to train our LLaMA-2-13B-based models. You can find training guidance in their respective GitHub repos.
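FastChat's training scripts expect ShareGPT-style conversation records. Below is a hedged sketch of converting instruction/response pairs into that format; the input field names "instruction" and "response" are assumptions about the Conifer record schema, so adjust them to match the released data.

```python
# Hedged sketch: convert (instruction, response) pairs into the ShareGPT-style
# conversation format used by FastChat's training scripts. The field names
# "instruction" and "response" are assumptions about the Conifer schema.
import json

with open("conifer_sft.json") as f:
    records = json.load(f)

fastchat_data = [
    {
        "id": f"conifer_{i}",
        "conversations": [
            {"from": "human", "value": rec["instruction"]},
            {"from": "gpt", "value": rec["response"]},
        ],
    }
    for i, rec in enumerate(records)
]

with open("conifer_fastchat.json", "w") as f:
    json.dump(fastchat_data, f, ensure_ascii=False, indent=2)
```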

Evaluation

Below are the evaluation benchmarks used in our paper. Except for IFEval, all of them use the GPT-4 API as the judge to obtain results (a minimal sketch of that call pattern follows the list).

- IFEval
- FollowBench
- InFoBench
- AlpacaEval
- MT-Bench
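For the GPT-4-judged benchmarks, each harness wraps a call pattern like the one below. This shows only the call shape: the judging prompt here is a placeholder, and the official prompts and scoring rules come from each benchmark's own repository.

```python
# Minimal sketch of GPT-4-as-judge scoring, the pattern shared by the
# non-IFEval benchmarks. The prompt below is a placeholder; use each
# benchmark's official judge prompt and parsing rules for real results.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def judge(instruction: str, response: str) -> str:
    prompt = (
        "Rate how well the response satisfies every constraint in the "
        "instruction on a 1-10 scale, then explain briefly.\n\n"
        f"Instruction: {instruction}\n\nResponse: {response}"
    )
    result = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return result.choices[0].message.content
```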

Performance

Supervised Fine-Tuned (SFT) Models

| Model | Final Stage | Base Model | IFEval | FollowBench Avg | FollowBench Hard (L4-L5) | InFoBench | AlpacaEval LC Win Rate | MT-Bench |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Deita-7B-v1.0-SFT | SFT | Mistral-7B | 45.1 | 42.0 | 31.6 | 78.6 | - | 7.22 |
| Evol-Instruct-7B-SFT | SFT | Mistral-7B | 44.0 | 40.7 | 27.6 | 75.6 | 9.4% | 6.51 |
| ShareGPT-7B-SFT | SFT | Mistral-7B | 43.3 | 42.9 | 32.3 | 78.5 | 11.6% | 6.86 |
| Conifer-7B-SFT | SFT | Mistral-7B | 50.8 | 44.9 | 35.7 | 79.5 | 12.5% | 7.08 |

DPO/RLHF Models

| Model | Final Stage | Base Model | IFEval | FollowBench Avg | FollowBench Hard (L4-L5) | InFoBench | AlpacaEval LC Win Rate | MT-Bench |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LLaMA-2-70B-Chat | RLHF | LLaMA-2-70B | - | 47.5 | 39.0 | 84.4 | 14.7% | 6.86 |
| Zephyr-7B-beta | DPO | Mistral-7B | 44.9 | 44.8 | 36.4 | 78.0 | 13.2% | 7.34 |
| Deita-7B-v1.0 | DPO | Mistral-7B | 51.9 | 45.7 | 38.5 | 80.9 | 16.1% | 7.55 |
| ShareGPT-7B-DPO | DPO | Mistral-7B | 48.2 | 47.7 | 38.9 | 82.0 | 15.1% | 7.10 |
| Conifer-7B-DPO | DPO | Mistral-7B | 52.3 | 50.0 | 44.1 | 82.3 | 17.1% | 7.25 |

Citation

If you find the content of this project helpful, please cite our paper as follows:

@article{sun2024conifer,
  title={Conifer: Improving Complex Constrained Instruction-Following Ability of Large Language Models},
  author={Haoran Sun and Lixin Liu and Junjie Li and Fengyu Wang and Baohua Dong and Ran Lin and Ruohui Huang},
  journal={arXiv preprint arXiv:2404.02823},
  year={2024},
  url={https://arxiv.org/abs/2404.02823}
}