MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization

Overview

Welcome to the repository of MAPO, a framework for improving the multilingual reasoning capabilities of large language models (LLMs).

🚀 We propose a framework that enhances multilingual reasoning capabilities by aligning the reasoning processes in other languages with those in English. We use off-the-shelf translation models to estimate how well a reasoning process in another language aligns with its English counterpart, and then optimize this alignment as a preference using popular preference optimization methods such as DPO or PPO.
📈 The framework improves the consistency of multilingual reasoning and, in turn, the multilingual reasoning capabilities of large models in a more generalizable manner. Our approach surpasses all baselines, including ChatGPT, and reaches state-of-the-art (SOTA) results.
🌐 Overall, our method offers a novel way to improve the multilingual reasoning abilities of models without requiring extensive annotation of reasoning processes in other languages.
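
The idea of treating alignment as a preference can be illustrated with a minimal sketch. It assumes each sampled reasoning process already has an alignment score (a plain float standing in for whatever the translation model produces). Picking the highest- and lowest-scoring samples as the chosen and rejected responses is an assumption made for illustration, not necessarily the pairing rule used in this repository. The names `PreferencePair`, `build_pair`, and `dpo_loss` are hypothetical. The loss itself is the standard DPO objective for a single pair.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str


def build_pair(prompt, candidates, scores):
    """Pick the best- and worst-aligned candidates as (chosen, rejected).

    Assumption: a higher score means better alignment with English.
    """
    if len(candidates) != len(scores) or len(candidates) < 2:
        raise ValueError("need at least two candidates, one score each")
    ranked = sorted(zip(scores, candidates), key=lambda item: item[0])
    (low_score, worst), (high_score, best) = ranked[0], ranked[-1]
    if high_score == low_score:
        return None  # all scores tie, so there is no preference signal
    return PreferencePair(prompt, chosen=best, rejected=worst)


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one pair, given sequence log-probabilities."""
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)) == log(1 + exp(-margin)); guard against overflow
    return math.log1p(math.exp(-margin)) if margin > -30 else -margin


pair = build_pair("question", ["sample A", "sample B", "sample C"], [0.2, 0.9, 0.5])
print(pair.chosen, "preferred over", pair.rejected)
```

With equal policy and reference log-probabilities the loss is `log 2`, and it decreases as the policy assigns relatively more probability to the chosen response.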

:trophy: Benchmarks

Below is the average accuracy across ten languages on three multilingual mathematical reasoning datasets. Our method improves the multilingual reasoning capabilities of LLMs by a large margin and achieves SOTA performance. We also hope that future multilingual reasoning LLMs can be combined with our work to further enhance multilingual reasoning capabilities.

| System | MSVAMP | MGSM | MNumGLUESub |
|---|---|---|---|
| GPT-3.5-Turbo | 46.6 | 42.2 | 49.4 |
| MAmmoTH 7B | 26.3 | 21.3 | 24.2 |
| WizardMath 7B | 32.5 | 23.0 | 28.7 |
| MetaMath 7B | 46.2 | 37.0 | 43.2 |
| QAlign 7B | 57.2 | 49.6 | - |
| MathOctopus 7B | 41.2 | 39.5 | 37.1 |
| + MAPO-DPO (ours) 🔥 | 57.4 | 41.6 | 50.4 |
| MetaMathOctopus 7B | 53.0 | 45.5 | 39.2 |
| + MAPO-DPO (ours) 👑 | 64.7 | 51.6 | 52.9 |
| MistralMathOctopus 7B | 59.0 | 58.0 | 56.8 |
| + MAPO-DPO (ours) 👑 | 74.6 | 67.3 | 70.0 |

| System | MSVAMP | MGSM | MNumGLUESub |
|---|---|---|---|
| GPT-3.5-Turbo | 46.6 | 42.2 | 49.4 |
| MAmmoTH 13B | 38.6 | 28.9 | 29.5 |
| WizardMath 13B | 35.7 | 28.3 | 29.0 |
| MetaMath 13B | 46.2 | 43.9 | 43.3 |
| QAlign 13B | 62.6 | 57.1 | - |
| MathOctopus 13B | 51.8 | 46.0 | 40.3 |
| + MAPO-DPO (ours) 🔥 | 60.1 | 48.5 | 53.8 |
| MetaMathOctopus 13B | 56.3 | 51.4 | 49.5 |
| + MAPO-DPO (ours) 👑 | 67.0 | 58.0 | 59.8 |
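
To make the margins concrete, the short script below computes the per-dataset gain of each MAPO-DPO model over its base model. The numbers are copied from the tables above; the `RESULTS` layout and the `gains` helper are only illustrative.

```python
# Average accuracies from the tables above, ordered (MSVAMP, MGSM, MNumGLUESub).
# Each entry maps a base model to (base scores, scores after MAPO-DPO).
RESULTS = {
    "MathOctopus 7B": ((41.2, 39.5, 37.1), (57.4, 41.6, 50.4)),
    "MetaMathOctopus 7B": ((53.0, 45.5, 39.2), (64.7, 51.6, 52.9)),
    "MistralMathOctopus 7B": ((59.0, 58.0, 56.8), (74.6, 67.3, 70.0)),
    "MathOctopus 13B": ((51.8, 46.0, 40.3), (60.1, 48.5, 53.8)),
    "MetaMathOctopus 13B": ((56.3, 51.4, 49.5), (67.0, 58.0, 59.8)),
}


def gains(base, tuned):
    """Absolute accuracy gain per dataset, rounded to one decimal place."""
    return tuple(round(t - b, 1) for b, t in zip(base, tuned))


for name, (base, tuned) in RESULTS.items():
    print(f"{name}: {gains(base, tuned)}")
```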

:trophy: Alignment Performance

[Figures: PPL-based alignment score (left) and ACR (right)]

We report the PPL-based alignment score (left) and ACR (right), which assess the consistency of the reasoning process and of the final answer, respectively. MAPO significantly improves the consistency of both the reasoning processes and the final answers of LLMs across languages.
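
As a rough illustration of the two kinds of measurement, the sketch below computes a perplexity from token log-probabilities and a simple answer-agreement ratio between English and another language. Both functions are hypothetical helpers. The exact definitions of the PPL-based alignment score and of ACR used for the figures may differ, so treat this as an approximation under stated assumptions rather than the evaluation code.

```python
import math


def perplexity(token_logprobs):
    """PPL = exp(-mean log p) over the tokens of one sequence."""
    if not token_logprobs:
        raise ValueError("empty sequence")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def answer_agreement(english_answers, other_answers):
    """Fraction of questions whose final answer matches the English one.

    Assumption: both lists are ordered by the same question ids.
    """
    if len(english_answers) != len(other_answers) or not english_answers:
        raise ValueError("answer lists must be non-empty and the same length")
    matches = sum(e == o for e, o in zip(english_answers, other_answers))
    return matches / len(english_answers)


# A sequence whose tokens each have probability 0.5 has perplexity 2.
print(perplexity([math.log(0.5)] * 4))
print(answer_agreement(["12", "7", "30", "5"], ["12", "7", "31", "5"]))
```

A lower perplexity here means the scoring model finds the sequence more likely, and a higher agreement ratio means the answers in the other language match the English ones more often.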

:hammer_and_wrench: Training & Evaluation

Preference optimization data preparation:
- Generation: `bash sampling.sh`
- Preference estimation: `bash PreferenceEstimate.sh`
- Format paired data: `python3 extract_dpo_data.py`

Training:
- DPO: `bash dpo.sh yourconfig.json` (or `bash dpo13b.sh yourconfig.json`)
- PPO: `bash ppo_lora.sh yourconfig.json`

Evaluation:
- `bash run.sh`

For more details about training and evaluation, see the Alignment and Evaluation directories, respectively.
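
The steps above can be chained from a small driver script. The sketch below is hypothetical: it assumes every script is reachable from the current working directory (the repository layout may place them in different directories), that `dpo13b.sh` is the variant for 13B models, and that the steps run in the order listed. It defaults to a dry run that only prints the commands.

```python
import subprocess


def pipeline_commands(config="yourconfig.json", method="dpo", use_13b=False):
    """Return the commands for data preparation, training, and evaluation."""
    if method == "dpo":
        # Assumption: dpo13b.sh is the script intended for 13B models.
        train = ["bash", "dpo13b.sh" if use_13b else "dpo.sh", config]
    elif method == "ppo":
        train = ["bash", "ppo_lora.sh", config]
    else:
        raise ValueError(f"unknown method: {method!r}")
    return [
        ["bash", "sampling.sh"],            # generation
        ["bash", "PreferenceEstimate.sh"],  # preference estimation
        ["python3", "extract_dpo_data.py"], # format paired data
        train,                              # DPO or PPO training
        ["bash", "run.sh"],                 # evaluation
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it in order."""
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


run_pipeline(pipeline_commands(), dry_run=True)
```

Passing `dry_run=False` executes the scripts for real; `check=True` makes the driver stop as soon as one step exits with a non-zero status.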

Citation

If you find this repository helpful, feel free to cite our paper:

@misc{she2024mapo,
      title={MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization}, 
      author={Shuaijie She and Wei Zou and Shujian Huang and Wenhao Zhu and Xiang Liu and Xiang Geng and Jiajun Chen},
      year={2024},
      eprint={2401.06838},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}