# Turning Dust into Gold [AAAI 2024]

This is the repo for the AAAI 2024 paper *Turning Dust into Gold: Distilling Complex Reasoning Capabilities from LLMs by Leveraging Negative Data*. [arXiv]
The repo contains:
- The synthetic data from ChatGPT and GPT-4.
- The training and inference code for this work.
- The experimental results.
- A list of current work related to the MATH dataset and mathematical reasoning.
## Data
We provide the synthetic samples generated by GPT-3.5-turbo and GPT-4 through in-context learning (ICL) on the MATH training set; they are saved in the data folders `GPT3.5-turbo-MATH` and `GPT4-MATH`. For each training question, 8 rationale samples are generated. The demonstrations used to generate the rationales are given in our paper.
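Since each question yields 8 sampled rationales, negatives can be mined by comparing each rationale's final answer against the gold answer. A minimal sketch of that split (the "The answer is ..." ending convention, function names, and toy data are our assumptions; the repo's actual file format may differ):

```python
import re

def final_answer(rationale: str):
    """Extract the value after 'The answer is' (an assumed rationale convention)."""
    m = re.search(r"[Tt]he answer is\s*(.+?)\s*\.?\s*$", rationale.strip())
    return m.group(1) if m else None

def split_pos_neg(samples, gold):
    """Partition sampled rationales into positives (final answer matches
    the gold answer) and negatives (everything else)."""
    pos, neg = [], []
    for s in samples:
        (pos if final_answer(s) == gold else neg).append(s)
    return pos, neg

# Toy example standing in for 3 of the 8 sampled rationales.
samples = [
    "2+3=5. The answer is 5.",
    "2+3=6. The answer is 6.",
    "Sum the numbers: 2+3=5. The answer is 5.",
]
pos, neg = split_pos_neg(samples, "5")  # 2 positives, 1 negative
```

The negative partition is the "dust" that the paper's training stages turn into additional supervision.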
## Code

The training and inference code is used as follows:

1. Prepare the llama-7b checkpoint and store it in the `code` directory.
2. Prepare the conda environment with `requirements.txt`.
3. Activate the environment:
   ```bash
   conda activate llm
   ```
4. Train LoRA-neg:
   ```bash
   cd code
   bash run_neg.sh
   ```
5. Train LoRA-NAT:
   ```bash
   bash run_NAT.sh
   ```
6. Train NCE:
   ```bash
   bash run_NCE.sh
   ```
7. Train ASC:
   ```bash
   bash run_ASC.sh
   ```
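The four training stages in steps 4-7 run in sequence. A minimal driver sketch chaining them (script names come from this repo; the skip-if-missing guard is our addition so the sketch degrades gracefully when run outside the repo):

```shell
# Run the four training stages in order, from the repo root after steps 1-3.
run_pipeline() {
  cd code 2>/dev/null || true
  for stage in run_neg.sh run_NAT.sh run_NCE.sh run_ASC.sh; do
    if [ -f "$stage" ]; then
      echo "=== running $stage ==="
      bash "$stage"
    else
      echo "skip: $stage not found"
    fi
  done
}

run_pipeline
```

Stage order matters: LoRA-neg produces the negative-trained adapter that the later NAT, NCE, and ASC stages build on.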
## Results
## A list of work related to MATH and math reasoning

We have also organized some work related to the MATH dataset and mathematical reasoning tasks to promote future research.
A. Work involving distillation on mathematical reasoning tasks

B. Experiments on the MATH dataset
4. ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models (Zhipeng Chen, Kun Zhou, Beichen Zhang, Zheng Gong, Wayne Xin Zhao, Ji-Rong Wen)
   - dataset: MATH, HotPotQA
5. Deductive Verification of Chain-of-Thought Reasoning (Zhan Ling, Yunhao Fang, Xuanlin Li, Zhiao Huang, Mingu Lee, Roland Memisevic, Hao Su)
   - dataset: MATH
6. CREATOR: Disentangling Abstract and Concrete Reasonings of Large Language Models through Tool Creation (Cheng Qian, Chi Han, Yi R. Fung, Yujia Qin, Zhiyuan Liu, Heng Ji)
   - dataset: MATH, TabMWP, Creation Challenge
7. An Empirical Study on Challenging Math Problem Solving with GPT-4 (Yiran Wu, Feiran Jia, Shaokun Zhang, Hangyu Li, Erkang Zhu, Yue Wang, Yin Tat Lee, Richard Peng, Qingyun Wu, Chi Wang)
8. Cost-Effective Hyperparameter Optimization for Large Language Model Generation Inference (Chi Wang, Susan Xueqing Liu, Ahmed H. Awadallah)
C. Other research work related to MATH

- (Draws on the MATH dataset to propose miniF2F)
- (Uses MATH only as a source of informal data, as a way to map informal proofs to formal proofs)
- (References the post-pretraining method on MATH; reverse reasoning)
- AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models (MATH is part of the benchmark)