AGI-Edgerunners / LLM-Adapters

Code for our EMNLP 2023 Paper: "LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models"
https://arxiv.org/abs/2304.01933
Apache License 2.0

Question regarding the source of math_10k.json #43


HuangOwen commented 11 months ago

Hi, thanks for the good work!

I have a question regarding math_10k.json, which is used for fine-tuning. You mention in the paper that "To enhance the diversity of our data, we incorporate the training sets from GSM8K, MAWPS, MAWPS-single", but to the best of my knowledge there is no training set for MAWPS. When I checked the samples in math_10k.json, I found some question-answer pairs that are exactly the same as samples in the test sets of AddSub/MultiArith/SingleEq. Could you please elaborate on this?
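For reference, here is a minimal sketch of the kind of exact-match comparison I mean (the paths and the flat list-of-dicts JSON layout are assumptions based on this repo):

```python
# Sketch: count test instructions that also appear verbatim in math_10k.json.
# Paths and the JSON layout (a list of {"instruction": ...} dicts) are
# assumptions based on this repository; adjust to your checkout.
import json

def load_instructions(path):
    """Load a JSON list of samples and return their normalized instruction strings."""
    with open(path) as f:
        return {sample["instruction"].strip() for sample in json.load(f)}

train = load_instructions("math_10k.json")
test = load_instructions("dataset/MultiArith/test.json")

overlap = train & test
print(f"{len(overlap)} of {len(test)} MultiArith test instructions appear in math_10k.json")
```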

LYH-YF commented 11 months ago

@HuangOwen We use the MAWPS dataset preprocessed by MWPToolkit (https://github.com/LYH-YF/MWPToolkit), which splits MAWPS into train/valid/test sets. You can find the train set at https://github.com/LYH-YF/MWPToolkit/tree/master/dataset/mawps, and mawps-single in the same repository.

HuangOwen commented 11 months ago

> @HuangOwen We use the MAWPS dataset preprocessed by MWPToolkit (https://github.com/LYH-YF/MWPToolkit), which splits MAWPS into train/valid/test sets. You can find the train set at https://github.com/LYH-YF/MWPToolkit/tree/master/dataset/mawps, and mawps-single in the same repository.

I don't think you followed this split in your evaluation and in math_10k.json. For example, the first sample in /dataset/MultiArith/test.json (which you use for testing)

```json
{
  "instruction": " At the schools book fair Sam bought 13 adventure books and 17 mystery books. If 15 of the books were used, how many new books did he buy? ",
  "input": "",
  "output": "\nA: Sam bought 13 adventure books and 17 mystery books. That means he bought 13 + 17 = 30 books in total. 15 of them were used, so he has 30 - 15 = 15 new books. The answer is 15.",
  "answer": "15.0"
}
```

can also be found in math_10k.json. Could you please elaborate on this?
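A quick membership check along these lines can verify this (a sketch; paths assumed as before):

```python
# Sketch: check whether a specific MultiArith test instruction occurs
# verbatim in math_10k.json. The path is an assumption.
import json

# The MultiArith test question quoted above, whitespace-normalized.
query = ("At the schools book fair Sam bought 13 adventure books and 17 "
         "mystery books. If 15 of the books were used, how many new books "
         "did he buy?")

with open("math_10k.json") as f:
    train = json.load(f)

hits = [s for s in train if s["instruction"].strip() == query]
print(f"found {len(hits)} matching sample(s) in math_10k.json")
```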

HZQ950419 commented 11 months ago

> I don't think you followed this split in your evaluation and in math_10k.json. For example, the first sample in /dataset/MultiArith/test.json (which you use for testing) [...] can also be found in math_10k.json. Could you please elaborate on this?

Hi, we follow exactly the dataset split from MWPToolkit (https://github.com/LYH-YF/MWPToolkit). The example you provide can be found in the MAWPS and MAWPS-Single training sets, which we used to build the fine-tuning dataset. As for why the same example also appears in /dataset/MultiArith/test.json, I think that is due to the way the authors created the MultiArith dataset.

Please let us know if you have further questions!

HuangOwen commented 10 months ago

Thanks for the reply. I have gone through all subsets of MAWPS (AddSub/MultiArith/SingleEq) and found that every test sample in these subsets can be found in math_10k.json, while math_10k.json is used for instruction fine-tuning. I don't think this is reasonable: if you use the dataset split from MWPToolkit, you should not test on these specific subsets (AddSub/MultiArith/SingleEq).
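To make the extent concrete, a sketch that reports the overlap for each subset (same assumptions about paths and JSON layout as before):

```python
# Sketch: for each MAWPS-derived subset, report how many of its test
# instructions also occur verbatim in math_10k.json. Paths are assumptions.
import json

def load_instructions(path):
    with open(path) as f:
        return {sample["instruction"].strip() for sample in json.load(f)}

train = load_instructions("math_10k.json")
for name in ("AddSub", "MultiArith", "SingleEq"):
    test = load_instructions(f"dataset/{name}/test.json")
    found = len(train & test)
    print(f"{name}: {found}/{len(test)} test instructions also in math_10k.json")
```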

HuangOwen commented 10 months ago

> > I don't think you followed this split in your evaluation and in math_10k.json. For example, the first sample in /dataset/MultiArith/test.json (which you use for testing) [...] can also be found in math_10k.json. Could you please elaborate on this?
>
> Hi, we follow exactly the dataset split from MWPToolkit (https://github.com/LYH-YF/MWPToolkit). The example you provide can be found in the MAWPS and MAWPS-Single training sets, which we used to build the fine-tuning dataset. As for why the same example also appears in /dataset/MultiArith/test.json, I think that is due to the way the authors created the MultiArith dataset.
>
> Please let us know if you have further questions!

I think this data-leak issue has nothing to do with the way the authors created the MultiArith dataset: MultiArith was proposed in 2015 and included in MAWPS in 2016, both of which predate MWPToolkit.

callanwu commented 10 months ago

mark

HZQ950419 commented 10 months ago

> Thanks for the reply. I have gone through all subsets of MAWPS (AddSub/MultiArith/SingleEq) and found that every test sample in these subsets can be found in math_10k.json, while math_10k.json is used for instruction fine-tuning. I don't think this is reasonable: if you use the dataset split from MWPToolkit, you should not test on these specific subsets (AddSub/MultiArith/SingleEq).

Hi,

Many thanks for your questions!

After careful double-checking, there is indeed a data-leak issue in the math reasoning experiments. We have tried our best to mitigate its impact: we now use the MAWPS test set to evaluate the performance of the PEFT methods, and the result table has been updated. The findings in the paper remain consistent. We have also made a special announcement for researchers who are using our repository for their experiments, and we have uploaded two variants of math_10k.json with the MAWPS samples removed.
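For anyone reproducing this cleanup, one way to derive such a filtered file is to drop every fine-tuning sample whose instruction also appears in the affected test sets. A minimal sketch follows (the paths and the output name math_10k_filtered.json are illustrative only, and the exact filtering we applied may differ):

```python
# Sketch: build a leakage-free variant of math_10k.json by dropping every
# sample whose instruction also appears in the AddSub/MultiArith/SingleEq
# test sets. Paths and the output filename are illustrative only.
import json

def load_instructions(path):
    with open(path) as f:
        return {sample["instruction"].strip() for sample in json.load(f)}

# Collect every instruction that occurs in the affected test sets.
leaked = set()
for name in ("AddSub", "MultiArith", "SingleEq"):
    leaked |= load_instructions(f"dataset/{name}/test.json")

with open("math_10k.json") as f:
    train = json.load(f)

# Keep only samples whose instruction never appears in a test set.
clean = [s for s in train if s["instruction"].strip() not in leaked]
print(f"kept {len(clean)} of {len(train)} samples")

with open("math_10k_filtered.json", "w") as f:
    json.dump(clean, f, indent=2, ensure_ascii=False)
```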

We sincerely apologize for any inconvenience caused by our mistake! If you have any questions, please let us know. Many thanks!

HuangOwen commented 10 months ago

Hi Zhiqiang,

Thanks for your reply and your effort in fixing the problem! Glad that the dataset has been updated.

Yuan0320 commented 9 months ago

Hi @HZQ950419, thanks for the announcement! Were the MAWPS test results shown in the table obtained on https://github.com/LYH-YF/MWPToolkit/blob/master/dataset/mawps/testset.json (238 samples)?

HZQ950419 commented 9 months ago

> Hi @HZQ950419, thanks for the announcement! Were the MAWPS test results shown in the table obtained on https://github.com/LYH-YF/MWPToolkit/blob/master/dataset/mawps/testset.json (238 samples)?

Hi @Yuan0320,

Correct! We will upload the test set later; you can also get it from MWPToolkit.