Closed ToheartZhang closed 1 year ago
Hi,
The math_data.json is our first version of training data. It is a mixture of multiple math datasets, with rationales derived from the logs of Zero-Shot CoT. In the first version, we used 80% of the original test samples as the training set and 20% as the test set.
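The 80/20 split described above can be sketched as follows. This is a hypothetical illustration, not the repo's actual code: the record fields and file layout are assumptions.

```python
# Minimal sketch of an 80/20 train/test split over a JSON dataset
# (record fields are assumptions for illustration).
import random

random.seed(0)

# Suppose math_data.json is a list of {"question": ..., "rationale": ...} records.
samples = [{"question": f"q{i}", "rationale": f"r{i}"} for i in range(100)]

random.shuffle(samples)
cut = int(0.8 * len(samples))
train, test = samples[:cut], samples[cut:]

print(len(train), len(test))  # 80 20
```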
Moreover, we have collected and released an updated file, math_10k.json, which is drawn from the training sets of GSM8K, MAWPS, MAWPS-Single, and AQuA. The test sets of all datasets are the same as the original ones. Thus, the results in the table are all from models trained with math_10k.json and evaluated on the original test sets, e.g., the 1319 samples for GSM8K.
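Evaluation on the fixed original test split can be sketched like this. The answer-extraction heuristic (taking the last number in the model output) and the function names are assumptions for illustration, not the project's actual evaluation code.

```python
# Hypothetical exact-match evaluation over a fixed test split
# (the last-number extraction heuristic is an assumption).
import re

def extract_answer(text: str) -> str:
    # Take the last number appearing in the model output as the prediction.
    nums = re.findall(r"-?\d+(?:\.\d+)?", text)
    return nums[-1] if nums else ""

def accuracy(preds, golds):
    correct = sum(extract_answer(p) == g for p, g in zip(preds, golds))
    return correct / len(golds)

preds = ["The answer is 42.", "So we get 7"]
golds = ["42", "8"]
print(accuracy(preds, golds))  # 0.5
```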
Please let me know if you have further questions.
Thank you for your detailed explanation! I now fully understand the setting.
Thanks for your inspiring work!
I have a question about your model's training and evaluation datasets used to obtain the Finetuned Result. After reading the paper and the repo, I believe the training set is math_data.json, which is a mixture of multiple math datasets with rationales derived from the logs of Zero-Shot CoT. Is my understanding correct? If so, why use only ~3000 examples, given that GSM8K contains more training examples?

Besides, which evaluation dataset is used for reporting the result: the 816 examples mentioned in the paper, or the separate test split produced by the dataset constructors? I believe the test set is from the original split, since I find ~1k examples in gsm8k/test.json. Is my understanding correct?

I'm sorry if I missed something. Thank you so much for your assistance.