Closed ToheartZhang closed 1 year ago
Hi,
The math_data.json is our first version of training data. It is a mixture of multiple math datasets, with rationales derived from the logs of Zero-Shot CoT. In the first version, we used 80% of the original test samples as the training set and 20% as the test set.
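The 80/20 split described above can be sketched as follows. This is a hypothetical illustration, not the repo's actual code: the record fields and file layout are assumptions.

```python
# Minimal sketch of an 80/20 train/test split over a JSON dataset
# (record fields are assumptions for illustration).
import random

random.seed(0)

# Suppose math_data.json is a list of {"question": ..., "rationale": ...} records.
samples = [{"question": f"q{i}", "rationale": f"r{i}"} for i in range(100)]

random.shuffle(samples)
cut = int(0.8 * len(samples))
train, test = samples[:cut], samples[cut:]

print(len(train), len(test))  # 80 20
```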
Moreover, we have collected and released an updated file, math_10k.json, which is drawn from the training sets of GSM8K, MAWPS, MAWPS-Single, and AQuA. The test sets of all datasets are the same as the original ones. Thus, the results in the table are all from models trained with math_10k.json and evaluated on the original test sets, e.g., the 1319 samples for GSM8K.
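Evaluation on the fixed original test split can be sketched like this. The answer-extraction heuristic (taking the last number in the model output) and the function names are assumptions for illustration, not the project's actual evaluation code.

```python
# Hypothetical exact-match evaluation over a fixed test split
# (the last-number extraction heuristic is an assumption).
import re

def extract_answer(text: str) -> str:
    # Take the last number appearing in the model output as the prediction.
    nums = re.findall(r"-?\d+(?:\.\d+)?", text)
    return nums[-1] if nums else ""

def accuracy(preds, golds):
    correct = sum(extract_answer(p) == g for p, g in zip(preds, golds))
    return correct / len(golds)

preds = ["The answer is 42.", "So we get 7"]
golds = ["42", "8"]
print(accuracy(preds, golds))  # 0.5
```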
Please let me know if you have further questions.
Thank you for your detailed explanation! I now fully understand the setting.
Thanks for your inspiring work!
I have a question about your model's training and evaluation datasets used to obtain the Finetuned Result. After reading the paper and the repo, I believe the training set is math_data.json, which is a mixture of multiple math datasets with rationales derived from the logs of Zero-Shot CoT. Is my understanding correct? If so, why use only ~3000 examples, given that GSM8K contains more training examples?

Besides, which evaluation dataset is used for reporting the result: the 816 examples mentioned in the paper, or the separate test split produced by the dataset constructors? I believe the test set is from the original split, since I find ~1k examples in gsm8k/test.json. Is my understanding correct?

I'm sorry if I missed something. Thank you so much for your assistance.