Open lpyhdzx opened 1 year ago
Hi,
The results in the paper have not been updated yet; those are our first-version results, from models trained with math_data.json. The math_data.json file is our first version of the training data: a mixture of multiple math datasets with rationales derived from the logs of Zero-Shot CoT. In that first version, we used 80% of the original test samples as the training set and the remaining 20% as the test set.
Moreover, we have collected and released the updated math_10k.json, which is drawn from the training sets of GSM8K, MAWPS, MAWPS-Single, and AQuA. The test sets of all datasets are the same as the originals. Thus, the results in the table are all from models trained with math_10k.json and evaluated on the original test sets, e.g., the 1319 samples of GSM8K.
Thus, please use the results on GitHub for now; we will update the paper soon. Please let me know if you have further questions!
Hi! Thanks for the great work! I'm very curious about how you collected the new training data math_10k.json. Did you use ChatGPT? Could you provide more information on this?
Hi, the new training data math_10k.json was collected with ChatGPT. If you need it, we can upload the data collection code.
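A hypothetical sketch of how such Zero-Shot CoT rationale collection with ChatGPT could look. The "Let's think step by step" trigger is the standard Zero-Shot CoT phrase; the model name, message layout, and helper names are assumptions for illustration, not the authors' actual collection code.

```python
def build_cot_prompt(question):
    # Append the Zero-Shot CoT trigger phrase to elicit a step-by-step rationale.
    return f"{question}\nLet's think step by step."

def collect_rationale(client, question, model="gpt-3.5-turbo"):
    """Query ChatGPT for a rationale to one question.

    `client` is assumed to be an OpenAI client, e.g.
    `from openai import OpenAI; client = OpenAI()`.
    """
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": build_cot_prompt(question)}],
    )
    return resp.choices[0].message.content
```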
That would be great :)
Very good work! Looking at the paper, I noticed more inconsistencies between the results in Table 1 and the fine-tuning results on GitHub; for example, LLaMA-7B scores 21.9 on GSM8K in the paper, but the result in the link is 30.9. Is this due to the inconsistent settings?
Please forgive me if I have misunderstood.