TIGER-AI-Lab / MAmmoTH2

Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024]
https://tiger-ai-lab.github.io/MAmmoTH2/

Question about the continued instruction-tuning phase #6

Open Jiaxin-Wen opened 4 months ago

Jiaxin-Wen commented 4 months ago

In Section 2.5, the models are continually fine-tuned on several open-source instruction-tuning datasets, including the training sets of GSM8K and MATH.

I'm wondering whether, after continued fine-tuning, the models are still evaluated with few-shot prompting or with zero-shot prompting. For example, suppose the model is fine-tuned on GSM8K in the format `Question:\n{question}\nAnswer:\n{answer}`. In the inference stage, do you still put multiple Q-A pairs into the input, or just the question (which would be aligned with the continued fine-tuning stage)?
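To make the two layouts concrete, here is a minimal sketch (my own illustration, not code from this repo; the exemplars are invented):

```python
# Hypothetical illustration of the two prompt layouts discussed above;
# the demonstration Q-A pairs are made up.

FEWSHOT_EXEMPLARS = [
    ("What is 2 + 3?", "2 + 3 = 5. The answer is 5."),
    ("What is 10 - 4?", "10 - 4 = 6. The answer is 6."),
]

def format_pair(question: str, answer: str) -> str:
    # The one-shot fine-tuning format quoted above.
    return f"Question:\n{question}\nAnswer:\n{answer}"

def zero_shot_prompt(question: str) -> str:
    # Matches the fine-tuning format: only the test question.
    return f"Question:\n{question}\nAnswer:\n"

def few_shot_prompt(question: str) -> str:
    # Prepends demonstration Q-A pairs before the test question.
    demos = "\n\n".join(format_pair(q, a) for q, a in FEWSHOT_EXEMPLARS)
    return demos + "\n\n" + zero_shot_prompt(question)
```
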
Jiaxin-Wen commented 4 months ago

Moreover, I find that simply fine-tuning SOTA LMs (e.g., llama-3-8b) on the original GSM8K training set does not lead to any improvement over few-shot performance.

| llama-3-8b | GSM8K |
| --- | --- |
| few-shot prompting | 55.57 |
| fine-tuning | 55.79 |
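For reference, a scoring sketch of the kind typically used for GSM8K (last-number extraction plus exact match); this is my assumption about the setup, not necessarily how the numbers above were produced:

```python
import re

def extract_answer(completion: str) -> str | None:
    # GSM8K-style scoring: take the last number appearing in the completion.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    return numbers[-1] if numbers else None

def gsm8k_accuracy(completions: list[str], gold_answers: list[str]) -> float:
    # Exact match between the extracted number and the gold final answer.
    correct = sum(
        extract_answer(c) == g for c, g in zip(completions, gold_answers)
    )
    return 100.0 * correct / len(gold_answers)
```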

I would like to know if this aligns with your experimental results. If so, could you please share your data for continued instruction-tuning as well? That would be really helpful for reproducing the results in the paper.

xiangyue9607 commented 4 months ago

Thanks! We used few-shot prompting for the evaluation. You can find more details in our evaluation code, and the implementation details are in the paper.

Jiaxin-Wen commented 4 months ago

Do you add an EOS token during pre-training or continued fine-tuning?

Jiaxin-Wen commented 4 months ago

Since all the training data is in a one-shot format, I'm wondering whether I should remove the EOS token during pre-training or fine-tuning to adapt to few-shot evaluation.
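To make the EOS question concrete, here is a sketch of the two tokenization variants I have in mind (the checkpoint name and the `\n\n` separator are my assumptions, not the authors' confirmed setup):

```python
from transformers import AutoTokenizer

# Sketch of the two EOS choices being asked about; the checkpoint name and
# the "\n\n" separator are assumptions, not the authors' confirmed setup.
tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

example = "Question:\nWhat is 2 + 3?\nAnswer:\n2 + 3 = 5. The answer is 5."

# Variant A: append EOS after every one-shot example. The model learns to
# emit EOS right after an answer, so decoding stops cleanly at inference.
ids_with_eos = tok(example + tok.eos_token, add_special_tokens=False).input_ids

# Variant B: no EOS; examples are separated only by blank lines. This matches
# the few-shot prompt layout, but generation may run on into a new
# "Question:", so evaluation then needs a stop string to truncate the output.
ids_without_eos = tok(example + "\n\n", add_special_tokens=False).input_ids
```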

Jiaxin-Wen commented 4 months ago

Or is there any other trick you used to adapt to few-shot evaluation?