Open ZhengTang1120 opened 6 months ago
Hi,
You mentioned that the model is trained on scientific papers (29B arXiv data) as part of the math component. I am wondering whether you included the full articles or just the math content.
Thank you, Zheng Tang
+1
I use the arXiv subset of the proof-pile-2 dataset (https://huggingface.co/datasets/EleutherAI/proof-pile-2).