Thank you for your open source project!
The fine-tuning script that corresponds to the 1600 pretrain in your provided scripts differs from the configuration given in the appendix of the paper:
1. The total batch size is 512 in the paper (batch size 8 × 8 nodes × 8 GPUs), but 256 in the script (batch size 2 × num_sample 2 × 8 nodes × 8 GPUs).
2. The number of training epochs was reduced from 75 in the paper to 35 in the script.
Would it be possible to achieve similar training results with these differences?
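For reference, the arithmetic behind the two numbers can be checked with a small sketch. It only multiplies the factors quoted above. The helper name `effective_batch` is hypothetical and is not part of the repository. The step ratio at the end assumes that one epoch covers the same number of samples in both setups, which may not match how `num_sample` is actually handled in the training code.

```python
from math import prod


def effective_batch(*factors: int) -> int:
    """Multiply per-GPU batch size, sampling factor, node count and GPU count."""
    return prod(factors)


# Paper appendix: batch size 8 x 8 nodes x 8 GPUs
paper_batch = effective_batch(8, 8, 8)
# Provided script: batch size 2 x num_sample 2 x 8 nodes x 8 GPUs
script_batch = effective_batch(2, 2, 8, 8)

paper_epochs, script_epochs = 75, 35

# Assumption: an epoch processes the same number of samples in both setups,
# so the number of optimizer steps scales as epochs / batch size.
step_ratio = (script_epochs / script_batch) / (paper_epochs / paper_batch)

print(paper_batch, script_batch)  # 512 256
print(round(step_ratio, 3))       # 0.933
```

Under that assumption the script performs roughly 93% as many optimizer steps as the paper setting, each on a batch half the size.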