Ctrl-C-V4ever opened this issue 3 months ago
BTW, I'm experiencing a very long wait (~10 min) before the first batch of data loads. Is this normal? Please let me know. Thanks in advance!
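For reference, here is a minimal sketch of how one could time the first batch in isolation, to tell whether the delay comes from the data pipeline (worker startup, slow per-sample decoding) or from model loading; `time_first_batch` is a made-up helper, not part of the SEED-X code:

```python
import time

from torch.utils.data import DataLoader, Dataset


def time_first_batch(dataset: Dataset, batch_size: int = 20,
                     num_workers: int = 8) -> float:
    """Return the seconds until the DataLoader yields its first batch.

    This covers worker spawning plus the first reads/decodes, so a large
    value here points at the data pipeline rather than model loading.
    """
    loader = DataLoader(dataset, batch_size=batch_size,
                        num_workers=num_workers, pin_memory=True)
    start = time.time()
    next(iter(loader))  # forces worker spawn and the first reads
    return time.time() - start
```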
Hi, for SFT on the editing task we use 16 A100-40G GPUs with a total batch size of 320. Since we did not experiment with different hyperparameters for SFT, the defaults may not be optimal.
At the beginning of training it takes a few minutes to load the pre-trained model, so that part is expected. A very long wait caused by data loading, however, is not normal.
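For reference, one way a total batch size of 320 can decompose across 16 GPUs; the values below are illustrative, and the actual knobs are set in `train_seed_x_sft_edit.sh`:

```python
# Illustrative arithmetic only; the real settings live in
# train_seed_x_sft_edit.sh and may use different names/values.
n_gpus = 16            # A100-40G
per_gpu_batch = 4      # micro-batch that fits in 40 GB (example value)
grad_accum_steps = 5   # gradient accumulation before each optimizer step

total_batch = n_gpus * per_gpu_batch * grad_accum_steps
assert total_batch == 320  # matches the reported SFT setting
```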
Is there any recipe (e.g., learning rate) for instruction tuning on a relatively small dataset (e.g., Seed-X-PPT)? I'm trying to train on my custom dataset, but the model overfits within a few hundred iterations.
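For context, these are the kinds of settings I'm considering turning down; the values are illustrative guesses on my side, not anything from the repo:

```python
# Hypothetical small-data SFT overrides; none of these values come from
# the SEED-X repo, they are just common knobs for reducing overfitting.
small_data_overrides = {
    "learning_rate": 1e-5,    # lower than typical large-scale SFT
    "warmup_steps": 50,
    "num_train_epochs": 1,    # or early-stop on validation loss
    "weight_decay": 0.1,
    "lr_scheduler": "cosine",
}
```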
Thanks for open-sourcing this excellent work! I'd like to learn more about the instruction fine-tuning steps. For example, for the editing task, are the default hyperparameters in `train_seed_x_sft_edit.sh` good enough? What is the total batch size? How much computational resources are required? Thanks a lot for the clarification!