AILab-CVC / SEED-X

Multimodal Models in Real World

Details about Instruction FT #19

Open Ctrl-C-V4ever opened 3 months ago

Ctrl-C-V4ever commented 3 months ago

Thanks for open-sourcing this excellent work! I'd like to learn more about the instruction fine-tuning steps. For example, for the editing task, are the default hyperparameters in train_seed_x_sft_edit.sh good enough? What is the total batch size? How much compute is required? Thanks a lot for the clarification!

Ctrl-C-V4ever commented 3 months ago

BTW, I'm experiencing a very long wait (~10 min) before the first batch of data loads. Is this normal? Please let me know. Thanks in advance!

geyuying commented 3 months ago

Hi, for SFT on editing we use 16 A100-40G GPUs with a total batch size of 320. Since we did not experiment with different hyperparameters for SFT, the defaults may not be optimal.
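
For reference, the arithmetic behind that setup, as a runnable sketch (not taken from the repo's scripts; the micro-batch and accumulation values below are hypothetical):

```python
# Back-of-the-envelope check of the reported setup:
# total batch size 320 spread across 16 A100-40G GPUs.
num_gpus = 16
total_batch_size = 320

per_gpu_batch = total_batch_size // num_gpus
print(per_gpu_batch)  # -> 20 samples per GPU per optimizer step

# If 20 samples per GPU do not fit in 40 GB, the same effective batch size
# can be kept via gradient accumulation (values here are hypothetical):
micro_batch = 5                               # per-GPU micro-batch
accum_steps = per_gpu_batch // micro_batch    # -> 4 accumulation steps
assert num_gpus * micro_batch * accum_steps == total_batch_size
```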

At the beginning of training, it takes a few minutes to load the pre-trained model. If the long wait is caused by loading data rather than the model, that is not normal.
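
To separate the two, one quick diagnostic is to time how long the first batch takes to arrive from the data loader alone. A minimal sketch assuming a PyTorch-style `DataLoader` (the actual SEED-X pipeline may differ; `DummyDataset` is a placeholder for your real dataset):

```python
import time
from torch.utils.data import DataLoader, Dataset

class DummyDataset(Dataset):
    """Stand-in for the real dataset; swap in your own to measure it."""
    def __len__(self):
        return 1000
    def __getitem__(self, idx):
        return idx

if __name__ == "__main__":  # guard required for multi-worker loading on spawn platforms
    loader = DataLoader(DummyDataset(), batch_size=20, num_workers=4)
    start = time.time()
    first_batch = next(iter(loader))  # worker startup + first fetch happen here
    print(f"first batch arrived after {time.time() - start:.1f}s")
```

With many workers, heavy per-sample decoding, or slow shared storage, this startup can take minutes; a ~10 min wait that shows up here points at the data path rather than model loading.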

naoto0804 commented 3 months ago

Is there any recipe (e.g., learning rate) for instruction tuning on a relatively small dataset (e.g., Seed-X-PPT)? I'm trying to train on my custom dataset, but the model overfits within a few hundred iterations.
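
Not an official SEED-X recipe, but common mitigations for quick overfitting on small datasets are a lower peak learning rate, a brief warmup, cosine decay, and early stopping on validation loss. A runnable sketch of such a schedule (the peak LR, warmup, and step counts are illustrative, not tuned values):

```python
import math

def lr_at(step, peak_lr=1e-5, warmup_steps=50, total_steps=500):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

for step in (0, 50, 250, 500):
    print(step, f"{lr_at(step):.2e}")
# 0    0.00e+00  (start of warmup)
# 50   1.00e-05  (peak)
# 250  5.87e-06  (mid-decay)
# 500  0.00e+00  (end)
```

Evaluating on a held-out split every few dozen steps and stopping once validation loss rises, or freezing most of the backbone and tuning only a small subset of parameters, are other standard options here.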