ModelTC / OmniBal


use_fast_dataset=True #2

Open fyting opened 2 months ago

fyting commented 2 months ago

Is OmniBal enabled in the InternVL codebase by adding use_fast_dataset=True to the bash script? For example, if I add use_fast_dataset=True to this file: https://github.com/ModelTC/InternVL/blob/OmniBal_V2.0/internvl_chat/shell/internvl1.5/hermes2_yi34b/internvl_chat_v1_5_hermes2_yi34b_dynamic_res_finetune.sh, will it accelerate training?

yqyao commented 2 months ago

@fyting Yes, you can follow this PR (https://github.com/OpenGVLab/InternVL/pull/506/files#diff-a6d78bf1713c7a9e7c1c701008ac8761ecf7d9d376f56658522ad6a2bda77016). For 6B + 20B training, we reduced the training time from 14.5 h to 9.5 h on 64 GPUs, using ViT 9 and LLM 4096 input.
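For reference, the two timings quoted above can be turned into a speedup factor and a relative time saving. This is only arithmetic on the reported 14.5 h and 9.5 h figures; the variable names are illustrative and nothing here is measured independently.

```python
# Timings as reported in the comment above (hours on 64 GPUs).
baseline_hours = 14.5
optimized_hours = 9.5

# Speedup factor and fraction of wall-clock time saved.
speedup = baseline_hours / optimized_hours
reduction = (baseline_hours - optimized_hours) / baseline_hours

print(f"speedup: {speedup:.2f}x, time saved: {reduction:.1%}")
# prints "speedup: 1.53x, time saved: 34.5%"
```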

fyting commented 2 months ago

Thank you for your guidance. Could you please also provide the sh script used for training?

fyting commented 2 months ago

use_fast_dataset=True

After setting use_fast_dataset=True in the config, the training process gets stuck at this point. What could be the issue? [screenshot]

yqyao commented 2 months ago

@fyting You could try inserting some breakpoints (pdb) to find where the process hangs.
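A minimal sketch of one way to locate a hang like this, using only the Python standard library. It assumes the stall happens inside the Python process, for example while the dataset is being built. `build_dataset_stub` is a hypothetical stand-in, not a function from OmniBal or InternVL. The watchdog prints every thread's stack trace if the process is still running after the timeout, which shows the line it is stuck on without needing an interactive pdb session on each rank.

```python
import faulthandler
import time


def arm_hang_watchdog(timeout_s: float = 600.0) -> None:
    """Dump all thread stacks to stderr every timeout_s seconds until cancelled."""
    faulthandler.dump_traceback_later(timeout_s, repeat=True)


def disarm_hang_watchdog() -> None:
    """Cancel the pending stack dump once the suspect step has finished."""
    faulthandler.cancel_dump_traceback_later()


def build_dataset_stub(delay_s: float) -> list:
    """Hypothetical placeholder for the step suspected of hanging."""
    time.sleep(delay_s)
    return list(range(4))


if __name__ == "__main__":
    arm_hang_watchdog(timeout_s=600.0)
    dataset = build_dataset_stub(0.01)
    # On a single-process run, uncomment the next line to drop into pdb here.
    # breakpoint()
    disarm_hang_watchdog()
    print(len(dataset))
```

In a multi-GPU launch, interactive pdb prompts from several ranks tend to interleave, so reproducing the hang with a single process first, or relying on the stack dump, is usually easier.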