Luodian / Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
https://otter-ntu.github.io/
MIT License

Question about Multi-gpu training. #242

Open qyx1121 opened 1 year ago

qyx1121 commented 1 year ago

Hi, I trained on eight 48 GB GPUs, using the following training script:

export PYTHONPATH=.

# alternatively: --pretrained_model_name_or_path=luodian/OTTER-MPT7B-Init
accelerate launch --config_file=./pipeline/accelerate_configs/accelerate_config_fsdp.yaml \
pipeline/train/instruction_following.py \
--pretrained_model_name_or_path=luodian/OTTER-LLaMA7B-INIT \
--mimicit_path="path/to/DC_instruction.json" \
--images_path="path/to/DC.json" \
--train_config_path="path/to/DC_train.json" \
--batch_size=32 \
--num_epochs=3 \
--report_to_wandb \
--wandb_entity=ntu-slab \
--run_name=OTTER-LLaMA7B-densecaption \
--wandb_project=OTTER-LLaMA7B \
--workers=4 \
--lr_scheduler=cosine \
--learning_rate=1e-5 \
--warmup_steps_ratio=0.01

and I didn't change anything in "./pipeline/accelerate_configs/accelerate_config_fsdp.yaml". During training, I found that the utilization of every GPU except the last one was very low. Is this normal?

[screenshots: GPU utilization]
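For reference, the repo's actual accelerate_config_fsdp.yaml may differ, but a Hugging Face Accelerate FSDP config for a single 8-GPU machine typically looks roughly like the sketch below. The field names follow Accelerate's standard config format; the concrete values here are illustrative assumptions, not the repo's settings. One thing worth checking for this symptom is that num_processes matches the number of GPUs, since Accelerate launches one rank per process and GPUs without a rank will sit idle.

compute_environment: LOCAL_MACHINE
distributed_type: FSDP
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP   # wrap transformer blocks as FSDP units
  fsdp_backward_prefetch_policy: BACKWARD_PRE     # prefetch next set of params during backward
  fsdp_offload_params: false                      # keep parameters on GPU
  fsdp_sharding_strategy: 1                       # 1 = FULL_SHARD across all ranks
  fsdp_state_dict_type: FULL_STATE_DICT
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 8        # one process per GPU; should equal the number of GPUs you expect to use
rdzv_backend: static
use_cpu: false

A quick way to confirm whether all eight ranks are actually doing work is to watch nvidia-smi in a second terminal (e.g. watch -n 1 nvidia-smi) while training steps run.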