VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
About VILADistributedSampler and gradient_accumulation_steps #69
Open
dreamerlin opened 1 month ago
If we use `VILADistributedSampler` (https://github.com/Efficient-Large-Model/VILA/blob/main/llava/train/llava_trainer.py#L274-L281) for distributed training, should `gradient_accumulation_steps` be hardcoded to 1? I noticed that when I train on 4 nodes (8 GPUs per node) with `gradient_accumulation_steps` set to 8, training is fast, but I suspect this is abnormal.
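For context, a minimal sketch of why the sampler and `gradient_accumulation_steps` interact (all numbers and function names here are hypothetical illustrations, not VILA's actual code): a distributed sampler shards the dataset across ranks, so each rank sees `dataset_size / world_size` samples per epoch, and gradient accumulation then folds several micro-batches into one optimizer step. Raising the accumulation factor multiplies the effective global batch and divides the number of optimizer steps per epoch by the same factor, which can make an epoch *look* much faster if progress is counted in optimizer steps.

```python
# Hypothetical illustration of sampler sharding + gradient accumulation.
# None of these numbers come from VILA; they mirror the 4-node x 8-GPU setup
# mentioned in the question.

def effective_global_batch(per_device_batch: int, world_size: int, grad_accum: int) -> int:
    """Samples consumed per optimizer step across all ranks."""
    return per_device_batch * world_size * grad_accum

def optimizer_steps_per_epoch(dataset_size: int, per_device_batch: int,
                              world_size: int, grad_accum: int) -> int:
    """Optimizer steps per epoch when the sampler shards data across ranks."""
    samples_per_rank = dataset_size // world_size      # each rank sees a 1/world_size shard
    micro_batches = samples_per_rank // per_device_batch
    return micro_batches // grad_accum                 # grad_accum micro-batches per step

world = 4 * 8  # 4 nodes, 8 GPUs each, as in the question

# With accumulation = 1 vs 8 (per-device batch of 4, assumed for illustration):
print(effective_global_batch(4, world, 1))                  # 128 samples/step
print(effective_global_batch(4, world, 8))                  # 1024 samples/step
print(optimizer_steps_per_epoch(128_000, 4, world, 1))      # 1000 steps/epoch
print(optimizer_steps_per_epoch(128_000, 4, world, 8))      # 125 steps/epoch (8x fewer)
```

So with `grad_accum=8` the run takes 8x fewer optimizer steps per epoch while each step is 8x heavier; per-sample throughput should stay roughly constant, which may explain why the run merely *appears* fast rather than being wrong.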