Open adkAurora opened 1 year ago

I am training NeuS using two GPUs. Do I need to change any config parameters? Should I reduce the trainer max_steps by half?

Hi! In DDP, model.train_num_rays and model.max_train_num_rays are defined per device, so you could either halve these values or simply train for fewer iterations :)
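
A minimal sketch of the arithmetic behind this (the helper and the numeric values below are illustrative, not taken from the repo's configs):

```python
# With DDP, every process casts its own batch of rays each step, so the
# effective global batch is num_gpus times the per-device setting. To keep
# the single-GPU behaviour, either divide the per-device ray budget by the
# number of GPUs, or keep the rays and shrink trainer.max_steps instead.

def scale_for_ddp(value: int, num_gpus: int) -> int:
    """Divide a single-GPU setting across DDP processes."""
    return value // num_gpus

# Illustrative single-GPU values; check your own config for the real ones.
single_gpu = {
    "model.train_num_rays": 256,
    "model.max_train_num_rays": 8192,
    "trainer.max_steps": 20000,
}

for key, value in single_gpu.items():
    print(f"{key}={scale_for_ddp(value, num_gpus=2)}")
```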

Hi! I halved model.train_num_rays and model.max_train_num_rays on V100 cards. Training takes 15 minutes on a single GPU but 20 minutes on two GPUs. It seems that, with the current network structure, using multiple cards does not bring a significant speed-up. Is that true?

Hi, have you solved this problem?