Thanks for the impressive work!
I have a question about Horovod: how do you ensure that all ranks are initialized with the same weights? I couldn't find anything analogous to calling "hvd.BroadcastGlobalVariablesHook". I'd really appreciate any help here.
The following is the Horovod code in ./packnet_sfm/trainers/horovod_trainer.py:
```python
def fit(self, module):
    # Prepare module for training
    module.trainer = self
    # Update and print module configuration
    prep_logger_and_checkpoint(module)
    print_config(module.config)
    # Send module to GPU
    module = module.to('cuda')
    # Configure optimizer and scheduler
    module.configure_optimizers()
    # Create distributed optimizer
    compression = hvd.Compression.none
    optimizer = hvd.DistributedOptimizer(
        module.optimizer,
        named_parameters=module.named_parameters(),
        compression=compression)
    scheduler = module.scheduler
    # Get train and val dataloaders
    train_dataloader = module.train_dataloader()
    val_dataloaders = module.val_dataloader()
    # Validate before training if requested
    if self.validate_first:
        validation_output = self.validate(val_dataloaders, module)
        self.check_and_save(module, validation_output)
```
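For context: in Horovod's PyTorch API, the counterpart of TensorFlow's `hvd.BroadcastGlobalVariablesHook` is `hvd.broadcast_parameters(module.state_dict(), root_rank=0)`, usually paired with `hvd.broadcast_optimizer_state(module.optimizer, root_rank=0)` and called right after the optimizer is created. I don't see such a call in the `fit` above either, which may be the missing step (unless the ranks rely on identical random seeds). To illustrate only the *semantics* of that broadcast without needing MPI, here is a toy sketch; `broadcast_parameters` below is a hypothetical stand-in, not the real Horovod function:

```python
# Toy illustration of what hvd.broadcast_parameters(..., root_rank=0) does:
# the root rank's parameters overwrite every other rank's parameters,
# so all workers start training from identical weights.
def broadcast_parameters(rank_states, root_rank=0):
    """Copy the root rank's parameter dict onto every rank (simulated)."""
    root = dict(rank_states[root_rank])
    return [dict(root) for _ in rank_states]

# Each "rank" starts with differently initialized weights.
ranks = [{"w": 0.1}, {"w": -0.7}, {"w": 0.3}]
synced = broadcast_parameters(ranks, root_rank=0)
# After the broadcast, every rank holds rank 0's weights: {'w': 0.1}
```

In the real trainer, the equivalent calls would go immediately after `module.configure_optimizers()` and before `hvd.DistributedOptimizer` starts averaging gradients.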