Horovod broadcast

Thanks for the impressive work! I have a question about the Horovod setup: how do you ensure that all ranks are initialized with the same weights? I can't find a step equivalent to calling "hvd.BroadcastGlobalVariablesHook". I would really appreciate any help here.

The following is the Horovod code in ./packnet_sfm/trainers/horovod_trainer.py:

```python
def fit(self, module):
    # Prepare module for training
    module.trainer = self
    # Update and print module configuration
    prep_logger_and_checkpoint(module)
    print_config(module.config)

    # Send module to GPU
    module = module.to('cuda')
    # Configure optimizer and scheduler
    module.configure_optimizers()

    # Create distributed optimizer
    compression = hvd.Compression.none
    optimizer = hvd.DistributedOptimizer(module.optimizer,
        named_parameters=module.named_parameters(), compression=compression)
    scheduler = module.scheduler

    # Get train and val dataloaders
    train_dataloader = module.train_dataloader()
    val_dataloaders = module.val_dataloader()

    # Validate before training if requested
    if self.validate_first:
        validation_output = self.validate(val_dataloaders, module)
        self.check_and_save(module, validation_output)
```
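
For reference, this is roughly what I expected to find somewhere in `fit`. As far as I understand, `hvd.BroadcastGlobalVariablesHook` belongs to Horovod's TensorFlow API, and the PyTorch counterparts are `hvd.broadcast_parameters` and `hvd.broadcast_optimizer_state`. The sketch below is only my assumption of how it could look; the helper name `sync_initial_state` is hypothetical and not part of packnet-sfm, and `hvd` is passed in as an argument so the snippet does not depend on Horovod being installed.

```python
def sync_initial_state(hvd, module, optimizer, root_rank=0):
    """Copy the root rank's weights and optimizer state to every rank.

    `hvd` is expected to be the `horovod.torch` module (an assumption of
    this sketch); `module` is the model and `optimizer` its optimizer.
    """
    # Broadcast model parameters and buffers from the root rank
    hvd.broadcast_parameters(module.state_dict(), root_rank=root_rank)
    # Broadcast optimizer state (e.g. momentum buffers) from the root rank
    hvd.broadcast_optimizer_state(optimizer, root_rank=root_rank)


# Intended use inside fit(), assuming `import horovod.torch as hvd`
# and that hvd.init() has already been called:
#     sync_initial_state(hvd, module, module.optimizer)
```

If a call like this (or an equivalent) already exists elsewhere in the repository, a pointer to it would fully answer my question.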

TRI-ML / packnet-sfm

Horovod broadcast #172