I noticed I was getting 4 wandb runs when I trained with `accelerate launch` on a machine with 4 GPUs. All 4 of the wandb runs include system statistics, like GPU temp. But only one of them includes the training/evaluation panels, because of our check for `is_main_process`.

All 4 of the runs highlighted in the image below are from a single `accelerate launch train.py ...`

It looks like the way to use wandb with Accelerator is to use the `log_with` argument when instantiating the accelerator: `accelerator = Accelerator(log_with="wandb")` and then call `accelerator.init_trackers` under a condition of `is_main_process`.

This issue has some discussion on the 🤗 forums: https://discuss.huggingface.co/t/multiple-wandb-outputs/21394
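
For reference, here is a minimal sketch of the setup described above, where only the main process calls `init_trackers`. It is not our actual `train.py` and it is not a fix for the extra runs. The helper `init_trackers_on_main`, the `FakeAccelerator` stand-in, the project name `"my-project"` and the config values are all hypothetical, added so the snippet runs without GPUs, `accelerate`, or `wandb` installed.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


def init_trackers_on_main(
    accelerator: Any, project_name: str, config: Optional[dict] = None
) -> bool:
    """Call accelerator.init_trackers only on the main process.

    Returns True if this process initialized the trackers.
    """
    if accelerator.is_main_process:
        accelerator.init_trackers(project_name, config=config)
        return True
    return False


@dataclass
class FakeAccelerator:
    """Hypothetical stand-in for accelerate.Accelerator (illustration only)."""

    process_index: int
    calls: list = field(default_factory=list)

    @property
    def is_main_process(self) -> bool:
        # Assumption for this sketch: process 0 is the main process.
        return self.process_index == 0

    def init_trackers(self, project_name, config=None):
        # Record the call instead of starting a real wandb run.
        self.calls.append((project_name, config))


# Simulate the 4 processes that `accelerate launch` starts on a 4-GPU machine.
workers = [FakeAccelerator(process_index=i) for i in range(4)]
initialized = [
    init_trackers_on_main(w, "my-project", {"lr": 1e-3}) for w in workers
]
print(initialized)  # [True, False, False, False]

# With the real library (assumes accelerate and wandb are installed):
#   from accelerate import Accelerator
#   accelerator = Accelerator(log_with="wandb")
#   init_trackers_on_main(accelerator, "my-project", {"lr": 1e-3})
```

In this sketch only process 0 ever reaches `init_trackers`, which matches the single run that has training/evaluation panels. It does not explain where the other 3 runs with system statistics come from.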