🐛 Bug
Trainer.accumulation_scheduler does not exist, which makes the strategy code fail.
To Reproduce
Steps to reproduce the behavior:
Environment
Installation method (conda, pip, source): pip
Other info
On a side note, the various documentation sources do not really explain how to use Horovod + Lightning in a way that works. The Lightning documentation refers to this repo (not easy to find). This repo refers to the Horovod docs. The Horovod docs don't mention this repo, but say pl.Trainer(accelerator='horovod') or pl.Trainer(distributed_backend='horovod'), neither of which works. The README says trainer = Trainer(strategy="horovod", accelerator="gpu", devices=1), but that doesn't work either. I ended up using the CPU example with strategy=HorovodStrategy(), but also specifying accelerator='gpu'.
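For reference, here is a minimal sketch of the configuration I ended up with: a strategy instance rather than the "horovod" string, plus an explicit accelerator. The helper names (horovod_trainer_kwargs, make_horovod_trainer) are made up for illustration, and the import paths are assumptions that may need adjusting to the installed package versions.

```python
def horovod_trainer_kwargs(strategy, accelerator="gpu"):
    """Collect the Trainer arguments for the workaround described above."""
    # Pass a strategy *instance* instead of the "horovod" string shortcut,
    # and set the accelerator explicitly.
    return {"strategy": strategy, "accelerator": accelerator}


def make_horovod_trainer():
    """Build a Trainer using the workaround (hypothetical helper)."""
    # Imports are local so this file can be loaded without Horovod installed.
    # Assumption: these import paths match the installed packages.
    from lightning import Trainer
    from lightning_horovod import HorovodStrategy

    return Trainer(**horovod_trainer_kwargs(HorovodStrategy()))
```

The script that calls make_horovod_trainer() would then be launched through Horovod as usual, for example with horovodrun -np 2 python train.py.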