Open JohnnyRacer opened 2 years ago
It allows for a very pain-free experience when using multi-GPU training compared to native PyTorch's solutions.
Unfortunately, it brings a huge amount of pain if you try to train with DDP on a machine with multiple NVIDIA GPUs and an AMD Ryzen/EPYC CPU =( On a local machine you can disable IOMMU in the BIOS to bypass the issue, but on a rented cloud VM that's impossible =(
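For anyone who can't touch the BIOS: one workaround that is sometimes tried is turning off NCCL's peer-to-peer transport with the `NCCL_P2P_DISABLE` environment variable, so GPU-to-GPU traffic no longer goes through direct PCIe/NVLink access. This is only a sketch, under the assumption that the hang comes from P2P transfers being redirected by the IOMMU. It is not confirmed to fix the case described above, and it will likely cost some throughput. The helper name `disable_nccl_p2p` is made up for illustration.

```python
import os


def disable_nccl_p2p(env=None):
    """Set NCCL_P2P_DISABLE=1 in the given mapping (defaults to os.environ).

    Hypothetical helper: NCCL reads this variable when it initialises,
    so it has to be set before the DDP process group is created.
    """
    env = os.environ if env is None else env
    env["NCCL_P2P_DISABLE"] = "1"
    return env


# Call this at the very top of the training script, before
# torch.distributed or Lightning starts up NCCL.
disable_nccl_p2p()
```

Exporting the same variable in the shell before launching the training script should have the same effect.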
@Desm0nt I haven't heard about this before. Any idea why it happens? I've trained Dreambooth with DDP using the repo linked above on multiple Ampere-based GPUs in one machine and have not experienced this problem with Intel-based CPUs.
How did you train Dreambooth using DDP? Could you give some steps? Thanks.
Is there a way to add Dreambooth / TI / Hypernetwork training with PyTorch Lightning's Trainer class using the DDP strategy, as featured in @XavierXiao's repo? It allows for a very pain-free experience when using multi-GPU training compared to native PyTorch's solutions. Correct me if I'm wrong, but from what I've gathered there isn't a clean way to do this type of training with the code that's available now. If anyone has more information about how to do proper multi-GPU training, please feel free to chime in.
Code Here by XavierXiao
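For reference, the kind of Lightning setup being asked about can be sketched roughly as below. This is a minimal sketch, not the configuration used in the linked repo: the helper names `ddp_trainer_kwargs` and `build_trainer`, and the `max_steps` default, are assumptions for illustration. It assumes a reasonably recent `pytorch_lightning` (1.6 or later), where `Trainer` accepts `accelerator`, `devices` and `strategy`.

```python
def ddp_trainer_kwargs(num_gpus, max_steps=800):
    """Build keyword arguments for a Lightning Trainer (hypothetical helper).

    With more than one GPU the DDP strategy is requested; with a single
    GPU the strategy is left to Lightning's default.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    kwargs = {
        "accelerator": "gpu",
        "devices": num_gpus,
        "max_steps": max_steps,  # assumed value, tune for your run
    }
    if num_gpus > 1:
        kwargs["strategy"] = "ddp"
    return kwargs


def build_trainer(num_gpus):
    """Create the Trainer; Lightning is imported lazily so the helper
    above can be inspected without it installed."""
    import pytorch_lightning as pl

    return pl.Trainer(**ddp_trainer_kwargs(num_gpus))


# Usage, assuming `model` is your own LightningModule and `loader`
# is a DataLoader over the training images:
#   trainer = build_trainer(2)
#   trainer.fit(model, train_dataloaders=loader)
```

With `strategy="ddp"`, Lightning launches one process per GPU and handles the process group and distributed sampler itself, which is most of the convenience over a hand-written `torch.distributed` setup.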