TencentARC / PhotoMaker

PhotoMaker
https://photo-maker.github.io/

How app.py performs parallel computing through multiple GPUs #113

Open Alexkerl opened 5 months ago

Alexkerl commented 5 months ago

I set CUDA_VISIBLE_DEVICES=0,1,2,3, but it only computes on a single GPU.

Paper99 commented 5 months ago

https://huggingface.co/docs/diffusers/training/distributed_inference
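For context, CUDA_VISIBLE_DEVICES only controls which GPUs a process can see; it does not spread work across them. The approach in the linked guide is roughly data-parallel: one process per GPU, each with its own copy of the pipeline, each handling a slice of the prompts. Below is a minimal sketch of that idea, not the app.py implementation. `shard_for_rank`, `run_worker`, `build_pipeline` and `generate` are hypothetical names introduced here for illustration, and the sketch assumes the processes are launched with torchrun, which sets RANK, WORLD_SIZE and LOCAL_RANK.

```python
import os


def shard_for_rank(items, rank, world_size):
    """Return the round-robin slice of `items` owned by process `rank`."""
    if world_size < 1 or not 0 <= rank < world_size:
        raise ValueError(f"invalid rank {rank} for world size {world_size}")
    return list(items[rank::world_size])


def run_worker(prompts, build_pipeline, generate):
    """Hypothetical per-process entry point, one process per GPU.

    `build_pipeline` and `generate` are placeholders for however app.py
    constructs and calls the PhotoMaker pipeline; they are not real APIs.
    """
    # torchrun sets these environment variables for every process it spawns.
    rank = int(os.environ.get("RANK", "0"))
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))

    # Each process keeps a full copy of the model on its own GPU.
    pipe = build_pipeline().to(f"cuda:{local_rank}")

    results = []
    for prompt in shard_for_rank(prompts, rank, world_size):
        results.append((prompt, generate(pipe, prompt)))
    return results
```

A script built around this would be started with something like `torchrun --nproc_per_node=4 script.py`. Note that this only increases throughput: every GPU still has to hold the whole model, so it does not lower the memory needed per card.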

Alexkerl commented 5 months ago

https://huggingface.co/docs/diffusers/training/distributed_inference

The tutorial above may help with distributed running, but if I run this program on four 2080 Ti cards (4 * 12GB), I still hit an out-of-memory error.

Paper99 commented 5 months ago

Try switching the dtype to torch.bfloat16. On a 2080 Ti it seems to run in CPU mode, which makes it slower.

You could also refer to the official diffusers guide on reducing memory usage: https://huggingface.co/docs/diffusers/main/en/optimization/memory
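As a rough illustration of what that guide covers, the sketch below turns on the memory-saving switches that diffusers pipelines commonly expose: CPU offload, VAE slicing and tiling, and attention slicing. `apply_memory_savers` is a hypothetical helper, not part of PhotoMaker or diffusers. It assumes `pipe` is a diffusers-style pipeline object and skips any method the installed version does not provide.

```python
def apply_memory_savers(pipe, offload="model"):
    """Enable whichever memory-saving options `pipe` supports.

    Returns the names of the methods that were actually called.
    `offload` may be "model", "sequential", or None.
    """
    applied = []

    # CPU offload keeps weights in host RAM and moves them to the GPU on demand.
    # When it is used, the pipeline should not also be moved with pipe.to("cuda").
    offload_method = {
        "model": "enable_model_cpu_offload",            # faster, saves less
        "sequential": "enable_sequential_cpu_offload",  # slower, saves more
    }.get(offload)
    if offload_method and callable(getattr(pipe, offload_method, None)):
        getattr(pipe, offload_method)()
        applied.append(offload_method)

    # These options trade some speed for a lower peak memory footprint.
    for name in ("enable_vae_slicing", "enable_vae_tiling", "enable_attention_slicing"):
        method = getattr(pipe, name, None)
        if callable(method):
            method()
            applied.append(name)

    return applied
```

In app.py this would be called once, right after the pipeline is created and before any image is generated. Sequential offload is the more aggressive choice for small cards, at a noticeable cost in speed.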

Paper99 commented 5 months ago

Using this link as a solution.