NickLucche / stable-diffusion-nvidia-docker

GPU-ready Dockerfile to run the Stability.AI stable-diffusion model v2 with a simple web interface. Includes multi-GPU support.
MIT License
359 stars · 43 forks

Multi-GPU support #5

Closed · mchaker closed this 2 years ago

mchaker commented 2 years ago

Thank you so much for your work containerizing this.

I must ask, is multi-GPU support planned soon? I have 4 to 8 cards that I would like to use at once.

Thanks! :)

NickLucche commented 2 years ago

Hey, are all cards able to fit the model (>6 GB of VRAM)? If so, I could implement something like DataParallel, which would allow generating images in parallel across all GPUs (say you have 4 cards and request 4 images: each GPU generates 1). On the other hand, if you have 4 smaller cards (<6 GB) and want to "combine" their memory, we'd need to split the model across GPUs, which should be harder to do (I believe it doesn't come out of the box).
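
The data-parallel idea above can be sketched in plain Python: each worker is pinned to one GPU that holds a full copy of the model, and image requests are fanned out one per device. This is a minimal illustration of the dispatch pattern only; `generate_on_device` is a hypothetical stand-in for a real inference call (e.g. a pipeline moved to `cuda:<id>`), not the repo's actual code.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_on_device(device_id: int, prompt: str) -> str:
    # Hypothetical stand-in for real inference on f"cuda:{device_id}",
    # where that GPU holds its own full copy of the model.
    return f"image(prompt={prompt!r}, device=cuda:{device_id})"


def parallel_generate(prompt: str, n_images: int, n_gpus: int) -> list[str]:
    # One worker per GPU; image i is dispatched to GPU i % n_gpus,
    # so 4 requested images on 4 cards means 1 image per card.
    with ThreadPoolExecutor(max_workers=n_gpus) as pool:
        futures = [
            pool.submit(generate_on_device, i % n_gpus, prompt)
            for i in range(n_images)
        ]
        return [f.result() for f in futures]


print(parallel_generate("a red fox", n_images=4, n_gpus=4))
```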

mchaker commented 2 years ago

My current cards are 16GB VRAM (HBM2) each, so that works.

Although for the feature itself I'd imagine many more users have smaller cards that they want to combine (so, model splitting)

Perhaps the first approach can be implemented first, then the second approach can come later since it's more complicated?

NickLucche commented 2 years ago

yeah I'm on it

NickLucche commented 2 years ago

Hey, I've reached a decent stage on this branch, but I am currently unable to test it on multiple GPUs. Would you mind giving it a try on your machine? The updated command is the following:

```
docker run --name stable-diffusion --gpus all -it -e DEVICES=all -e TOKEN=<YOUR_TOKEN> -p 7860:7860 nicklucche/stable-diffusion:multi-gpu
```

Make sure to increase the Number of Images to get parallel image generation. Thanks for your help!

mchaker commented 2 years ago

I don't mind, glad to help! I will report back with output.

mchaker commented 2 years ago

Excellent! It works :)

Do the output images save anywhere on the filesystem? Or are they just directly sent back to the browser and not saved anywhere?


mchaker commented 2 years ago

Is the multi-gpu docker image based on this branch?: https://github.com/NickLucche/stable-diffusion-nvidia-docker/tree/dp

Lazzeruz commented 2 years ago

I am going to try this on my mining rig. There are 5 GPUs in there with different RAM amounts, tallying up to 32 GB of VRAM, so it should be a good test of how it performs. I might do another test later by putting all the GPUs I have lying around in one machine, for around 64 GB of VRAM.

mchaker commented 2 years ago

At this point it splits the workload rather than combining the capacity.

NickLucche commented 2 years ago

Sorry for the late reply, thanks a lot for your help, I'm glad it works!

> Do the output images save anywhere on the filesystem? Or are they just directly sent back to the browser and not saved anywhere?

Atm they're only sent to the browser; you can save them with a right click. We could save them to disk, but we'd need to mount a volume as a prerequisite for that.
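
If saving to disk were added, mounting a volume would look something like the snippet below. Note the container-side path `/outputs` is purely hypothetical (the image doesn't currently write outputs anywhere); only the `-v` flag usage itself is standard Docker.

```shell
# Hypothetical sketch: bind-mount a host directory so the app could
# persist generated images. "/outputs" is an assumed container path,
# not part of the current image.
docker run --name stable-diffusion --gpus all -it \
  -e DEVICES=all -e TOKEN=<YOUR_TOKEN> \
  -v "$(pwd)/outputs:/outputs" \
  -p 7860:7860 nicklucche/stable-diffusion:multi-gpu
```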

> Is the multi-gpu docker image based on this branch?: https://github.com/NickLucche/stable-diffusion-nvidia-docker/tree/dp

Yep, I'll merge it into the main branch soon.

> I am going to try this on my mining rig. There are 5 GPUs in there with different RAM amounts, tallying up to 32 GB of VRAM, so it should be a good test of how it performs. I might do another test later by putting all the GPUs I have lying around in one machine, for around 64 GB of VRAM.

That would be a nice test bed indeed, I hadn't thought about scaling to this extent, to be honest! Also, atm I've only implemented the "workload splitting" @mchaker mentioned, so if you generate X images and have N GPUs, each card gets about ~X/N tasks (every card must be able to fit the entirety of the model, though). I might look into splitting the model across multiple GPUs, so that each one can contribute to the generation of a single image, but that is going to be a bit harder, as I have to figure out the best/most balanced way to split the network architecture.
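
The ~X/N split described above can be made concrete: when X doesn't divide evenly by N, the fairest allocation gives each GPU either floor(X/N) or floor(X/N)+1 tasks. A small sketch (this is an illustration of the arithmetic, not the repo's scheduler):

```python
def split_tasks(n_images: int, n_gpus: int) -> list[int]:
    # Each GPU gets floor(X/N) tasks; the first X % N GPUs get one extra,
    # so per-card counts never differ by more than 1.
    base, extra = divmod(n_images, n_gpus)
    return [base + (1 if i < extra else 0) for i in range(n_gpus)]


print(split_tasks(10, 4))  # → [3, 3, 2, 2]
```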

mchaker commented 2 years ago

If you do split the model across GPUs, I would test that in a heartbeat. I really want combined GPUs for pooled VRAM. :)

NickLucche commented 2 years ago

Great, thanks! At this point I think we can close this issue and open another one for the pooled VRAM!