Hey, are all cards able to fit the model (>6GB of VRAM)? If so, I could implement something like DataParallel to generate images in parallel across all GPUs (say you have 4 cards and request 4 images: each GPU generates 1). On the other hand, if you have 4 smaller cards (<6GB) and want to "combine" their memory, we'd need to split the model across GPUs, which should be harder to do (I don't believe it comes out of the box).
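For concreteness, here's a rough sketch of what the first approach could look like with diffusers (just an illustration of the idea, not necessarily how I'd wire it into this repo; the model id and prompt are placeholders):

```python
# Data-parallel sketch: one full pipeline replica per GPU, each card
# generating its own image concurrently. Assumes every card can fit
# the whole model (>6GB of VRAM).
from concurrent.futures import ThreadPoolExecutor

import torch
from diffusers import StableDiffusionPipeline

MODEL_ID = "CompVis/stable-diffusion-v1-4"  # placeholder

n_gpus = torch.cuda.device_count()
pipes = [
    StableDiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16).to(f"cuda:{i}")
    for i in range(n_gpus)
]

prompt = "a photo of an astronaut riding a horse"  # placeholder

# Threads are enough here: the GIL is released while the denoising
# loop runs on the GPU, so the cards actually work in parallel.
with ThreadPoolExecutor(max_workers=n_gpus) as ex:
    images = list(ex.map(lambda pipe: pipe(prompt).images[0], pipes))
```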
My current cards are 16GB VRAM (HBM2) each, so that works.
Although for the feature itself, I'd imagine many more users have smaller cards that they want to combine (so, model splitting).
Perhaps the first approach can be implemented first, then the second approach can come later since it's more complicated?
Yeah, I'm on it!
Hey, I've reached a decent stage on this branch, but I am currently unable to test it on multiple GPUs. Would you mind giving it a try on your machine? The updated command is the following:
docker run --name stable-diffusion --gpus all -it -e DEVICES=all -e TOKEN=<YOUR_TOKEN> -p 7860:7860 nicklucche/stable-diffusion:multi-gpu
Make sure to increase the Number of Images to have parallel image generation. Thanks for your help!
I don't mind, glad to help! I will report back with output.
Excellent! It works :)
Do the output images save anywhere on the filesystem? Or are they just directly sent back to the browser and not saved anywhere?
Is the multi-gpu docker image based on this branch? https://github.com/NickLucche/stable-diffusion-nvidia-docker/tree/dp
I am going to try this on my mining rig: it has 5 GPUs with different RAM amounts, tallying up to 32 GB of VRAM, so it should be a good test of how it performs in that setup. I might do another test later by putting all the GPUs I have lying around into one machine, for around 64 GB of VRAM.
It splits the workload; it doesn't combine the capacity, at this point.
Sorry for the late reply, thanks a lot for your help, I'm glad it works!
> Do the output images save anywhere on the filesystem? Or are they just directly sent back to the browser and not saved anywhere?
Atm they're only sent to the browser; you can save them with a right click. We could save them to disk as well, but we'd need to mount a volume as a prerequisite for that.
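For reference, mounting a host folder would look something like adding a -v flag to the command above (the /outputs container path is just an example here, and the app would also need to be changed to actually write images there):

docker run --name stable-diffusion --gpus all -it -e DEVICES=all -e TOKEN=<YOUR_TOKEN> -p 7860:7860 -v $(pwd)/outputs:/outputs nicklucche/stable-diffusion:multi-gpu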
> Is the multi-gpu docker image based on this branch? https://github.com/NickLucche/stable-diffusion-nvidia-docker/tree/dp
Yep, I'll merge it into the main branch soon.
> I am going to try this on my mining rig: it has 5 GPUs with different RAM amounts, tallying up to 32 GB of VRAM, so it should be a good test of how it performs in that setup. I might do another test later by putting all the GPUs I have lying around into one machine, for around 64 GB of VRAM.
That would be a nice test bed indeed, I didn't think about scaling to this extent, to be honest! Also, atm I've only implemented the "workload splitting" @mchaker mentioned, so if you generate X images and have N GPUs, each card gets ~X/N tasks (every card must be able to fit the entirety of the model, though). I might look into splitting the model across multiple GPUs, so that each one can contribute to the generation of a single image, but that is going to be a bit harder, as I have to figure out the best/most balanced way to split the network architecture.
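Just to illustrate what splitting the model means, here's a toy PyTorch sketch (not stable-diffusion specific; the layers and sizes are made up): each stage lives on a different GPU and activations are moved between devices, so no single card has to hold the whole network:

```python
# Toy model-parallel sketch: the network is cut into two stages that
# live on different GPUs, pooling their VRAM at the cost of
# device-to-device copies of the activations.
import torch
import torch.nn as nn

class SplitModel(nn.Module):
    def __init__(self):
        super().__init__()
        # First half on GPU 0, second half on GPU 1; neither card
        # needs to fit the full set of weights.
        self.stage1 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
        self.stage2 = nn.Linear(4096, 1024).to("cuda:1")

    def forward(self, x):
        x = self.stage1(x.to("cuda:0"))
        x = self.stage2(x.to("cuda:1"))  # hop to the second card
        return x

model = SplitModel()
out = model(torch.randn(8, 1024))  # runs across both GPUs
```

The tricky part for stable diffusion is exactly where to put those cuts, so that the work (and memory) ends up balanced across cards.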
If you do split the model across GPUs, I would test that in a heartbeat. I really want combined GPUs for pooled VRAM. :)
Great, thanks! At this point I think we can close this issue and open another one for the pooled VRAM!
Thank you so much for your work containerizing this.
I must ask, is multi-GPU support planned soon? I have 4 to 8 cards that I would like to use at once.
Thanks! :)