Open webcoderz opened 1 year ago
I have a working POC of the API implemented with Ray Serve for multi-GPU inference that I've been working on: https://github.com/webcoderz/stable-diffusion-webui. You set the number of GPUs and replicas you want in the webui-user.sh script and then launch by adding the --ray flag to the launch command. It's still a WIP, but I'm happy to have more contributors! Pinging @AUTOMATIC1111 for visibility. I had to restructure api.py a little because Ray wasn't working with the router, so I just used individual route decorators instead, but not much else is different.
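To make the setup above concrete, here is a minimal sketch of how GPU/replica settings exported from a script like webui-user.sh might be turned into Ray Serve deployment options. The environment variable names and the helper function are hypothetical, not the actual code from the fork; the commented-out decorator pattern at the bottom reflects Ray Serve's real `serve.deployment` / `serve.ingress` API.

```python
import os

def ray_serve_options(env=os.environ):
    """Build Ray Serve deployment options from environment variables
    that a launch script like webui-user.sh might export.
    (Variable names here are illustrative, not the fork's actual names.)"""
    num_replicas = int(env.get("RAY_NUM_REPLICAS", "1"))
    gpus_per_replica = float(env.get("RAY_GPUS_PER_REPLICA", "1"))
    return {
        "num_replicas": num_replicas,
        "ray_actor_options": {"num_gpus": gpus_per_replica},
    }

# With Ray Serve installed, these options would feed the deployment,
# using individual FastAPI route decorators rather than a router:
#
#   from fastapi import FastAPI
#   from ray import serve
#
#   app = FastAPI()
#
#   @serve.deployment(**ray_serve_options())
#   @serve.ingress(app)
#   class APIDeployment:
#       @app.post("/sdapi/v1/txt2img")   # one decorator per route
#       def txt2img(self, ...):
#           ...
```

Keeping the option-building logic in a plain function like this makes it easy to test without a Ray cluster running.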
It's set in webui-user.sh, but apparently the base repo's main branch is broken, so I'll have to rebase on an older version tonight when I get in front of my computer.
I mean, if you can get it working, submit a PR. From my experience the main branch doesn't work, as nothing is initializing shared.sd_model. Tomorrow I'm going to go back one version to 2.6(?) and confirm that works, so I'll let you know how that goes tomorrow morning.
Yep, and that's the goal of this stuff. Still very much a WIP, though.
stable-diffusion-webui 1.6.0
Ray 2.7.1
https://docs.ray.io/en/latest/serve/tutorials/stable-diffusion.html
That one uses diffusers, so it's an entirely different framework, with limited features and no extension support.
Yes
@webcoderz can you update your code with it? I am using Windows.
I don't have a Windows box to test on; it's also possible to run it in Docker.
@webcoderz I have a cluster with 14 A100 GPUs.
I'd be interested in testing stuff there if you're open to it. I only have a laptop GPU, so I am limited in capabilities, especially as I refine it.
Yeah, that should be possible. I'm mostly developing this to use in Docker and I've gotten it working there as well. If you want to reach out to me on Twitter @webcoderz, we can discuss more.
@webcoderz just let me know when you push the code so I can deploy a Ray cluster on the cloud and share it with you.
Yeah, I'm going to downgrade tomorrow to the last working version and it should work fine (in theory), so reach out and I'll let you know.
@AUTOMATIC1111 I didn't change much to get this going, but I had to change how the shared class in modules/shared_items dynamically loads the model. It still does the same thing, but I'm doing it in a more Pythonic/normal way: the previous way couldn't be pickled by Ray, so I replaced shared.sd_model with an instantiation of the class, shared_instance. I will research a cleaner method with fewer code changes than refactoring 30+ files to do this :)
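The pickling issue described above can be illustrated with a small sketch. This is not the actual webui code; the class below is just a minimal stand-in showing why an ordinary class instance holding the model reference serializes cleanly, whereas dynamically patched module attributes can trip up Ray's serializer.

```python
import pickle

class SharedInstance:
    """Illustrative stand-in for the refactored shared state: a plain
    class instance holding the model reference, which pickle (and hence
    Ray's serializer) can handle without special cases."""

    def __init__(self):
        self.sd_model = None  # loaded lazily, set via set_model()

    def set_model(self, model):
        self.sd_model = model

shared_instance = SharedInstance()
shared_instance.set_model("dummy-model-handle")

# A plain instance round-trips through pickle, which is the property
# Ray needs when shipping state to worker actors.
restored = pickle.loads(pickle.dumps(shared_instance))
```

Module-level magic (e.g. a module `__getattr__` that swaps the attribute at access time) has no such guarantee, which is the kind of pattern the refactor replaces.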
I have made a PR here: https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/13668
Is there an existing issue for this?
What would your feature do?
I think I have devised an easy path to multi gpu inference:
Considering the use of accelerate, it should be easy to implement this in some form or another: https://huggingface.co/docs/accelerate/usage_guides/distributed_inference. I would recommend starting with the txt2img batch function, where it would make perfect sense, and likewise the img2img batch function.
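The core idea of the linked accelerate guide is splitting a batch of prompts across processes so each GPU handles a slice. Below is a simplified, dependency-free version of that splitting logic (contiguous chunks, no padding), followed by the real accelerate pattern from the guide in comments.

```python
def split_prompts(prompts, num_processes, process_index):
    """Return this process's contiguous slice of a prompt batch.
    Simplified mirror of accelerate's
    PartialState.split_between_processes (no padding of the last slice)."""
    per_proc = -(-len(prompts) // num_processes)  # ceiling division
    start = process_index * per_proc
    return prompts[start:start + per_proc]

# With accelerate installed, the actual pattern from the linked guide is:
#
#   from accelerate import PartialState
#
#   state = PartialState()
#   with state.split_between_processes(prompts) as my_prompts:
#       for p in my_prompts:
#           image = pipe(p).images[0]  # each process runs its own slice
```

For a txt2img batch function this means each GPU process renders only its share of the batch, even when the user submits a single request.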
We will probably need to add some functions to detect multiple GPUs in modules/devices.py. If enabled in the start bash script, we can make the new multi-GPU txt2img batch function the default, even for a single prompt.
Proposed workflow
Additional information
I'm open to further refinement/debate here; this is just a rough guesstimate. I'm going to bring testing and more soon. Would love to hear feedback from @AUTOMATIC1111 and other core devs.