Since the backend of this hypnowerk service is an ECS Docker cluster – i.e. Docker images deployed from ECR – the API request handler (Lambda) should for sure also be a Docker image, not the old-school ZIP file format from before the 2021 re:Invent. This way the whole thing is Dockerized – and will definitely NOT work anywhere except AWS, but there's still tons of Docker goodness even within that AWS-only world.
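For reference, here's a minimal sketch of the standard container-image Lambda packaging (the handler file `app.py` and its `handler` function are placeholders, not our actual code):

```dockerfile
# Standard AWS pattern for packaging a Python Lambda as a container image.
FROM public.ecr.aws/lambda/python:3.9

# Install the handler's dependencies into the Lambda task root.
COPY requirements.txt .
RUN pip install -r requirements.txt --target "${LAMBDA_TASK_ROOT}"

# Copy the handler code (app.py is a placeholder name).
COPY app.py ${LAMBDA_TASK_ROOT}

# Tell the Lambda runtime which function to invoke: module.function.
CMD ["app.handler"]
```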
A1111 wiki: API seems to be the doc for their API, which links to a Swagger doc webUI.
Looks like this Wile E. Coyote, Super Genius ("I have a beautiful vision for DFserver") ran out of steam after a month or so, but it's interesting to see other folks' ideas: https://github.com/huo-ju/dfserver
Here's someone implementing the API service using the FastAPI library. Dockerized too, but very much not ECS: How to Run Stable Diffusion in Docker with a Simple Web API and GPU Support
"Stability Gnerator API," you say?
Client implementations that interact with the Stability Generator API … Python client client.py is both a command line client and an API class that wraps the gRPC based API. To try the client:
The host can be set, so we could point it at AWS. I wonder if this works with A1111 in `--api` mode.
```python
# Excerpt from stability-sdk's client.py, with its imports filled in:
import logging, os, sys

logger = logging.getLogger(__name__)

STABILITY_HOST = os.getenv("STABILITY_HOST", "grpc.stability.ai:443")
STABILITY_KEY = os.getenv("STABILITY_KEY", "")
if not STABILITY_HOST:
    logger.warning("STABILITY_HOST environment variable needs to be set.")
    sys.exit(1)
```
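If we stood up our own gRPC endpoint on AWS, pointing the client at it should just be a matter of overriding that env var. A rough sketch, assuming the SDK's documented `StabilityInference` wrapper (the host value here is a made-up placeholder):

```python
import os
from stability_sdk import client

# Point the SDK at our own endpoint instead of grpc.stability.ai.
# "hypnowerk.example.com:443" is a placeholder, not a real host.
os.environ["STABILITY_HOST"] = "hypnowerk.example.com:443"

stability_api = client.StabilityInference(
    host=os.environ["STABILITY_HOST"],
    key=os.environ.get("STABILITY_KEY", ""),
    verbose=True,
)
answers = stability_api.generate(prompt="a render cluster in the clouds")
```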
Hypnowerk is simply a render cluster that has been architected to be the engine for all manner of machinery on AWS. Definitely want to get to where there is an Invoke session service that calls into the render cluster, but that is not a Version One thing.
All the complexity of the render cluster should be hidden behind a service API. Scaling without getting into spaghetti code is a concern.
Another reason to have all clients (Invoke, Discord, Blender, Photoshop, et cetera) use the same message queue interface to the render cluster is that SQS provides the usage-flow metrics that are used to titrate the autoscaling:
To properly scale the system, a custom Amazon CloudWatch metric is created. This custom CloudWatch metric calculates the number of Amazon Elastic Container Service (Amazon ECS) tasks required to adequately handle the amount of Amazon Simple Queue Service (Amazon SQS) messages. You should have a high-resolution CloudWatch metric to scale up quickly. For this use case, a high-resolution CloudWatch metric of every 10 seconds was implemented.
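A sketch of how that backlog-per-task metric might be published with boto3 (the namespace, metric name, and queue URL are placeholders; `StorageResolution=1` is what makes it a high-resolution metric):

```python
import boto3

sqs = boto3.client("sqs")
cloudwatch = boto3.client("cloudwatch")

# Placeholder queue URL.
QUEUE_URL = "https://sqs.us-west-2.amazonaws.com/123456789012/render-queue"

def publish_backlog_metric(running_task_count: int) -> None:
    # How many render requests are sitting in the queue right now.
    attrs = sqs.get_queue_attributes(
        QueueUrl=QUEUE_URL, AttributeNames=["ApproximateNumberOfMessages"]
    )
    backlog = int(attrs["Attributes"]["ApproximateNumberOfMessages"])

    # Messages per running ECS task; the scaling policy targets this value.
    backlog_per_task = backlog / max(running_task_count, 1)

    cloudwatch.put_metric_data(
        Namespace="Hypnowerk/RenderQueue",  # placeholder namespace
        MetricData=[{
            "MetricName": "BacklogPerTask",
            "Value": backlog_per_task,
            "StorageResolution": 1,  # sub-minute, high-resolution metric
        }],
    )
```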
SD webUI architectures are weird. I like the webUI, but its architecture is a bit odd because it assumes one user per web server instance (with a GPU in the instance). So a cluster of them is more like a farm of remote-desktop Windows machines than a traditional webapp. But a GPU will only be busy while that one user is rendering – very cost inefficient. This is yet another reason to get to where there are Invoke session services (on Lambda, I expect) that queue render requests out to the render cluster. That way GPU utilization can be optimized: much less wastage, autoscaling, et cetera.
One problem with SD on Docker on macOS is that you cannot use the GPUs in the MacBook. But once you can, the SD community will move to Docker. And in Dockerland, Docker Compose will be used to set up two containers – web-front and render-engine – on the same laptop or GPU rig. When that happens, the same architecture can be deployed to a render cluster at a cloud provider.
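A sketch of what that two-container Compose file might look like (the service and image names are invented for illustration; render-engine is the container that owns the GPU):

```yaml
# Hypothetical docker-compose.yml splitting the webUI from the renderer.
services:
  web-front:
    image: hypnowerk/web-front:latest   # placeholder image name
    ports:
      - "7860:7860"
    environment:
      RENDER_ENGINE_URL: "http://render-engine:7861"
  render-engine:
    image: hypnowerk/render-engine:latest   # placeholder image name
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```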
Point is, the odd architecture of early SD webUIs is stuck in a pre-Docker stage of evolution, or we are on the cusp of a split. That is what the AWS SD Discord bot project demonstrated, as do the cost concerns I'm running into with the low GPU utilization of these early SD webUI architectures.
Sounds like very recent releases of `copilot` have the capacity to set up worker clusters: Efe explaining workers in an AWS remote chat presentation. These consume events from, say, a queue.

This is starting to sound like the Discord bot solution, which involved an autoscaling ECS cluster but which does NOT involve `copilot`: An elastic deployment of Stable Diffusion with Discord on AWS. So, perhaps for v2 of the hypnowerk service we'll hack about using `copilot` to achieve much of what was done in raw CloudFormation in their solution, but with tons less boilerplate code (`copilot` spits all that crap out for us).
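If I'm reading the Copilot docs right, standing up one of those queue-consuming workers is roughly this (the app and service names are placeholders):

```sh
# Initialize the app, then add a queue-driven worker service.
copilot init --app hypnowerk --name render-worker \
    --type "Worker Service" --dockerfile ./Dockerfile

# Deploy it to an environment.
copilot svc deploy --name render-worker --env prod
```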
The render cluster is one thing; training batch jobs are another. How to do training on the same autoscaling ECS cluster? Just queue a big job, the autoscaling monitor on the queue detects the load, the ECS cluster is scaled up, the batch training is performed, and the ECS cluster scales back down…
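In other words, training would just be another message type on the same queue. A sketch (the queue URL and message schema are invented):

```python
import json

import boto3

sqs = boto3.client("sqs")
# Placeholder queue URL.
QUEUE_URL = "https://sqs.us-west-2.amazonaws.com/123456789012/render-queue"

# A training job rides the same queue as render jobs; the workers
# dispatch on job_type. This message schema is hypothetical.
sqs.send_message(
    QueueUrl=QUEUE_URL,
    MessageBody=json.dumps({
        "job_type": "training",
        "dataset_s3_uri": "s3://hypnowerk-datasets/run-001/",  # placeholder
        "steps": 2000,
    }),
)
```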
Looks like we can have multiple webUIs on each instance: https://github.com/ManyHands/hypnowerk/issues/56#issuecomment-1412631546
On Windows, how to get A1111 to start up with an API, not just a webUI:
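(As far as I know, the usual way is to add the flag to `COMMANDLINE_ARGS` in `webui-user.bat`:)

```bat
rem In webui-user.bat, add --api to the launch args:
set COMMANDLINE_ARGS=--api
call webui.bat
```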
Components:
We are going to want to have a full-on web service which implements the Stable Diffusion API as found in Automatic1111 (behind the `--api` CLI flag). This will be the interface into our SD Docker render cluster for various client programs (Photoshop, Blender, etc.). I'm guessing this will have the same internal architecture as the Discord bot codebase from AWS that I'm starting with. Here's that architecture:
So, the service gets an HTTP request at API Gateway, which forwards it to Lambda for processing, which queues it up for handling by the render cluster on ECS. I'm not sure what the request looks like when it comes from their Discord bot. The A1111 `--api` may be different, but we can just hack on A1111's (Python?) code that parses the HTTP request.
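A sketch of that front door – a Lambda handler that takes the API Gateway proxy event and drops the render request onto the queue (the env var name and payload shape are placeholders, not the Discord bot's actual format):

```python
import json
import os

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = os.environ["RENDER_QUEUE_URL"]  # hypothetical env var

def handler(event, context):
    # API Gateway proxy integration delivers the HTTP body as a string.
    request = json.loads(event.get("body") or "{}")

    # Forward the txt2img-style request to the render cluster's queue.
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({
            "prompt": request.get("prompt", ""),
            "steps": request.get("steps", 20),
        }),
    )
    # 202: accepted for async processing by the ECS render cluster.
    return {"statusCode": 202, "body": json.dumps({"queued": True})}
```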