MountaintopLotus / braintrust

A Dockerized platform for running Stable Diffusion on AWS (for now)
Apache License 2.0

BrainTrust service #64

Open · JohnTigue opened this issue 1 year ago

JohnTigue commented 1 year ago

Components:

We are going to want to have a full-on web service which implements the Stable Diffusion API as found in Automatic1111 (behind the --api CLI flag). This will be the interface into our SD Docker render cluster from various client programs (Photoshop, Blender, etc.)

I'm guessing this will have the same internal architecture as the Discord bot codebase from AWS that I'm starting with. Here's that architecture:

[Figure: fig1-stable-diffusion — architecture of the AWS Stable Diffusion Discord bot solution]

So, the service gets an HTTP request at API Gateway, which forwards it to Lambda for processing, which queues it up for handling by the render cluster on ECS. I'm not sure what the request looks like when it comes from their Discord bot. The A1111 --api format may be different, but we can just hack on A1111's (Python?) code that parses the HTTP request.
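To make that flow concrete, here is a minimal sketch of the API Gateway → Lambda → SQS leg, assuming a proxy-integration Lambda and a hypothetical render-requests queue (the queue name, environment variable, and payload fields are placeholders, not the actual Discord bot code):

    import json
    import os

    import boto3

    sqs = boto3.client("sqs")
    QUEUE_URL = os.environ["RENDER_QUEUE_URL"]  # hypothetical queue for render jobs


    def handler(event, context):
        """API Gateway (proxy integration) -> validate -> enqueue for the ECS render cluster."""
        body = json.loads(event.get("body") or "{}")
        prompt = body.get("prompt")
        if not prompt:
            return {"statusCode": 400, "body": json.dumps({"error": "prompt is required"})}

        # Hand the request off to the render cluster via SQS; workers on ECS poll this queue.
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps({"prompt": prompt, "params": body.get("params", {})}),
        )
        return {"statusCode": 202, "body": json.dumps({"status": "queued"})}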

JohnTigue commented 1 year ago

Since the backend of this hypnowerk service is an ECS Docker cluster – i.e. Docker images deployed from ECR – then for sure the API request handler (Lambda) should also be a Docker image, not the old-school ZIP file format from pre-2021 re:Invent. This way the whole thing is Dockerized – and will definitely NOT work anywhere except AWS. But there's still tons of Docker goodness even within that AWS-only world.

JohnTigue commented 1 year ago

A tidbit about inpainting API problems

JohnTigue commented 1 year ago

A1111 wiki: API seems to be the doc for their API; it links to a Swagger doc webUI.
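For reference, those documented endpoints are plain JSON over HTTP. A minimal txt2img call (assuming a local instance started with --api, and showing only a couple of the parameters the Swagger UI lists) looks roughly like:

    import base64

    import requests

    A1111_URL = "http://127.0.0.1:7860"  # assumes a local A1111 instance started with --api

    # Minimal txt2img request; the Swagger UI at /docs lists the full parameter set.
    payload = {"prompt": "a watercolor of Mount Rainier at dawn", "steps": 20}
    resp = requests.post(f"{A1111_URL}/sdapi/v1/txt2img", json=payload, timeout=300)
    resp.raise_for_status()

    # Images come back base64-encoded in the "images" list (sometimes with a data-URI prefix).
    for i, img_b64 in enumerate(resp.json()["images"]):
        with open(f"out_{i}.png", "wb") as f:
            f.write(base64.b64decode(img_b64.split(",", 1)[-1]))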

JohnTigue commented 1 year ago

Looks like this Wile E. Coyote, Super Genius ("I have a beautiful vision for DFserver") ran out of steam after a month or so, but it's interesting to see the other ideas: https://github.com/huo-ju/dfserver

JohnTigue commented 1 year ago

Here's someone implementing the API service using the FastAPI library. Dockerized too, but very much not ECS: How to Run Stable Diffusion in Docker with a Simple Web API and GPU Support
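The core of that approach is small. A sketch along those lines (assuming the diffusers pipeline and a single endpoint; this is not the article's exact code):

    import base64
    import io

    import torch
    from diffusers import StableDiffusionPipeline
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    # Load the model once at startup; assumes a CUDA GPU is visible inside the container.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")


    class Txt2ImgRequest(BaseModel):
        prompt: str
        steps: int = 20


    @app.post("/txt2img")
    def txt2img(req: Txt2ImgRequest):
        image = pipe(req.prompt, num_inference_steps=req.steps).images[0]
        buf = io.BytesIO()
        image.save(buf, format="PNG")
        return {"images": [base64.b64encode(buf.getvalue()).decode()]}

Run it with uvicorn inside a GPU-enabled container (docker run --gpus all ...) and you have a single-machine version of the render service; the open question is whether to standardize on an API shape like this or on A1111's --api surface.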

JohnTigue commented 1 year ago

"Stability Gnerator API," you say?

stability-sdk on GitHub:

Client implementations that interact with the Stability Generator API … Python client client.py is both a command line client and an API class that wraps the gRPC based API. To try the client:

JohnTigue commented 1 year ago

The host can be set, so we could point it at AWS. I wonder if this works with A1111 in --api mode.

https://github.com/Stability-AI/stability-sdk/blob/d8f140f8828022d0ad5635acbd0fecd6f6fc317a/src/stability_sdk/client.py#L366

    STABILITY_HOST = os.getenv("STABILITY_HOST", "grpc.stability.ai:443")
    STABILITY_KEY = os.getenv("STABILITY_KEY", "")

    if not STABILITY_HOST:
        logger.warning("STABILITY_HOST environment variable needs to be set.")
        sys.exit(1)
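
Grounded in that excerpt, pointing the client somewhere else is just a matter of environment variables. A tiny sketch, with a hypothetical hypnowerk endpoint (whether an A1111-backed service could actually speak the gRPC protocol this client expects is still the open question):

    import os

    # Hypothetical hypnowerk endpoint; grpc.stability.ai:443 is the SDK's default (see excerpt above).
    os.environ["STABILITY_HOST"] = "hypnowerk.example.com:443"
    os.environ["STABILITY_KEY"] = "placeholder"  # self-hosted, so probably unused

    # With these set before the stability-sdk client starts, the os.getenv() calls quoted
    # above resolve to our endpoint instead of Stability's hosted API.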

JohnTigue commented 1 year ago

Hypnowerk is simply a render cluster that has been architected to be the engine for all manner of machinery on AWS. Definitely want to get to where there is an Invoke session service that calls into the render cluster, but that is not a Version One thing.

All the complexity of the render cluster should be hidden behind a service API. Scaling without getting into spaghetti code is a concern.

JohnTigue commented 1 year ago

Another reason to have all clients (Invoke, Discord, Blender, Photoshop, et cetera) using the same message queue interface to the render cluster is that SQS provides the usage-flow metrics that are used to titrate the autoscaling:

To properly scale the system, a custom Amazon CloudWatch metric is created. This custom CloudWatch metric calculates the number of Amazon Elastic Container Service (Amazon ECS) tasks required to adequately handle the amount of Amazon Simple Queue Service (Amazon SQS) messages. You should have a high-resolution CloudWatch metric to scale up quickly. For this use case, a high-resolution CloudWatch metric of every 10 seconds was implemented.
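A sketch of that custom metric, published as a high-resolution "backlog per running task" number (queue URL, cluster, service, and namespace are placeholders here, not the names from the AWS solution):

    import boto3

    sqs = boto3.client("sqs")
    ecs = boto3.client("ecs")
    cloudwatch = boto3.client("cloudwatch")

    QUEUE_URL = "https://sqs.us-west-2.amazonaws.com/123456789012/render-requests"  # placeholder
    CLUSTER, SERVICE = "hypnowerk-render", "sd-worker"  # placeholders


    def publish_backlog_per_task():
        """Publish SQS backlog divided by running ECS tasks as a high-resolution CloudWatch metric."""
        attrs = sqs.get_queue_attributes(
            QueueUrl=QUEUE_URL, AttributeNames=["ApproximateNumberOfMessages"]
        )
        backlog = int(attrs["Attributes"]["ApproximateNumberOfMessages"])

        svc = ecs.describe_services(cluster=CLUSTER, services=[SERVICE])["services"][0]
        running = max(svc["runningCount"], 1)  # avoid divide-by-zero when scaled to zero

        cloudwatch.put_metric_data(
            Namespace="Hypnowerk/Render",
            MetricData=[{
                "MetricName": "BacklogPerTask",
                "Value": backlog / running,
                "StorageResolution": 1,  # high-resolution metric, so scaling can react quickly
            }],
        )

Publish it on a tight schedule (the AWS post uses every 10 seconds) and hang the ECS service's scaling policy off BacklogPerTask.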

JohnTigue commented 1 year ago

SD webUI architectures are weird. I like them, but they're a bit odd because they assume one user per web server instance (with a GPU in the instance). So having a cluster of them is more like a remote desktop Windows machine than a traditional webapp. But a GPU will only be busy while that one user is rendering. Very cost inefficient. This is yet another reason to get to where there are Invoke session services (on Lambda, I expect) that queue out render requests to the render cluster. That way GPU utilization can be optimized: much less wastage, autoscaling, et cetera.

JohnTigue commented 1 year ago

One problem with SD on Docker on macOS is that you cannot use the GPUs in the MacBook. But once you can, the SD community will move to Docker. And in Dockerland you will have Docker Compose being used to set up two containers – web-front and render-engine – on the same laptop or GPU rig. When that happens, the same architecture can be deployed to a render cluster at a cloud provider.

The point is that the odd architecture of the early SD webUIs is stuck in a pre-Docker stage of evolution, or we are on the cusp of a split. That is what the AWS SD Docker bot project demonstrates, as do the cost concerns I'm running into with low GPU utilization under these early SD webUI architectures.

JohnTigue commented 1 year ago

Sounds like very recent releases of copilot have the capacity to set up worker services: Efe explaining workers in an AWS remote chat presentation. These consume events from, say, a queue.

This is starting to sound like the Discord bot solution, which involved an autoscaling ECS cluster but did NOT involve copilot: An elastic deployment of Stable Diffusion with Discord on AWS. So perhaps for v2 of the hypnowerk service we'll hack around with copilot to achieve much of what was done in raw CloudFormation in their solution, but with tons less boilerplate code (copilot spits all that crap out for us).

JohnTigue commented 1 year ago

The render cluster is one thing. Training batch jobs are another. How do we do training on the same autoscaling ECS cluster? Just queue a big job, the autoscaling monitor on the queue detects the load, the ECS cluster scales up, the batch training runs, and the ECS cluster scales back down…
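If the workers dispatch on a job type, queuing a training job is the same move as queuing a render. A sketch with a made-up message schema (job_type, dataset location, and the queue URL are all hypothetical):

    import json

    import boto3

    sqs = boto3.client("sqs")
    QUEUE_URL = "https://sqs.us-west-2.amazonaws.com/123456789012/render-requests"  # placeholder

    # Same queue the render requests flow through, so the same backlog metric drives autoscaling;
    # a worker decides what to do based on the (hypothetical) job_type field.
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({
            "job_type": "train",
            "dataset_s3_uri": "s3://hypnowerk-training/datasets/example",  # placeholder
            "base_model": "runwayml/stable-diffusion-v1-5",
            "steps": 2000,
        }),
    )

One wrinkle: a long training run needs a much longer SQS visibility timeout than a render does, or the message will be redelivered mid-job.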

JohnTigue commented 1 year ago

Looks like we can have multiple webUIs on each instance: https://github.com/ManyHands/hypnowerk/issues/56#issuecomment-1412631546

JohnTigue commented 1 year ago

On Windows, how to get A1111 to start up with an API, not just a webUI:

[Screenshot (2023-02-02): A1111 startup configuration for --api mode on Windows]