MountaintopLotus / braintrust

A Dockerized platform for running Stable Diffusion, on AWS (for now)
Apache License 2.0

v0.x.x: SD on ECS, but not serverless #58

Open JohnTigue opened 1 year ago

JohnTigue commented 1 year ago

Renting these GPUs is not cheap. We are going to want an elastic, auto-scaling cluster of computers. When demand goes up, it will automatically rent more GPUs. When demand goes down, the unused machines will be decommissioned.

There was an AWS blog post about doing this with Stable Diffusion, An elastic deployment of Stable Diffusion with Discord on AWS by Steven Warren on 23 DEC 2022. This issue is NO LONGER about that. That is a serverless front and regulatory system for an ECS cluster. This issue is just an ECS cluster: no serverless front, no CloudWatch-based autoscaling via SQS metrics, and none of the other high-falutin fancy bells and whistles.

JohnTigue commented 1 year ago

Lightsail is out of the running because no GPUs :(

Q: Why would I want to upgrade to EC2? Lightsail offers you an easy way to run and scale a wide set of cloud-based applications, at a bundled, predictable, and low price. Lightsail also automatically sets up your cloud environment configurations such as networking and access management.

Upgrading to EC2 allows you to run your application on a wider set of instance types, ranging from virtual machines with more CPU power, memory, and networking capabilities, to specialized or accelerated instances with FPGAs and GPUs. In addition, EC2 performs less automatic management and set-up, allowing you more control over how you configure your cloud environment, such as your VPC.

JohnTigue commented 1 year ago

Looks like serverless front with a queue going to a Docker cluster on ECS is the way forward: An elastic deployment of Stable Diffusion with Discord on AWS. We're here for the architecture. The Discord front-end is a corollary bonus. What we really want is the autoscaling, so that we add GPUs as needed, and more importantly turn them off when not needed $$$$.

JohnTigue commented 1 year ago

As mentioned yesterday (https://github.com/ManyHands/hypnowerk/issues/58#issuecomment-1386133128), Lightsail is out.

Also, "Amazon ECS powers a growing number of popular AWS services including Amazon SageMaker."

JohnTigue commented 1 year ago

When scaling up, an EC2 instance is registered with the ECS cluster. The EC2 instance type was selected because it provides GPU instances. ECS provides a repeatable solution to deploy the container across a variety of instance types. This solution currently only uses the g4dn.xlarge instance type.

JohnTigue commented 1 year ago

I actually have experience with all these bastards, except two: EventBridge and Terraform:

This blog assumes familiarity with Terraform, Docker, Discord, Amazon EC2, Amazon Elastic Block Store (Amazon EBS), Amazon Virtual Private Cloud (Amazon VPC), AWS Identity and Access Management (IAM), Amazon API Gateway, AWS Lambda, Amazon SQS, Amazon Elastic Container Registry (Amazon ECR), Amazon ECS, Amazon EventBridge, AWS Step Functions, and Amazon CloudWatch.

JohnTigue commented 1 year ago

Code (SD engine and separately its infra) is in GitHub:

JohnTigue commented 1 year ago

So, this (An elastic deployment of Stable Diffusion with Discord on AWS) is a great place to start. This thing handles requests from Discord. That's nice (and it autoscales; very nice), but we want Invoke webservers, and we want a load balancer to keep pointing a user's web browser's HTTP requests to the same server. That is a completely separate way of getting to (and staying in) the autoscaling Docker cluster.

JohnTigue commented 1 year ago

Enabling Sticky Sessions with an AWS Application Load Balancer (ALB) | AWS Tutorial

JohnTigue commented 1 year ago

Looks like we can even set stickiness however long we want: seconds, minutes, days. I wonder if there is some way to sync up the load balancer and the elastic Docker ECS cluster…
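For reference, a hedged sketch of dialing in that stickiness duration from the AWS CLI (the target group ARN below is a placeholder, not from this project):

```shell
# Placeholder target group ARN; substitute the real one.
TG_ARN="arn:aws:elasticloadbalancing:us-west-2:123456789012:targetgroup/invoke/0123456789abcdef"
DURATION=86400  # seconds; ALB accepts 1 second up to 7 days (604800)

# Enable ALB cookie-based stickiness on the target group.
# "|| true" tolerates a dry run without AWS credentials.
aws elbv2 modify-target-group-attributes \
  --target-group-arn "$TG_ARN" \
  --attributes \
    Key=stickiness.enabled,Value=true \
    Key=stickiness.type,Value=lb_cookie \
    Key=stickiness.lb_cookie.duration_seconds,Value="$DURATION" || true
```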

JohnTigue commented 1 year ago

Sounds like ELB (Elastic Load Balancer) is our friend for sticky sessions: Back to Basics: Managing Your Web Application’s Session.

JohnTigue commented 1 year ago

ElastiCache for fault-tolerant session-data storage; then the EC2 instances are stateless (ergo nothing gets lost when they go down). See the AWS video on state management.

JohnTigue commented 1 year ago

This is what I was hoping for: ECS is supposed to be configured with Application Load Balancers (ALB, not the more general ELB category), as per the ECS docs:

Your Amazon ECS service can optionally be configured to use Elastic Load Balancing to distribute traffic evenly across the tasks in your service…

We recommend that you use Application Load Balancers for your Amazon ECS services so that you can take advantage of these latest features, unless your service requires a feature that is only available with Network Load Balancers or Classic Load Balancers. For more information about Elastic Load Balancing and the differences between the load balancer types, see the Elastic Load Balancing User Guide.

JohnTigue commented 1 year ago

And we found yesterday (see up 4 messages) that ALB can have sticky sessions. So it sounds like ECS will be able to serve as an autoscaling Docker cluster of Invoke and A1111 instances, with each user getting their own server, session, and GPU.

JohnTigue commented 1 year ago

Maybe the above is not the absolutely most cost-effective solution, but it's a reasonable-sounding first-pass architecture. Getting to a cluster of GPU renderers plus, separately, a serverless API for handling Invoke client requests is way, way more custom work. This simpler plan is just deploying off-the-shelf components, not coding up a new Invoke render cluster on AWS. (Although I'm sure there are folks out there actually doing that already.)

JohnTigue commented 1 year ago

On the other hand, this Discord solution is already of the architectural type we want: a serverless API message-queuing to an auto-scaling GPU render cluster of Docker images. So, extending that to service Invoke client requests IS the right way to go… long term. So, we can build that out in a second, more elegant implementation of the hypnowerk service.

JohnTigue commented 1 year ago

Heading towards this in v2, but the RabbitMQ will be AWS SQS, and the DF server will be serverless. Essentially "web bot" is the API service that the Invoke clients reach out to in order to have some SD magic happen. That would be a cleaner, more regular architecture compared to this siamese-twin Frankenstein thing I'm cobbling together.


See reddit.
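The v2 flow above could be sketched with the AWS CLI like so (queue name and message shape are placeholders, not from this project):

```shell
# Hypothetical queue for the v2 render-request pipeline.
QUEUE_NAME="hypnowerk-render-requests"

# Create (or look up) the queue; "|| true" lets this dry-run without credentials.
QUEUE_URL=$(aws sqs create-queue --queue-name "$QUEUE_NAME" \
  --query QueueUrl --output text 2>/dev/null) || true

# Serverless front: enqueue one render request as JSON.
aws sqs send-message --queue-url "$QUEUE_URL" \
  --message-body '{"prompt": "a lotus on a mountaintop", "steps": 30}' || true

# Worker in the ECS render cluster: long-poll up to 20 s for work.
aws sqs receive-message --queue-url "$QUEUE_URL" --wait-time-seconds 20 || true
```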

JohnTigue commented 1 year ago

It occurred to me that simply getting the as-designed "Discord bot with autoscaling ECS SD render cluster" running is the easiest thing to do: the code is already written. Spinning just that up gets folks a way to use SD. It's not Invoke yet, just in Discord, but it is useful as is. Prove that out, then innovate towards the Invoke render cluster.

JohnTigue commented 1 year ago

Here's the architecture diagram for the amazon-scalable-discord-diffusion code (MIT-0 license) we're starting from. The bottom half of the diagram adds bells-and-whistles to the more basic/core top half of the diagram. But those bells-and-whistles are the autoscaling bit, functionality missing from the core ECS product. So this is AWS showing us (it is from a repo owned by aws-samples on GitHub) how to configure their base product dialed in for this scalable SD render cluster use case. And we can add on the Invoke servers, later, RSN.


JohnTigue commented 1 year ago

This thing is very much serverless. The core GPU cluster is vanilla Docker on AWS (via ECS), but the HTTP API machinery is API Gateway and Lambda, classic. Then all the autoscaling stuff is also (AWS-specific) serverless, but that's to dial in ECS, which is already AWS-only. So: the core is Docker, all the rest is AWS serverless. That locks us into AWS, but they are managing the cluster for us. No Kubernetes learning tax, and we can still migrate the GPU cluster to other Docker services…

JohnTigue commented 1 year ago

Here's someone 3 weeks ago having trouble getting a GPU on ECS. They sound like their thinking is sloppy though. Some valuable tidbits of info nonetheless. https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/6130

JohnTigue commented 1 year ago

$$$ AWS, Documentation, Amazon ECS, Developer Guide: Using Spot Instances
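From that Spot guidance, a config-fragment sketch: on a Spot-backed container instance, the ECS agent can be told to drain tasks when the two-minute interruption notice arrives. This would normally land in /etc/ecs/ecs.config via instance user data; the path is parameterized here so the sketch can be dry-run.

```shell
# ECS agent config flag for Spot draining (from the ECS Developer Guide).
ECS_CONFIG="${ECS_CONFIG:-/etc/ecs/ecs.config}"
echo "ECS_ENABLE_SPOT_INSTANCE_DRAINING=true" >> "$ECS_CONFIG" || true
```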

JohnTigue commented 1 year ago

InvokeAI Stable Diffusion Toolkit Docs: Docker

JohnTigue commented 1 year ago

Seems there's an ecs-cli tool which works with Docker Compose files: Using the Amazon ECS CLI.
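A hedged sketch of what that might look like, assuming ecs-cli is installed and a cluster config already exists (project and config names are placeholders):

```shell
# Run the repo's docker-compose.yml as an ECS service via ecs-cli.
PROJECT_NAME="sd-webui"

cd stable-diffusion-webui-docker 2>/dev/null || true
ecs-cli compose --project-name "$PROJECT_NAME" service up \
  --cluster-config my-cluster-config || true   # tolerate a dry run without creds
```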

JohnTigue commented 1 year ago

We want those g4 and g5 instance types for the tasty GPUs: Working with GPUs on Amazon ECS:

Amazon ECS supports workloads that use GPUs, when you create clusters with container instances that support GPUs. Amazon EC2 GPU-based container instances that use the p2, p3, g3, g4, and g5 instance types provide access to NVIDIA GPUs.
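Per those ECS GPU docs, the task definition reserves GPUs with `resourceRequirements`. A sketch, reusing the sd-invoke:17 image from later in this thread (family name and memory figure are placeholders):

```shell
# ECS task definition pinning one NVIDIA GPU to the Invoke container.
cat > sd-gpu-task.json <<'EOF'
{
  "family": "sd-invoke-gpu",
  "requiresCompatibilities": ["EC2"],
  "containerDefinitions": [
    {
      "name": "invoke",
      "image": "sd-invoke:17",
      "memory": 14000,
      "portMappings": [{"containerPort": 7860, "hostPort": 7860}],
      "resourceRequirements": [{"type": "GPU", "value": "1"}]
    }
  ]
}
EOF

# Register it; "|| true" tolerates a dry run without AWS credentials.
aws ecs register-task-definition --cli-input-json file://sd-gpu-task.json || true
```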

JohnTigue commented 1 year ago

Amazon EC2 G4dn Instances G4dn instances, powered by NVIDIA T4 GPUs, are the lowest cost GPU-based instances in the cloud for machine learning inference and small scale training.

JohnTigue commented 1 year ago

I wonder if in some universe, multiple ECS tasks could be on the same server and the "OS" shares out the GPU sanely. Cute maybe, but what we really want is two parts: a pure webserver part which message-queues render requests to a pure render cluster. But that involves hacking inside InvokeAI. The simpler short-term solution is a cluster of webservers with GPUs on them.

JohnTigue commented 1 year ago

Seems that webui already has an --api CLI option. So, mimicking this behind API Gateway and Lambda is what an SD render cluster should do.
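A hedged sketch of hitting that API, assuming webui was launched with --api on port 7860 (/sdapi/v1/txt2img is the txt2img endpoint; prompt payload here is made up):

```shell
PAYLOAD='{"prompt": "a lotus on a mountaintop", "steps": 20}'

# Returns JSON carrying base64-encoded images; "|| true" tolerates no server running.
curl -s http://localhost:7860/sdapi/v1/txt2img \
  -H 'Content-Type: application/json' \
  -d "$PAYLOAD" || true
```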

JohnTigue commented 1 year ago

And surely these things, such as the Photoshop plug-in in my previous comment: if they are already hitting localhost over HTTP for the API, then surely they can hit a remote server (please).

JohnTigue commented 1 year ago

Looks like the answer is "Yes" since the thing can already work with Colab as the api server :)

JohnTigue commented 1 year ago

Do note that, as of the 17th, we only have credits for GPUs in us-west-2 (Oregon):

Summary of service quota(s) requested for increase: [US West (Oregon)]: EC2 Instances / nu.graphics (All G and VT instances), New Limit = 16

JohnTigue commented 1 year ago

Second instance is a g5.xlarge.

JohnTigue commented 1 year ago

AUTOMATIC1111 - webui binary v1.0.0-pre released. No installation needed, just extract and run:

"After running once, should be possible to copy the installation to another computer and launch there offline." so you could just move the A1111 folder to wherever after setup? cool.

JohnTigue commented 1 year ago

This AWS CLI tool for ECS, called Copilot, seems like it has all the best practices baked into it: Getting started with Amazon ECS using AWS Copilot.
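A hedged sketch, assuming Copilot is installed: one command scaffolds the service, ALB, ECR repo, and VPC with AWS's defaults (app and service names below are placeholders):

```shell
APP_NAME="hypnowerk"   # placeholder app name
SVC_NAME="invoke"      # placeholder service name

copilot init --app "$APP_NAME" --name "$SVC_NAME" \
  --type "Load Balanced Web Service" \
  --dockerfile ./Dockerfile \
  --deploy || true   # tolerate a dry run without AWS credentials
```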

JohnTigue commented 1 year ago

See Switch pulls from Docker to ECR devtools#84 for previous work with ECR.

JohnTigue commented 1 year ago

Andy was telling me that the first instance has died. Can confirm.

JohnTigue commented 1 year ago

Yup. And that's because the instance is stopped. Derp. I must have turned it off to save money?

JohnTigue commented 1 year ago

Restart was trivial. Of course, this hacked way means a new IP address, but it also blew away the hole for port 7860. Didn't expect that.

JohnTigue commented 1 year ago

This is exactly where I was last week. https://github.com/ManyHands/hypnowerk/issues/47#issuecomment-1384627677. But still why do I need to do this hole punching again?

JohnTigue commented 1 year ago

Looks like when launching from the AWS web console, it spits out a new security group each time, rather than reusing the one from the previous launch. Whatevs.

JohnTigue commented 1 year ago

sg-0f53445050a4d90fd - launch-wizard-4 is the group with the hole for port 7860.
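To avoid re-punching that hole every launch, a sketch of reusing the group via the AWS CLI (SG ID is the one from this thread; the CIDR is a wide-open assumption):

```shell
SG_ID="sg-0f53445050a4d90fd"   # launch-wizard-4, the group with the 7860 hole
PORT=7860

# Errors if the rule already exists; "|| true" also tolerates a dry run
# without AWS credentials.
aws ec2 authorize-security-group-ingress \
  --group-id "$SG_ID" --protocol tcp --port "$PORT" --cidr 0.0.0.0/0 || true

# At launch time, pass the same group instead of letting the wizard mint one:
# aws ec2 run-instances ... --security-group-ids "$SG_ID"
```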

JohnTigue commented 1 year ago

Add the launch-wizard-4 SG. Then connect.

$ cd stable-diffusion-webui-docker
$ docker compose --profile download up --build

But nothing:

webui-docker-download-1  | https://github.com/cszn/SCUNet/blob/main/LICENSE
webui-docker-download-1 exited with code 0
ubuntu@ip-172-31-1-3:~/stable-diffusion-webui-docker$ docker ps
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
ubuntu@ip-172-31-1-3:~/stable-diffusion-webui-docker$ docker container ls
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
ubuntu@ip-172-31-1-3:~/stable-diffusion-webui-docker$ 

Why? And why exit code 0 which usually means good?

JohnTigue commented 1 year ago

OK, I'm just a braindead moron. Things were going right; I just wasn't running the final command:

docker compose --profile invoke up --build

The reason it exited 0 was because, as I requested, it downloaded all the files (docker compose --profile download up --build). Seems the --profile is how we specify which services to run. (Yeah, that was obvious, said no one.)

JohnTigue commented 1 year ago

Oh boy, I'm really clowning this. So, the reason the TakeOne instance was stopped is that it's the g4ad.xlarge, which is NOT an NVIDIA instance; rather, it's the AMD flavor. See Amazon EC2 G4 Instances:

G4dn instances feature NVIDIA T4 GPUs and custom Intel Cascade Lake CPUs, and are optimized for machine learning inference and small scale training. These instances also bring high performance to graphics-intensive applications including remote workstations, game streaming, and graphics rendering. These instances are also ideal for customers who prefer to use NVIDIA software such as RTX Virtual Workstation and libraries such as CUDA, CuDNN, and NVENC.

G4ad instances feature the latest AMD Radeon Pro V520 GPUs and 2nd generation AMD EPYC processors. These instances provide the best price performance in the cloud for graphics applications including remote workstations, game streaming, and graphics rendering. Compared to comparable instances they offer up to 45% better price performance for graphics-intensive applications.

JohnTigue commented 1 year ago

A way to test for the presence of an NVIDIA GPU is via the CLI: nvidia-smi.
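That check could be baked into a user-data or entrypoint script as a small guard (a sketch; the function name is my own):

```shell
# Detect whether this instance actually has an NVIDIA GPU.
# On a g4ad (AMD Radeon Pro), nvidia-smi isn't even installed.
has_nvidia_gpu() {
  command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1
}

if has_nvidia_gpu; then
  echo "NVIDIA GPU present"
else
  echo "no NVIDIA GPU on this instance"
fi
```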

JohnTigue commented 1 year ago

Ok, back on track: there is something wrong with TakeTwo (the g5.xlarge with the NVIDIA GPU).

Connect to the instance, and indeed docker is running invoke there:

$ docker ps
CONTAINER ID   IMAGE          COMMAND                  CREATED      STATUS      PORTS                                       NAMES
7efcc39c9131   sd-invoke:17   "/docker/entrypoint.…"   8 days ago   Up 8 days   0.0.0.0:7860->7860/tcp, :::7860->7860/tcp   webui-docker-invoke-1
JohnTigue commented 1 year ago

Looks like there's still FS space left, so that's not it:

ubuntu@ip-172-31-4-212:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        78G   57G   22G  73% /
devtmpfs        7.8G     0  7.8G   0% /dev
…
JohnTigue commented 1 year ago
$ docker container ls
CONTAINER ID   IMAGE          COMMAND                  CREATED      STATUS      PORTS                                       NAMES
7efcc39c9131   sd-invoke:17   "/docker/entrypoint.…"   8 days ago   Up 8 days   0.0.0.0:7860->7860/tcp, :::7860->7860/tcp   webui-docker-invoke-1
$ docker restart 7efcc39c9131
7efcc39c9131
JohnTigue commented 1 year ago

And we're back:

http://52.42.196.43:7860/

JohnTigue commented 1 year ago

No idea what went wrong.

JohnTigue commented 1 year ago

I do remember someone talking about keeping FS persistence out of the Docker cluster. S3 is the obvious solution for AWS with a serverless mindset. This way, if a server gets whacked out, we can just clobber it and restart fresh without loss of work (the images out of SD).
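A sketch of that S3 idea: sync the webui output directory off the instance, so a wedged box can be clobbered and relaunched without losing images (bucket name and output path below are placeholders):

```shell
OUTPUT_DIR="${HOME}/stable-diffusion-webui-docker/output"   # placeholder path
BUCKET="s3://hypnowerk-sd-outputs"                          # placeholder bucket

# Could run from cron or a sidecar container; "|| true" tolerates a dry run
# without AWS credentials.
aws s3 sync "$OUTPUT_DIR" "$BUCKET/outputs/" || true
```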

(Of course, would be better to know "why" the instance went bad…)