JohnTigue opened this issue 1 year ago
Lightsail is out of the running because no GPUs :(
Q: Why would I want to upgrade to EC2? Lightsail offers you an easy way to run and scale a wide set of cloud-based applications, at a bundled, predictable, and low price. Lightsail also automatically sets up your cloud environment configurations such as networking and access management.
Upgrading to EC2 allows you to run your application on a wider set of instance types, ranging from virtual machines with more CPU power, memory, and networking capabilities, to specialized or accelerated instances with FPGAs and GPUs. In addition, EC2 performs less automatic management and set-up, allowing you more control over how you configure your cloud environment, such as your VPC.
Looks like a serverless front with a queue going to a Docker cluster on ECS is the way forward: An elastic deployment of Stable Diffusion with Discord on AWS. We're here for the architecture; the Discord front-end is a bonus. What we really want is the autoscaling, so that we add GPUs as needed and, more importantly, turn them off when not needed $$$$.
As mentioned yesterday (https://github.com/ManyHands/hypnowerk/issues/58#issuecomment-1386133128), Lightsail is out.
Also, "Amazon ECS powers a growing number of popular AWS services including Amazon SageMaker."
When scaling up, an EC2 instance is registered with the ECS cluster. The EC2 launch type was selected because it provides GPU instances. ECS provides a repeatable solution to deploy the container across a variety of instance types. This solution currently only uses the g4dn.xlarge instance type.
I actually have experience with all these bastards, except two: EventBridge and Terraform:
This blog assumes familiarity with Terraform, Docker, Discord, Amazon EC2, Amazon Elastic Block Store (Amazon EBS), Amazon Virtual Private Cloud (Amazon VPC), AWS Identity and Access Management (IAM), Amazon API Gateway, AWS Lambda, Amazon SQS, Amazon Elastic Container Registry (Amazon ECR), Amazon ECS, Amazon EventBridge, AWS Step Functions, and Amazon CloudWatch.
Code (SD engine and separately its infra) is in GitHub:
So, this (An elastic deployment of Stable Diffusion with Discord on AWS) is a great place to start. This thing handles requests from Discord. That's nice (and it autoscales; very nice) but we want Invoke webservers, and we want a load balancer to keep pointing a user's web browser's HTTP requests to the same server. This is a completely separate way of getting to (and staying in) the autoscaling Docker cluster.
Looks like we can even set stickiness however long we want: seconds, minutes, days. I wonder if there is some way to sync up the load balancer and the elastic Docker ECS cluster…
Sounds like ELB (Elastic Load Balancer) is our friend for sticky sessions: Back to Basics: Managing Your Web Application’s Session.
ElastiCache for fault-tolerant session-data storage; then the EC2 instances are stateless (ergo nothing gets lost when they go down). See the AWS video on state management.
This is what I was hoping for: ECS is supposed to be configured with Application Load Balancers (ALB, not the more general ELB category), as per the ECS docs:
Your Amazon ECS service can optionally be configured to use Elastic Load Balancing to distribute traffic evenly across the tasks in your service…
We recommend that you use Application Load Balancers for your Amazon ECS services so that you can take advantage of these latest features, unless your service requires a feature that is only available with Network Load Balancers or Classic Load Balancers. For more information about Elastic Load Balancing and the differences between the load balancer types, see the Elastic Load Balancing User Guide.
And we found yesterday (see up 4 messages) that ALB can have sticky sessions. So it sounds like ECS will be able to serve as an autoscaling Docker cluster of Invoke and A1111 instances, with each user getting their own server, session, and GPU.
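For reference, enabling stickiness on an ALB target group is a one-liner with the AWS CLI. A minimal sketch, assuming the ECS service is fronted by a target group; the TG_ARN is a placeholder you'd substitute, and the block only echoes unless one is set:

```shell
# Sketch: enable ALB sticky sessions on the target group fronting the ECS
# service. TG_ARN is a placeholder -- substitute your target group's ARN.
# Duration is in seconds; ALB accepts 1 second up to 7 days (604800).
DURATION=$((60 * 60 * 24))   # one day

if [ -n "${TG_ARN:-}" ]; then
  aws elbv2 modify-target-group-attributes \
    --target-group-arn "$TG_ARN" \
    --attributes \
      Key=stickiness.enabled,Value=true \
      Key=stickiness.type,Value=lb_cookie \
      Key=stickiness.lb_cookie.duration_seconds,Value="$DURATION"
else
  echo "set TG_ARN to actually apply; would set stickiness for ${DURATION}s"
fi
```

That covers the "seconds, minutes, days" range mentioned above via one duration attribute.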
Maybe the above is not the absolutely most cost-effective solution, but it's a reasonable-sounding first-pass architecture. Getting to a cluster of GPU renderers and, separately, a serverless API for handling Invoke client requests is way, way more custom work. This simpler plan is just deploying off-the-shelf components, not coding up a new Invoke render cluster on AWS. (Although I'm sure there are folks out there actually doing that already.)
On the other hand, this Discord solution is already of an architectural type: serverless API message-queuing to an auto-scaling GPU render cluster of Docker images. So, extending that to service Invoke client requests IS the right way to go… long term. So, we can build that out in a second, more elegant implementation of the hypnowerk service.
Heading towards this in v2, but the RabbitMQ will be AWS SQS, and the DF server will be serverless. Essentially, "web bot" is the API service that the Invoke clients reach out to in order to have some SD magic happen. That would be a cleaner, more regular architecture compared to this siamese-twin Frankenstein thing I'm cobbling together.
See reddit.
It occurred to me that simply getting the as-designed "Discord bot with autoscaling ECS SD render cluster" is the easiest thing to do: code is already written. Spinning just that up gets folks a way to use SD. It's not Invoke yet, just in Discord, but it is useful as is. Prove that out, then innovate towards Invoke Render cluster.
Here's the architecture diagram for the amazon-scalable-discord-diffusion code (MIT-0 license) we're starting from. The bottom half of the diagram adds bells-and-whistles to the more basic/core top half. But those bells-and-whistles are the autoscaling bit, functionality missing from the core ECS product. So this is AWS showing us (it is from a repo owned by aws-samples on GitHub) how to configure their base product, dialed in for this scalable SD render cluster use case. And we can add on the Invoke servers later, RSN.
This thing is very much serverless. The core GPU cluster is vanilla Docker on AWS (via ECS), but the HTTP API machinery is API Gateway and Lambda, classic. Then all the autoscaling stuff is also (AWS-specific) serverless, but that's to dial in the ECS, which is already AWS-only. So: the core is Docker, all the rest is AWS serverless. That locks us into AWS, but they are managing the cluster for us. No Kubernetes learning tax, and we can still migrate the GPU cluster to other Docker services…
Here's someone 3 weeks ago having trouble getting a GPU on ECS. They sound like their thinking is sloppy though. Some valuable tidbits of info nonetheless. https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/6130
Seems there's an ecs-cli tool which works with Docker Compose files: Using the Amazon ECS CLI.
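A sketch of how that might look, assuming ecs-cli is installed and there's a docker-compose.yml in the working directory; the cluster name, region, and config name below are hypothetical, and the block skips itself if the tool isn't present:

```shell
# Sketch: drive ECS from a Docker Compose file via ecs-cli. Cluster name
# and config name are hypothetical. Guarded so this is a no-op when
# ecs-cli is not installed.
if command -v ecs-cli >/dev/null 2>&1; then
  ecs-cli configure --cluster sd-render --region us-west-2 \
    --default-launch-type EC2 --config-name sd-render
  ecs-cli compose --file docker-compose.yml up
else
  echo "ecs-cli not installed; skipping"
fi
```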
We want those g4 and g5 instance types for the tasty GPUs: Working with GPUs on Amazon ECS:
Amazon ECS supports workloads that use GPUs, when you create clusters with container instances that support GPUs. Amazon EC2 GPU-based container instances that use the p2, p3, g3, g4, and g5 instance types provide access to NVIDIA GPUs.
Amazon EC2 G4dn Instances G4dn instances, powered by NVIDIA T4 GPUs, are the lowest cost GPU-based instances in the cloud for machine learning inference and small scale training.
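The way a task actually claims one of those GPUs is via resourceRequirements in the ECS task definition. A minimal sketch; the family, container name, image URI, and memory figure are placeholders, and the register step is left commented since it needs AWS credentials:

```shell
# Sketch: minimal ECS task definition reserving one GPU for the container.
# The "resourceRequirements" GPU entry is what makes ECS place the task on
# a GPU container instance (e.g. g4dn.xlarge). Names/image are placeholders.
cat > taskdef.json <<'EOF'
{
  "family": "sd-render",
  "requiresCompatibilities": ["EC2"],
  "containerDefinitions": [
    {
      "name": "sd-webui",
      "image": "<account>.dkr.ecr.us-west-2.amazonaws.com/sd-invoke:17",
      "memory": 15000,
      "portMappings": [{"containerPort": 7860, "hostPort": 7860}],
      "resourceRequirements": [{"type": "GPU", "value": "1"}]
    }
  ]
}
EOF
# Register it (needs AWS credentials):
#   aws ecs register-task-definition --cli-input-json file://taskdef.json
python3 -m json.tool taskdef.json > /dev/null && echo "taskdef.json is valid JSON"
```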
I wonder if in some universe, multiple ECS tasks could be on the same server and the "OS" shares out the GPU sanely. Cute maybe, but what we really want is two parts: a pure web-server part which message-queues render requests to a pure render cluster. But that involves hacking inside InvokeAI. The simpler short-term solution is a cluster of webservers with GPUs on them.
Seems that webui already has an --api CLI option. So, mimicking this on API Gateway and Lambda is what an SD render cluster should do.
And these things, such as the Photoshop plug-in in my previous comment: if they are already hitting localhost over HTTP for the API, then surely they can hit a remote server (please).
Looks like the answer is "Yes" since the thing can already work with Colab as the api server :)
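So a remote client call would look something like the sketch below. /sdapi/v1/txt2img is the webui API route; the prompt and parameters are illustrative, and the actual curl against a live instance is left commented since it needs a running server:

```shell
# Sketch: calling the A1111 webui REST API (server started with --api).
# Prompt and parameters here are made up for illustration.
cat > payload.json <<'EOF'
{"prompt": "a lighthouse at dusk, oil painting", "steps": 20, "width": 512, "height": 512}
EOF
# Against a live instance this would be:
#   curl -s "http://<instance-ip>:7860/sdapi/v1/txt2img" \
#     -H 'Content-Type: application/json' -d @payload.json
python3 -m json.tool payload.json > /dev/null && echo "payload ok"
```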
Do note that, as of the 17th, we only have credits for GPUs in us-west-2 (Oregon):
Summary of service quota(s) requested for increase: [US West (Oregon)]: EC2 Instances / nu.graphics (All G and VT instances), New Limit = 16
Second instance is a g5.xlarge.
AUTOMATIC1111 - webui binary v1.0.0-pre released. No installation needed, just extract and run:
"After running once, should be possible to copy the installation to another computer and launch there offline." so you could just move the A1111 folder to wherever after setup? cool.
This AWS CLI tool for ECS, called copilot, seems like it has all the best practices baked into it: Getting started with Amazon ECS using AWS Copilot.
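Copilot's happy path, per that getting-started guide, is basically one init command. A sketch; the app and service names are hypothetical, and the block skips itself if copilot isn't installed:

```shell
# Sketch: bootstrap an ECS service with Copilot. App/service names are
# hypothetical; guarded so this is a no-op without the copilot CLI.
if command -v copilot >/dev/null 2>&1; then
  copilot init --app hypnowerk --name sd-webui \
    --type "Load Balanced Web Service" \
    --dockerfile ./Dockerfile --deploy
else
  echo "copilot not installed; skipping"
fi
```

The "Load Balanced Web Service" type is what gets you the ALB wiring for free, which dovetails with the sticky-session plan above.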
See Switch pulls from Docker to ECR devtools#84 for previous work with ECR.
Andy was telling me that the first instance has died. Can confirm.
Yup. And that's because the instance is stopped. Derp. I must have turned it off to save money?
Restart was trivial. Of course, this hacked way means new IP address, but it also blew away the hole for port 7860. Didn't expect that.
This is exactly where I was last week. https://github.com/ManyHands/hypnowerk/issues/47#issuecomment-1384627677. But still, why do I need to do this hole-punching again?
Looks like when launching from the AWS web console, it spits out a new security group each time, rather than reusing the one from the previous launch. Whatevs.
sg-0f53445050a4d90fd - launch-wizard-4 is the group with the hole for port 7860.
Add the launch-wizard-4 SG. Then connect.
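For next time, the hole could be re-punched from the CLI instead of via the console wizard. A sketch; the SG ID is the one noted above, the actual call is commented out since it needs credentials, and 0.0.0.0/0 should really be a narrower CIDR:

```shell
# Sketch: reopen port 7860 on an existing security group instead of letting
# the launch wizard mint a new SG each launch. 0.0.0.0/0 is wide open --
# narrow the CIDR for anything real.
SG_ID="sg-0f53445050a4d90fd"
PORT=7860
# Would run (needs AWS credentials):
#   aws ec2 authorize-security-group-ingress \
#     --group-id "$SG_ID" --protocol tcp --port "$PORT" --cidr 0.0.0.0/0
echo "would open tcp/$PORT on $SG_ID"
```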
$ cd stable-diffusion-webui-docker
$ docker compose --profile download up --build
But nothing:
webui-docker-download-1 | https://github.com/cszn/SCUNet/blob/main/LICENSE
webui-docker-download-1 exited with code 0
ubuntu@ip-172-31-1-3:~/stable-diffusion-webui-docker$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ubuntu@ip-172-31-1-3:~/stable-diffusion-webui-docker$ docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ubuntu@ip-172-31-1-3:~/stable-diffusion-webui-docker$
Why? And why exit code 0, which usually means success?
OK, I'm just a braindead moron. Things were going right, I just wasn't running the final command:
docker compose --profile invoke up --build
The reason it exited 0 was because, as I requested, it downloaded all the files (docker compose --profile download up --build). Seems the --profile is how we specify which service to run. (Yeah, that was obvious, said no one.)
Oh boy, I'm really clowning this. So, the reason the TakeOne instance was stopped is that it's the g4ad.xlarge, which is NOT an NVIDIA instance; rather it's the AMD flavor. See Amazon EC2 G4 Instances:
G4dn instances feature NVIDIA T4 GPUs and custom Intel Cascade Lake CPUs, and are optimized for machine learning inference and small scale training. These instances also bring high performance to graphics-intensive applications including remote workstations, game streaming, and graphics rendering. These instances are also ideal for customers who prefer to use NVIDIA software such as RTX Virtual Workstation and libraries such as CUDA, CuDNN, and NVENC.
G4ad instances feature the latest AMD Radeon Pro V520 GPUs and 2nd generation AMD EPYC processors. These instances provide the best price performance in the cloud for graphics applications including remote workstations, game streaming, and graphics rendering. Compared to comparable instances they offer up to 45% better price performance for graphics-intensive applications.
A way to test for the presence of an NVIDIA GPU is via the CLI: nvidia-smi.
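A small guard along those lines could run before launching the container; a sketch, and the warning text is mine:

```shell
# Sketch: sanity-check for an NVIDIA GPU before launching the container.
# On a g4ad (AMD GPU) instance this falls through to the warning;
# on g4dn/g5 it prints the GPU name (T4 / A10G).
if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name --format=csv,noheader
else
  echo "no NVIDIA GPU here -- wrong instance type (g4ad is AMD)?"
fi
```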
Ok, back on track: there is something wrong with TakeTwo (the g5.xlarge with the NVIDIA GPU).
Connect to the instance, and indeed docker is running invoke there:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7efcc39c9131 sd-invoke:17 "/docker/entrypoint.…" 8 days ago Up 8 days 0.0.0.0:7860->7860/tcp, :::7860->7860/tcp webui-docker-invoke-1
Looks like there's still FS space left, so that's not it:
ubuntu@ip-172-31-4-212:~$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/root 78G 57G 22G 73% /
devtmpfs 7.8G 0 7.8G 0% /dev
…
$ docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7efcc39c9131 sd-invoke:17 "/docker/entrypoint.…" 8 days ago Up 8 days 0.0.0.0:7860->7860/tcp, :::7860->7860/tcp webui-docker-invoke-1
$ docker restart 7efcc39c9131
7efcc39c9131
And we're back:
No idea what went wrong.
I do remember someone talking about keeping FS persistence out of the Docker cluster. S3 is the obvious solution for AWS with a serverless mindset. This way, if a server gets whacked out, we can just clobber it and restart fresh without loss of work (images out of SD).
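Concretely, that could be as simple as a periodic sync of the output directory. A sketch; the bucket name and paths are hypothetical, and the actual sync is commented out since it needs credentials and an existing bucket:

```shell
# Sketch: keep generated images off the instance's filesystem by syncing
# the output directory to S3. Bucket name and paths are hypothetical.
OUT_DIR="${OUT_DIR:-./outputs}"
BUCKET="s3://hypnowerk-renders"   # hypothetical bucket
mkdir -p "$OUT_DIR"
# Would run (needs AWS credentials and the bucket to exist):
#   aws s3 sync "$OUT_DIR" "$BUCKET/outputs/"
echo "would sync $OUT_DIR -> $BUCKET/outputs/"
```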
(Of course, would be better to know "why" the instance went bad…)
Renting these GPUs is not cheap. We are going to want an elastic, auto-scaling cluster of computers. When demand goes up, it will automatically rent more GPUs. When demand goes down, the unused machines will be decommissioned.
There was an AWS blog post about doing this with Stable Diffusion: An elastic deployment of Stable Diffusion with Discord on AWS by Steven Warren, 23 DEC 2022. This issue is NO LONGER about that. That is a serverless front and regulatory system for an ECS cluster. This issue is just an ECS cluster: no serverless front, no CloudWatch-based autoscaling via SQS metrics, and none of the other highfalutin fancy bells and whistles.