JohnTigue opened this issue 1 year ago
Mounting Amazon S3 to an Amazon EC2 instance using a private connection to S3 File Gateway:
In this blog we demonstrate how to mount Amazon S3 as an NFS volume to an EC2 instance using private connections to AWS Storage Gateway and S3 using VPC endpoints. … The key benefit of this solution is that it provides a cost-effective alternative of using object storage for applications dealing with large files, as compared to expensive file or block storage. At the same time it provides more performant, scalable and highly available storage for these applications.
Actually, directly writing to S3 IS doable, but for ECS the folks at AWS seem to be steering us toward EFS instead. Sounds like a whole bunch of rigmarole around Docker container storage and volume storage can be obviated by using EFS. Sounds like using EFS is more of a Docker thing than a rando AWS service call to S3. It would probably be more portable, too. They explained it all in a three-part blog series in 2020: Developers guide to using Amazon EFS with Amazon ECS and AWS Fargate – Part 1.
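As a sketch of what that EFS wiring looks like in an ECS task definition (the file system ID, names, and mount path here are hypothetical placeholders, not from any existing deploy): the task gains a volume backed by efsVolumeConfiguration, and the container mounts it like any Docker volume.

```json
{
  "family": "invoke-session",
  "volumes": [
    {
      "name": "models",
      "efsVolumeConfiguration": {
        "fileSystemId": "fs-0123456789abcdef0",
        "rootDirectory": "/models",
        "transitEncryption": "ENABLED"
      }
    }
  ],
  "containerDefinitions": [
    {
      "name": "sd-webui",
      "image": "<account>.dkr.ecr.<region>.amazonaws.com/stable-diffusion-webui-docker:latest",
      "mountPoints": [
        { "sourceVolume": "models", "containerPath": "/data/models" }
      ]
    }
  ]
}
```

The nice part is the container itself just sees a mounted directory; none of the SD code has to know about S3 or EFS.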
The load balancer to be used with ECS is ALB (Application Load Balancer; there are others in AWS).
This video is part two of a good walkthrough of the webUI AWS Console for manually building a cluster fronted by an ALB: Amazon Elastic Container Service (ECS) with a Load Balancer | AWS Tutorial with New ECS Experience.
What she builds is pretty much what we need. The only difference is what gets pulled from ECR. She's just demoing with nginx from public ECR. We'll be pulling the Dockerized SD image, same as the ONE currently running.
Long term, the codebase from AWS, amazon-scalable-infra-discord-diffusion, is much better work than her intro video. Using the webUI AWS Console as a didactic tool is good for showing the humans, but the AWS codebase is CloudFormation based. The only problem with it is that serving as an SD render cluster backend for a Discord bot is different from an SD render cluster with its own webUI (read: InvokeAI and A1111).
So, perhaps do it her way at first, and then fold that into the AWS CloudFormation codebase, which has a LOT of boilerplate code in it.
We are always going to want to have a webUI session-holding cluster for the Invoke users. And that may eventually talk to a separate render cluster through a message queue, like the AWS codebase amazon-scalable-infra-discord-diffusion is architected.
But at first there will be completely separate clusters: one for Invoke workstations, and separately one for Discord requests (and we don't really need the Discord one at first). Eventually the Invoke session cluster would ideally queue up render requests to a cluster like amazon-scalable-infra-discord-diffusion does. In that case both Discord and Invoke would be using the same render cluster. And other clients could also use it (Photoshop, Blender, etc.) via the --api flag. But that's not version one.
OK, this is definitely looking like this will have multiple versions, and that these are simply the early versions of the hypnowerk service (#64).
The first version, let's call it v1.x.x, is simply a vanilla ECS deploy of SD Dockerized. A later version (v2.x.x?) will strap on all that cool serverless goodness that AWS demo'd in their SD Discord bot project.
For v1.x.x, we can just use stock ECS autoscaling. ECS autoscaling is a two-part thing: Service Auto Scaling, which adjusts the number of running tasks in a service, and Cluster Auto Scaling (via a capacity provider and its Auto Scaling group), which adjusts the number of EC2 container instances underneath them.
Note that we do not need to do the autoscaling based on SQS usage metrics as AWS does in their Discord bot project. That's a whole 'nother level of complexity imposed by the serverless Discord bot rigmarole. Oh, sure, we want that, but we cannot really benefit from it until we have Invoke using the same message-queued service. So, stock ECS autoscaling will do for v1.x.x.
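A minimal sketch of what "stock" target-tracking autoscaling on the service looks like in CloudFormation (cluster/service names, capacity bounds, and the 70% CPU target are all assumptions for illustration, not values from the AWS project):

```yaml
# Target-tracking autoscaling for the ECS service (task count only; the
# EC2 capacity side would be a capacity provider / Auto Scaling group).
ScalableTarget:
  Type: AWS::ApplicationAutoScaling::ScalableTarget
  Properties:
    ServiceNamespace: ecs
    ScalableDimension: ecs:service:DesiredCount
    ResourceId: service/hypnowerk-cluster/invoke-service   # hypothetical names
    MinCapacity: 1
    MaxCapacity: 4
    RoleARN: !Sub arn:aws:iam::${AWS::AccountId}:role/aws-service-role/ecs.application-autoscaling.amazonaws.com/AWSServiceRoleForApplicationAutoScaling_ECSService

ScalingPolicy:
  Type: AWS::ApplicationAutoScaling::ScalingPolicy
  Properties:
    PolicyName: cpu-target-tracking
    PolicyType: TargetTrackingScaling
    ScalingTargetId: !Ref ScalableTarget
    TargetTrackingScalingPolicyConfiguration:
      PredefinedMetricSpecification:
        PredefinedMetricType: ECSServiceAverageCPUUtilization
      TargetValue: 70.0
```

That's the whole thing for v1.x.x; the SQS-metric version in the AWS project just swaps the predefined CPU metric for a custom CloudWatch metric.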
So, that's what this issue (#58) is: it's the non-serverless early version of the hypnowerk service (#64). Yes, this v1.x.x is Dockerized and designed for deploy on ECS, but it is not serverless. The stock SD webUIs are not built for serverless. So, easier v1.x.x release this way, but the real power, cost, and management benefits will only come when we evolve towards the serverless (and CloudFormationed) code as demo'd in AWS's SD Discord bot project (An elastic deployment of Stable Diffusion with Discord on AWS).
That settles it and so might as well change this issue's name to "v1.x.x: Stable Diffusion Dockerized on ECS but NOT serverless"
Continuing with the odd architecture's implications, we want EC2 instances (which from the ECS perspective are called Container Instances) with one ECS task (an Invoke session) per instance. The task definition will call for the EC2 launch type (not FARGATE) and a minimal GPU instance type (g4 and g5 to start).
With sticky sessions set up on the Application Load Balancer (ALB), we all hit the same IP address and port (the ALB's) and then internally get routed stickily to our own GPU'd task's container instance.
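The stickiness part is just a couple of attributes on the ALB target group. A CloudFormation sketch (names and the cookie duration are hypothetical; the port assumes A1111's default 7860 and would change for Invoke):

```yaml
WebuiTargetGroup:
  Type: AWS::ElasticLoadBalancingV2::TargetGroup
  Properties:
    VpcId: !Ref Vpc              # hypothetical parameter
    Protocol: HTTP
    Port: 7860                   # A1111's default webUI port; adjust for Invoke
    TargetType: instance
    TargetGroupAttributes:
      - Key: stickiness.enabled
        Value: "true"
      - Key: stickiness.type
        Value: lb_cookie
      - Key: stickiness.lb_cookie.duration_seconds
        Value: "86400"           # keep a user pinned to one session for a day
```

With lb_cookie stickiness the ALB issues its own cookie, so the SD webUIs don't need to cooperate at all.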
So, we want to deploy from ECR the repo stable-diffusion-webui-docker, and we want to deploy that as an ECS task to EC2 instances with the config just described.
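A sketch of that task definition (register-task-definition JSON shape; the image URI, memory size, and port are placeholder assumptions). Note the GPU resourceRequirement: on single-GPU instance types like g4dn.xlarge, requiring one GPU per task already gives the one-task-per-instance behavior; a distinctInstance placement constraint on the service (not the task definition, which only supports memberOf constraints) would enforce it explicitly.

```json
{
  "family": "sd-webui",
  "requiresCompatibilities": ["EC2"],
  "containerDefinitions": [
    {
      "name": "sd-webui",
      "image": "<account>.dkr.ecr.<region>.amazonaws.com/stable-diffusion-webui-docker:latest",
      "memory": 14336,
      "portMappings": [
        { "containerPort": 7860, "hostPort": 7860 }
      ],
      "resourceRequirements": [
        { "type": "GPU", "value": "1" }
      ]
    }
  ]
}
```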
Might as well do the build and push on an EC2 instance. No need to heat up my wifi with 20 GB up and down (slow!). See Amazon ECR: Pushing a Docker image.
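The push sequence from the build box is short. A sketch (account ID, region, and tag are hypothetical placeholders; by default this runs in dry-run mode and just prints the image URI, so it can be sanity-checked before flipping DRY_RUN=0 on the EC2 instance):

```shell
#!/bin/sh
# Hypothetical values -- substitute your own.
ACCOUNT_ID="123456789012"
REGION="us-west-2"
REPO="stable-diffusion-webui-docker"
TAG="latest"
IMAGE_URI="${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${REPO}:${TAG}"

if [ "${DRY_RUN:-1}" = "0" ]; then
  # Authenticate Docker to ECR, then tag and push the locally built image.
  aws ecr get-login-password --region "$REGION" \
    | docker login --username AWS --password-stdin "${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com"
  docker tag "${REPO}:${TAG}" "$IMAGE_URI"
  docker push "$IMAGE_URI"
else
  echo "would push: $IMAGE_URI"
fi
```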
Actually, sounds like it's better to let Copilot handle the ECR. It doesn't know how to work with existing ECR repos; that's not expressible in the manifest file (yet). So let it create its own ECR repo. See Copilot (#68).
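For reference, the Copilot manifest shape this implies (service name and port are hypothetical; the point is that the image: build: entry makes Copilot build the Dockerfile and push to an ECR repo it creates and owns):

```yaml
# copilot/sd-webui/manifest.yml  (hypothetical service name)
name: sd-webui
type: Load Balanced Web Service

image:
  build: Dockerfile   # Copilot builds this and pushes to its own ECR repo
  port: 7860

http:
  path: '/'

count: 1
```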
Renting these GPUs is not cheap. We are going to want an elastic, auto-scaling cluster of computers. When demand goes up, it will automatically rent more GPUs. When demand goes down, the unused machines will be decommissioned.
There was an AWS blog post about doing this with Stable Diffusion, An elastic deployment of Stable Diffusion with Discord on AWS by Steven Warren on 23 DEC 2022. This issue is NO LONGER about that. That is a serverless front end and regulatory system for an ECS cluster. This issue is just an ECS cluster: no serverless front, no CloudWatch-based autoscaling via SQS metrics, and no other high-falutin fancy bells and whistles.