JohnTigue opened this issue 1 year ago
Mounting Amazon S3 to an Amazon EC2 instance using a private connection to S3 File Gateway:
In this blog we demonstrate how to mount Amazon S3 as an NFS volume to an EC2 instance using private connections to AWS Storage Gateway and S3 using VPC endpoints. … The key benefit of this solution is that it provides a cost-effective alternative of using object storage for applications dealing with large files, as compared to expensive file or block storage. At the same time it provides more performant, scalable and highly available storage for these applications.
Actually, directly writing to S3 IS doable, but for ECS the folks at AWS seem to be steering us toward EFS instead. Sounds like a whole bunch of rigmarole around Docker container storage and volume storage can be obviated by using EFS. Sounds like using EFS is more of a Docker thing than a rando AWS service call to S3. It would probably be more portable, too. They explained it all in a three-part blog series in 2020: Developers guide to using Amazon EFS with Amazon ECS and AWS Fargate – Part 1.
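As a sketch of what that EFS wiring looks like in an ECS task definition (the file system ID, names, and mount path here are hypothetical placeholders, not from any existing deploy): the task gains a volume backed by efsVolumeConfiguration, and the container mounts it like any Docker volume.

```json
{
  "family": "invoke-session",
  "volumes": [
    {
      "name": "models",
      "efsVolumeConfiguration": {
        "fileSystemId": "fs-0123456789abcdef0",
        "rootDirectory": "/models",
        "transitEncryption": "ENABLED"
      }
    }
  ],
  "containerDefinitions": [
    {
      "name": "sd-webui",
      "image": "<account>.dkr.ecr.<region>.amazonaws.com/stable-diffusion-webui-docker:latest",
      "mountPoints": [
        { "sourceVolume": "models", "containerPath": "/data/models" }
      ]
    }
  ]
}
```

The nice part is the container itself just sees a mounted directory; none of the SD code has to know about S3 or EFS.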
The load balancer to be used with ECS is ALB (Application Load Balancer; there are others in AWS).
This video is part two of a good walkthrough of the webUI AWS Console for manually building a cluster fronted by an ALB: Amazon Elastic Container Service (ECS) with a Load Balancer | AWS Tutorial with New ECS Experience.
What she builds is pretty much what we need. The only difference is what gets pulled from ECR. She's just demoing with nginx from public ECR. We'll be pulling the Dockerized SD image, same as the ONE currently running.
Long term, the codebase from AWS, amazon-scalable-infra-discord-diffusion, is much better work than her intro video. Using the webUI AWS Console as a didactic tool is good for showing the humans, but the AWS codebase is CloudFormation based. The only problem with it is that serving as an SD render cluster backend for a Discord bot is different from an SD render cluster with its own webUI (read: InvokeAI and A1111).
So, perhaps do it her way at first, and then fold that into the AWS CloudFormation codebase, which has a LOT of boilerplate code in it.
We are always going to want to have a webUI session-holding cluster for the Invoke users. And that may eventually talk to a separate render cluster through a message queue, like the AWS codebase amazon-scalable-infra-discord-diffusion is architected.
But at first there will be completely separate clusters: one for Invoke workstations, and separately one for Discord requests (and we don't really need the Discord one at first). Eventually the Invoke session cluster would ideally queue up render requests to a cluster like amazon-scalable-infra-discord-diffusion does. In that case both Discord and Invoke would be using the same render cluster. And other clients could also use it (Photoshop, Blender, etc.) via the --api flag. But that's not version one.
OK, this is definitely looking like this will have multiple versions, and that these are simply the early versions of the hypnowerk service (#64).
The first version, let's call it v1.x.x, is simply a vanilla ECS deploy of SD Dockerized. A later version (v2.x.x?) will strap on all that cool serverless goodness that AWS demo'd in their SD Discord bot project.
For v1.x.x, we can just use stock ECS autoscaling. ECS autoscaling is a two-part thing: Service Auto Scaling, which adjusts the number of running tasks in a service, and Cluster Auto Scaling (via a capacity provider and its Auto Scaling group), which adjusts the number of EC2 container instances underneath them.
Note that we do not need to do the autoscaling based on SQS usage metrics as AWS does in their Discord bot project. That's a whole 'nother level of complexity imposed by the serverless Discord bot rigmarole. Oh, sure, we want that, but we cannot really benefit from it until we have Invoke using the same message-queued service. So, stock ECS autoscaling will do for v1.x.x.
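A minimal sketch of what "stock" target-tracking autoscaling on the service looks like in CloudFormation (cluster/service names, capacity bounds, and the 70% CPU target are all assumptions for illustration, not values from the AWS project):

```yaml
# Target-tracking autoscaling for the ECS service (task count only; the
# EC2 capacity side would be a capacity provider / Auto Scaling group).
ScalableTarget:
  Type: AWS::ApplicationAutoScaling::ScalableTarget
  Properties:
    ServiceNamespace: ecs
    ScalableDimension: ecs:service:DesiredCount
    ResourceId: service/hypnowerk-cluster/invoke-service   # hypothetical names
    MinCapacity: 1
    MaxCapacity: 4
    RoleARN: !Sub arn:aws:iam::${AWS::AccountId}:role/aws-service-role/ecs.application-autoscaling.amazonaws.com/AWSServiceRoleForApplicationAutoScaling_ECSService

ScalingPolicy:
  Type: AWS::ApplicationAutoScaling::ScalingPolicy
  Properties:
    PolicyName: cpu-target-tracking
    PolicyType: TargetTrackingScaling
    ScalingTargetId: !Ref ScalableTarget
    TargetTrackingScalingPolicyConfiguration:
      PredefinedMetricSpecification:
        PredefinedMetricType: ECSServiceAverageCPUUtilization
      TargetValue: 70.0
```

That's the whole thing for v1.x.x; the SQS-metric version in the AWS project just swaps the predefined CPU metric for a custom CloudWatch metric.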
So, that's what this issue (#58) is: it's the non-serverless early version of the hypnowerk service (#64). Yes, this v1.x.x is Dockerized and designed for deploy on ECS, but it is not serverless. The stock SD webUIs are not built for serverless. So, easier v1.x.x release this way, but the real power, cost, and management benefits will only come when we evolve towards the serverless (and CloudFormationed) code as demo'd in AWS's SD Discord bot project (An elastic deployment of Stable Diffusion with Discord on AWS).
That settles it and so might as well change this issue's name to "v1.x.x: Stable Diffusion Dockerized on ECS but NOT serverless"
Continuing with the odd architecture's implications, we want EC2 instances (which from the ECS perspective are called Container Instances) with one ECS task (an Invoke session) per instance. The task definition will call for the EC2 launch type (not FARGATE) and a minimal GPU instance type (g4 and g5 to start).
With sticky sessions set up on the Application Load Balancer (ALB), we all hit the same IP address and port (the ALB's) and then internally get routed stickily to our own GPU'd task's container instance.
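The stickiness part is just a couple of attributes on the ALB target group. A CloudFormation sketch (names and the cookie duration are hypothetical; the port assumes A1111's default 7860 and would change for Invoke):

```yaml
WebuiTargetGroup:
  Type: AWS::ElasticLoadBalancingV2::TargetGroup
  Properties:
    VpcId: !Ref Vpc              # hypothetical parameter
    Protocol: HTTP
    Port: 7860                   # A1111's default webUI port; adjust for Invoke
    TargetType: instance
    TargetGroupAttributes:
      - Key: stickiness.enabled
        Value: "true"
      - Key: stickiness.type
        Value: lb_cookie
      - Key: stickiness.lb_cookie.duration_seconds
        Value: "86400"           # keep a user pinned to one session for a day
```

With lb_cookie stickiness the ALB issues its own cookie, so the SD webUIs don't need to cooperate at all.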
So, we want to deploy from ECR the repo stable-diffusion-webui-docker, and we want to deploy that as an ECS task to EC2 instances with the config just described.
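A sketch of that task definition (register-task-definition JSON shape; the image URI, memory size, and port are placeholder assumptions). Note the GPU resourceRequirement: on single-GPU instance types like g4dn.xlarge, requiring one GPU per task already gives the one-task-per-instance behavior; a distinctInstance placement constraint on the service (not the task definition, which only supports memberOf constraints) would enforce it explicitly.

```json
{
  "family": "sd-webui",
  "requiresCompatibilities": ["EC2"],
  "containerDefinitions": [
    {
      "name": "sd-webui",
      "image": "<account>.dkr.ecr.<region>.amazonaws.com/stable-diffusion-webui-docker:latest",
      "memory": 14336,
      "portMappings": [
        { "containerPort": 7860, "hostPort": 7860 }
      ],
      "resourceRequirements": [
        { "type": "GPU", "value": "1" }
      ]
    }
  ]
}
```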
Might as well do the build and push on an EC2 instance. No need to heat up my wifi with 20 GB up and down (slow!). See Amazon ECR: Pushing a Docker image.
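The push sequence from the build box is short. A sketch (account ID, region, and tag are hypothetical placeholders; by default this runs in dry-run mode and just prints the image URI, so it can be sanity-checked before flipping DRY_RUN=0 on the EC2 instance):

```shell
#!/bin/sh
# Hypothetical values -- substitute your own.
ACCOUNT_ID="123456789012"
REGION="us-west-2"
REPO="stable-diffusion-webui-docker"
TAG="latest"
IMAGE_URI="${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${REPO}:${TAG}"

if [ "${DRY_RUN:-1}" = "0" ]; then
  # Authenticate Docker to ECR, then tag and push the locally built image.
  aws ecr get-login-password --region "$REGION" \
    | docker login --username AWS --password-stdin "${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com"
  docker tag "${REPO}:${TAG}" "$IMAGE_URI"
  docker push "$IMAGE_URI"
else
  echo "would push: $IMAGE_URI"
fi
```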
Actually, sounds like it's better to let Copilot handle the ECR. It doesn't know how to work with existing ECR repos; that's not expressible in the manifest file (yet). So let it create its own ECR repo. See Copilot (#68).
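For reference, the Copilot manifest shape this implies (service name and port are hypothetical; the point is that the image: build: entry makes Copilot build the Dockerfile and push to an ECR repo it creates and owns):

```yaml
# copilot/sd-webui/manifest.yml  (hypothetical service name)
name: sd-webui
type: Load Balanced Web Service

image:
  build: Dockerfile   # Copilot builds this and pushes to its own ECR repo
  port: 7860

http:
  path: '/'

count: 1
```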
Renting these GPUs is not cheap. We are going to want an elastic, auto-scaling cluster of computers. When demand goes up, it will automatically rent more GPUs. When demand goes down, the unused machines will be decommissioned.
There was an AWS blog post about doing this with Stable Diffusion, An elastic deployment of Stable Diffusion with Discord on AWS by Steven Warren on 23 DEC 2022. This issue is NO LONGER about that. That is a serverless front end and regulatory system for an ECS cluster. This issue is just an ECS cluster: no serverless front, no CloudWatch-based autoscaling via SQS metrics, and no other high-falutin fancy bells and whistles.