MountaintopLotus / braintrust

A Dockerized platform for running Stable Diffusion, on AWS (for now)
Apache License 2.0

Monolith Docker image #70

Open JohnTigue opened 1 year ago

JohnTigue commented 1 year ago

This is the Docker image which is the single-user SD workstation, i.e. a private instance with a GPU.

Etymology:

[screenshot]
JohnTigue commented 1 year ago

Decided to stay close to AWS, like the Russians stayed close to the Nazis at Stalingrad so the Luftwaffe couldn't drop bombs on them. So we're using Amazon Linux 2 as the base. And it just seems wise to let AWS work out the driver installation stuff. Maybe they do something extra right so things run faster. Dunno. But here's AWS's AMI to start with: Amazon Linux 2 AMI with NVIDIA TESLA GPU Driver.

AKA amzn2-ami-graphics-hvm-2.0.20230119.1-x86_64-gp2-e6724620-3ffb-4cc9-9690-c310d8e794ef, i.e. ami-053fbfc5ab768846d.

The latest version (2.0.20230119.1) is only about a week old, so they keep it fresh.
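
For the record, the AMI's age can be confirmed from the CLI (this assumes the AMI is visible in your default region):

# Check when the AMI was published
aws ec2 describe-images --image-ids ami-053fbfc5ab768846d \
  --query 'Images[0].CreationDate' --output text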

JohnTigue commented 1 year ago

Here are the AWS docs that click through to the above AMI: Install NVIDIA drivers on Linux instances.

JohnTigue commented 1 year ago

That AMI doesn't seem to work with a g5? Failed twice.

[screenshot]
JohnTigue commented 1 year ago

So, let's just try vanilla AL2: Amazon Linux 2 Kernel 5.10 AMI 2.0.20230119.1 x86_64 HVM gp2, on a g5.xlarge. That is what TakeSix is.

JohnTigue commented 1 year ago

Sounds like we could do some impressive tuning of SD 2.1 and PyTorch ourselves for a one-third to one-half speedup: Accelerated Stable Diffusion with PyTorch 2.

JohnTigue commented 1 year ago

The main goal is to have Docker images that get pushed to ECR and then downloaded later by ECS. Additionally we'd like to have SageMaker pull from there. And for dev/test on AWS it'd be nice to instantiate those on raw EC2 instances.
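
The ECR leg of that is standard plumbing; a sketch, where the account ID, region, and image name are all placeholders:

# Sketch: push the monolith image to ECR. The account ID, region,
# and image name below are all placeholders.
aws ecr get-login-password --region us-west-2 \
  | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-west-2.amazonaws.com
docker tag our-sd-image:latest 123456789012.dkr.ecr.us-west-2.amazonaws.com/our-sd-image:latest
docker push 123456789012.dkr.ecr.us-west-2.amazonaws.com/our-sd-image:latest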

In the case of EC2 there is some rigmarole needed to get the base machine running, e.g. Docker Compose v2 needs to be installed just to get stable-diffusion-webui-docker spun up. We can have a setup.sh which CloudFormation can run at instantiation, and which can also be run from the command line. Looks like such a thing could end up being about 20 lines:

https://github.com/marshmellow77/stable-diffusion-webui/blob/master/setup.sh
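
A first cut of our own setup.sh for Amazon Linux 2 might look roughly like this (untested sketch; the Compose version and the usermod step are assumptions):

#!/bin/bash
# Sketch: bootstrap a fresh AL2 instance for stable-diffusion-webui-docker.
set -euxo pipefail

# Docker itself comes from the amazon-linux-extras repo on AL2
sudo yum install -y git
sudo amazon-linux-extras install -y docker
sudo systemctl enable --now docker
sudo usermod -aG docker ec2-user

# Docker Compose v2 as a CLI plugin (see the install notes further down)
sudo mkdir -p /usr/local/lib/docker/cli-plugins
sudo curl -SL https://github.com/docker/compose/releases/download/v2.15.1/docker-compose-linux-x86_64 \
  -o /usr/local/lib/docker/cli-plugins/docker-compose
sudo chmod +x /usr/local/lib/docker/cli-plugins/docker-compose

git clone https://github.com/AbdBarho/stable-diffusion-webui-docker.git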

JohnTigue commented 1 year ago

Nice work, but it builds FROM Ubuntu, not Amazon Linux:

JohnTigue commented 1 year ago

An argument for keeping the (large) model files in appdata, not in the core image (keeping the image small): GUIDE for INVOKEAI: A STABLE DIFFUSION TOOLKIT - DOCKER. Includes a Dockerfile and a start.sh.
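
In Docker terms that cashes out as a bind mount; a sketch, where the image name and host path are hypothetical placeholders:

# Keep the multi-GB checkpoints on a mounted volume rather than baked
# into the image. 'our-sd-image' and the host path are placeholders.
docker run --rm -it --gpus all \
  -v /data/appdata/models:/app/models \
  -p 7860:7860 \
  our-sd-image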

JohnTigue commented 1 year ago

How to install an Nvidia GPU driver on an AWS EC2 instance:

If you’re wondering why the GPU is not working out of the box for your AWS EC2 p2 instance using a Deep Learning AMI, it is because these Deep Learning AMIs are only supported on specific instance types such as G3, P3, P3dn, P4d, G5, G4dn.

JohnTigue commented 1 year ago

Sure enough, AWS's AMI with the Tesla drivers preinstalled seemingly can't be used on a g5.

JohnTigue commented 1 year ago

Let's try the GRID drivers instead. That seems to allow g5 instances: NVIDIA RTX Virtual Workstation - Amazon Linux 2.
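
Launching a TakeSeven off that Marketplace AMI would look roughly like this (all the IDs below are placeholders):

# Sketch: spin up a g5.xlarge from the GRID-driver Marketplace AMI
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type g5.xlarge \
  --key-name my-keypair \
  --security-group-ids sg-0123456789abcdef0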

JohnTigue commented 1 year ago

And we're currently capped at 4 g5 GPUs. Just requested a quota limit increase. So, closing down some experiments in the meantime so I can spin up TakeSeven.

JohnTigue commented 1 year ago

Alright, g5 spins up with GRID drivers installed. Test probe for GPU works:

[root@ip-172-31-8-49 ~]# nvidia-smi
Mon Feb  6 04:36:20 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.05    Driver Version: 525.85.05    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10G         On   | 00000000:00:1E.0 Off |                    0 |
|  0%   18C    P8    15W / 300W |      0MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
JohnTigue commented 1 year ago

So, I'll take it that means g5s want GRID drivers, not Tesla. Sounds like GRID is for data centers, so that makes sense.

JohnTigue commented 1 year ago

InvokeAI: Installing: Running the container on your GPU:

If you have an Nvidia GPU, you can enable InvokeAI to run on the GPU by running the container with an extra environment variable to enable GPU usage and have the process run much faster:

GPU_FLAGS=all ./docker/run.sh

This passes --gpus all to docker and uses the GPU.
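
Under the hood that environment variable just turns into a --gpus flag on docker run, so the raw equivalent is roughly the following (the image name and port are my assumptions, check InvokeAI's compose file for the real ones):

# Sketch: approximately what GPU_FLAGS=all ./docker/run.sh expands to
docker run --rm -it --gpus all -p 9090:9090 ghcr.io/invoke-ai/invokeai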

JohnTigue commented 1 year ago

InvokeAI Stable Diffusion Toolkit Docs: Installing with the Automated Installer:

Check that your system has an up-to-date Python installed. To do this, open up a command-line window ("Terminal" on Linux and Macintosh, "Command" or "Powershell" on Windows) and type python --version. If Python is installed, it will print out the version number. If it is version 3.9.1 or 3.10.x, you meet requirements. At this time we do not recommend Python 3.11
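
That check is easy to script into setup.sh; a sketch (mine, not from the InvokeAI docs):

# Fail fast if the system Python isn't one InvokeAI supports
PYVER=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
case "$PYVER" in
  3.9|3.10) echo "Python $PYVER is fine" ;;
  *) echo "Python $PYVER won't do; need 3.9 or 3.10" >&2; exit 1 ;;
esac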

JohnTigue commented 1 year ago

And they've got version 3.7.16 installed. Lovely.

JohnTigue commented 1 year ago

How To Install Python 3.10 on Amazon Linux 2:

The reason why we are installing Python 3.10 on Amazon Linux 2 by building it from source is that the available versions in the default repositories are not up to date.

JohnTigue commented 1 year ago

Yup, this Amazon Linux is really tuned up. Much better than simply using Ubuntu like the majority of the PyTorch and SD community does. Yup. Great choice.

JohnTigue commented 1 year ago

Actually, maybe the Deep Learning AMI is the right way to go, simply to get a fresh Python pre-installed. It supports g5:

[screenshot]
JohnTigue commented 1 year ago

That still gets me back to having a stale Docker install (the Docker Compose v2 hassle), but that's a lot less hassle than setting up a machine to dev on (compiling Python 3.10 from source).

JohnTigue commented 1 year ago

So, let's try out a TakeEight using the Deep Learning AMI, which is seemingly AWS doing all this hassle build work for machine learning. Might have some of that magic tuning pixie dust in it.

Deep Learning AMI GPU PyTorch 1.13.1 (Amazon Linux 2) 20230201

  • ami-07b46515572e215a2 (64-bit (x86))
  • Virtualization: hvm
  • ENA enabled: true
  • Root device type: ebs
JohnTigue commented 1 year ago

And that STILL only has Python 3.7. That's lame.

[root@ip-172-31-7-117 ~]# python3
Python 3.7.16 (default, Dec 15 2022, 23:24:54) 
[GCC 7.3.1 20180712 (Red Hat 7.3.1-15)] on linux
Type "help", "copyright", "credits" or "license" for more information.
JohnTigue commented 1 year ago

I guess I'm back to building Python from scratch because Invoke wants 3.9 or better. Sounds like I should go for 3.10.9 because Installation says:

We recommend Version 3.10.9, which has been extensively tested with InvokeAI.
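
The from-source build on AL2 is roughly this (untested sketch; note openssl11-devel, since Python 3.10's ssl module needs OpenSSL 1.1+ and AL2's default openssl is 1.0.2):

# Sketch: build Python 3.10.9 from source on Amazon Linux 2
sudo yum -y groupinstall "Development Tools"
sudo yum -y install openssl11-devel bzip2-devel libffi-devel sqlite-devel
curl -O https://www.python.org/ftp/python/3.10.9/Python-3.10.9.tgz
tar xzf Python-3.10.9.tgz
cd Python-3.10.9
./configure --enable-optimizations
sudo make altinstall   # installs python3.10 without clobbering the system python3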

JohnTigue commented 1 year ago

How to install Docker Compose V2 as CLI plugin on Amazon Linux 2:

# Run as root, or prefix each command with sudo
mkdir -p /usr/local/lib/docker/cli-plugins
# v2.15.1 was current at the time; the plugin is a single static binary
curl -SL https://github.com/docker/compose/releases/download/v2.15.1/docker-compose-linux-x86_64 -o /usr/local/lib/docker/cli-plugins/docker-compose
chmod +x /usr/local/lib/docker/cli-plugins/docker-compose
ls -l /usr/local/lib/docker/cli-plugins/docker-compose
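
A quick sanity check confirms Docker picks up the plugin:

docker compose version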

That gets Docker to where the stable-diffusion-webui-docker repo needs it to be. So, then download and spin up A1111 (auto):

$ git clone https://github.com/AbdBarho/stable-diffusion-webui-docker.git
$ cd stable-diffusion-webui-docker
$ docker compose --profile download up --build
$ docker compose --profile auto up --build
JohnTigue commented 1 year ago

Think we're going to soon have a mountable volume of vetted models. See https://github.com/ManyHands/hypnowerk/issues/56#issuecomment-1420120131.

JohnTigue commented 1 year ago

stable-diffusion-webui-docker says:

You have to use one of AWS's GPU-enabled VMs and their Deep Learning OS images. These have the right drivers, the toolkit, and all the rest already installed and optimized. #70

JohnTigue commented 1 year ago

If we were not using one of the Amazon Deep Learning AMIs, we could build our own AMI with Nvidia's GPU-in-Docker tools installed: Setting up NVIDIA Container Toolkit.
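
A sketch of that install, following NVIDIA's yum-repo instructions (untested on our AMIs; the CUDA image tag is just one I know exists):

# Sketch: install the NVIDIA Container Toolkit on Amazon Linux 2 so
# containers can see the GPU
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)   # "amzn2" on AL2
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.repo \
  | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo yum install -y nvidia-container-toolkit
sudo systemctl restart docker

# Smoke test: should print the same table nvidia-smi shows on the host
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi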