MountaintopLotus / braintrust

A Dockerized platform for running Stable Diffusion, on AWS (for now)
Apache License 2.0

Jupyter #74

Open JohnTigue opened 1 year ago

JohnTigue commented 1 year ago

We've been collecting SD-centric Jupyter notebooks (#50). We should have machinery to run them. JupyterLab and SageMaker (#27) are the obvious choices. We already have evidence that the Invoke CLI can work with Jupyter: InvokeAI repo, Stable_Diffusion_AI_Notebook.ipynb

JohnTigue commented 1 year ago

Spinning up JupyterLab on EC2 should be easy enough: How to Connect with Jupyter Server Running on AWS EC2. The port forwarding via SSH is clever and cute. But we'll just punch a hole through the firewall ("Security Group").

JohnTigue commented 1 year ago

Got that running at http://54.203.116.198:8888/lab.

Was able to get to a terminal and curl models. Can also control docker compose up.

JohnTigue commented 1 year ago

We can script SD via Jupyter notebooks. For example, we can come up with a test that runs, say, 20 prompts through an SD pipeline and shows the results in an image grid. That is a good way for us to see what changes to a model do on a consistent test data set. "Reproducible research" is what the cool kids in science class call it.
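
A minimal sketch of what such a test harness might look like, using only the standard library. The `generate` callable is a hypothetical stand-in for whatever SD pipeline ends up being used; the seed scheme, tile size, and grid width are assumptions for illustration, not anything decided in this issue.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def run_prompt_suite(
    prompts: Sequence[str],
    generate: Callable[[str, int], T],
    base_seed: int = 1234,
) -> List[T]:
    """Run every prompt with a fixed per-prompt seed so reruns are comparable."""
    return [generate(prompt, base_seed + i) for i, prompt in enumerate(prompts)]


def grid_layout(n_items: int, cols: int, tile: int) -> List[Tuple[int, int]]:
    """Return the (x, y) pixel offset of each tile in a row-major grid."""
    if cols <= 0:
        raise ValueError("cols must be positive")
    return [((i % cols) * tile, (i // cols) * tile) for i in range(n_items)]


def fake_generate(prompt: str, seed: int) -> str:
    """Hypothetical stand-in for a real SD call; returns a label, not an image."""
    return f"{prompt}#{seed}"


# The same 20 prompts and seeds every run: only the model under test changes.
prompts = [f"test prompt {i}" for i in range(20)]
results = run_prompt_suite(prompts, fake_generate)
offsets = grid_layout(len(results), cols=5, tile=512)
```

In a notebook, `fake_generate` would be swapped for the real pipeline call and each result pasted into one large image at its offset.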

JohnTigue commented 1 year ago

I'll have to check if there are new tools. It's been almost two years since I was messing with running notebooks. This was king for a while: nbRunner.

JohnTigue commented 1 year ago

I already have JupyterLab running but I want to get it cohabitating with InvokeAI inside ECS. This jupyter-ecs-service looks like nice work. I haven't used CDK yet but this might just be the use case that gets me to pull that trigger: jupyter-ecs-service.

JohnTigue commented 1 year ago

This article, Deploy and run a Jupyter Lab server using Docker on AWS, is interesting in that JupyterLab is run on its own server and a private network is used to talk to other instances in the cluster. I had been trying to put everything on one instance (so they shared the same FS which contains the models and gens) but that is also why it was hard to isolate Auto1111 as the troublemaker. Perhaps I should keep separating all the players. This would require upping the Storage (#66) machinery… but it feels right.

JohnTigue commented 1 year ago

Actually, separating different services into different containers is not only architecturally cleaner (and in so doing prepares for the potential of splitting into two clusters connected by a message queue) but it would also make it much easier to develop locally (if there were a mock for the GPU machine; actually SD can run in CPU-only mode, which is slow but would be sufficient for quick dev/test).
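
For the local dev case, a rough sketch of falling back to CPU when no GPU is around. This assumes a PyTorch-based SD pipeline; `pick_device` is a hypothetical helper name, not something in the brain_trust codebase.

```python
def pick_device() -> str:
    """Return "cuda" when a GPU is usable, otherwise fall back to "cpu"."""
    try:
        import torch  # assumption: the SD pipeline is PyTorch-based
    except ImportError:
        # No PyTorch installed (e.g. a bare dev laptop): report CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running SD on: {device}")
```

The same notebook could then run unchanged on the GPU instance and on a laptop, just much more slowly on the latter.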

JohnTigue commented 1 year ago

AWS is the only deploy target we are currently concerned with. This allows less work because we avoid having to build out "generic, heavy lifting" code. This plan is not changing at this time.

But IF BrainTrust were to become an open source project, folks would want to deploy it on, say, Kubernetes. The article, Deploy and run a Jupyter Lab server using Docker on AWS, introduces things like nginx for load balancing. We are currently using AWS' ELB for that.

JohnTigue commented 1 year ago

For the Workstation (kinda a single user instance but in the cloud) it is sufficient to just launch a single JupyterLab instance. But if this does actually get to a v2.x.x architecture of a GPU render cluster accessed via a message queue, or simply the team needs more individual user isolation, JupyterHub would be more industrial. That would be the cloud provider independent implementation. For some of that benefit but less work, SageMaker might be a good call.

JohnTigue commented 1 year ago

(There is mounting evidence that the v2.x.x architecture of a GPU render cluster accessed via a message queue is really the way to go. Still trying to avoid doing that work up front. As long as it is just an internal tool… v1.x.x (Workstation) is, grumble-grumble, acceptable.)

JohnTigue commented 1 year ago

Shoot, another use case of the render cluster would be to service a JupyterHub/SageMaker Studio deploy. And, again, if we took leadership of an open source API (essentially standardizing the existing codebases) then it could be made internally multi-user. Man, the more I think about it the more BrainTrust could really help Emad implement his strategy, and he's supposedly kicking down money to bootstrap an open source ecosystem around the StabilityAI world.

JohnTigue commented 1 year ago

But that's not possible with the v1 Workstation architecture :(

JohnTigue commented 1 year ago

Drat. The simple solution for jacking JupyterLab into Docker turns out to be single user only. So, the hack version is to have JupyterLab and Invoke in the same container? Phooey.

JohnTigue commented 1 year ago

Yet another reason for the v2.x.x architecture involving a message queue. Oy.

JohnTigue commented 1 year ago

Whelp, that is still a workable v1.x.x, just $$$ wasteful.

JohnTigue commented 1 year ago

Renaming this issue from "JupyterLab" to "Jupyter" because the "Lab" is really only one person. The "Hub" is exactly what they intended for something in our use case. Unlike the hacky "can't we all just get along" sharing of the SD web-ui, this is worse: we're all sharing the same front end and the backend Python engine, which doesn't work. C'est la vie.

JohnTigue commented 1 year ago

Once again this is begging for the v2 architecture. The JupyterHub containers could run on cheap VMs, and send off requests into the message queue for the render cluster to process.
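
A toy sketch of that shape, with the standard library's in-process `queue.Queue` standing in for a real message queue and a thread standing in for the render cluster. `RenderJob`, `render_worker`, and `fake_render` are hypothetical names for illustration; nothing here is an actual brain_trust API.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class RenderJob:
    job_id: int
    prompt: str


def render_worker(
    jobs: queue.Queue,
    results: Dict[int, str],
    render: Callable[[str], str],
) -> None:
    """Pull jobs off the queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: shut the worker down
            jobs.task_done()
            break
        results[job.job_id] = render(job.prompt)
        jobs.task_done()


def fake_render(prompt: str) -> str:
    """Hypothetical stand-in for the GPU-side SD render."""
    return f"image for: {prompt}"


jobs: queue.Queue = queue.Queue()
results: Dict[int, str] = {}
worker = threading.Thread(
    target=render_worker, args=(jobs, results, fake_render), daemon=True
)
worker.start()

# The "notebook side": cheap to run, it only enqueues requests.
for i, prompt in enumerate(["a lotus on a mountaintop", "a brain made of glass"]):
    jobs.put(RenderJob(i, prompt))
jobs.put(None)

jobs.join()    # wait until every queued item has been processed
worker.join()
```

The point of the split is that the notebook side never touches the GPU; it only needs network access to the queue.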

JohnTigue commented 1 year ago

But for now, since I'm the only one using Jupyter… good enough for experimenting with moving Colab notebooks to AWS. But if we get to where multiple notebooks encapsulate workflows we develop, then we really need to step up to JupyterHub. But Lab not Hub is good enough for the super short term.

JohnTigue commented 1 year ago

An example of the above would be a model training process. We'll see about that soon enough. Looking forward to catching up with Anders. I've contributed nothing to his work except point at the gDrive where the 3D models are. I have no idea how he's doing training. First experiments don't need Jupyter for automation, yet.

JohnTigue commented 1 year ago

This is actually a cool idea for how to implement one part of the v2.x.x architecture. This would be a DIY multi-user JupyterHub environment with the least admin hassle on AWS: Serverless Jupyter Hub with AWS Fargate and CDK. Of course, perhaps one should just use SageMaker at that point. Neither solution is relevant to v1.x.x, though, which is really just SD desktops running in the cloud…

But if we were to open source BrainTrust…

JohnTigue commented 1 year ago

Third time I've seen this sort of thing. An old-school drawing tool, side by side with SD output, and synced somehow (in this case by Jupyter): Who needs Photoshop anyway?? MS Paint + SD AUTOMATIC1111 API + JUPYTER.

JohnTigue commented 1 year ago

Seems there are lots of Jupyter notebooks for making videos. Lots of folks even use Colab for free to pull it off. For example, published last month is A Free And Easy Way To Make AI Videos With Stable Diffusion.

JohnTigue commented 1 year ago

For reboots, would be nice to have the container start on reboot, a la:

sudo docker run -d --restart unless-stopped -p 8888:8888 jupyter/scipy-notebook

Drop the sudo, natch.

JohnTigue commented 1 year ago

It's just a half-assed solution, but enabling a password will be easier to use than the rando token set-up currently running. (The token changes every reboot.)

JohnTigue commented 1 year ago

Using JupyterHub

JohnTigue commented 1 year ago

Oh, why, hello there: jupyter/base-notebook

Don't mind if I do.

JohnTigue commented 1 year ago

That does:

# Install Jupyter Notebook, Lab, and Hub

And has a provision for a config file:

COPY jupyter_notebook_config.py /etc/jupyter/

JohnTigue commented 1 year ago

Ah, it's "Base image for Jupyter Notebook stacks from https://github.com/jupyter/docker-stacks"

JohnTigue commented 1 year ago

Jupyter Docker Stacks — Docker Stacks documentation

JohnTigue commented 1 year ago

Selecting an Image:

Core Stacks The Jupyter team maintains a set of Docker image definitions in the jupyter/docker-stacks GitHub repository. The following sections describe these images, including their contents, relationships, and versioning strategy.

JohnTigue commented 1 year ago

Nice:

Options for a self-signed HTTPS certificate

I've been using jupyter/scipy-notebook from my neuroscience work but sounds like that's overkill – we just need jupyter/minimal-notebook.

JohnTigue commented 1 year ago

How do I run over HTTPS?:

To run the notebook server with a self-signed certificate, pass the --secure option to the up.sh script. You must also provide a password, which will be used to secure the notebook server. You can specify the password by setting the PASSWORD environment variable, or by passing it to the up.sh script.

PASSWORD=a_secret notebook/up.sh --secure
# or
notebook/up.sh --secure --password a_secret

JohnTigue commented 1 year ago

So, these Docker Stacks I've decided to use seem to want the config file to be a sibling of the Dockerfile:

# Currently need to have both jupyter_notebook_config and jupyter_server_config to support classic and lab
COPY jupyter_server_config.py docker_healthcheck.py /etc/jupyter/

JohnTigue commented 1 year ago

OK, there are two roles that Jupyter is playing

  1. Simply provide web-ui access to a file system navigator (a big plus for brain_trust)
  2. Its real job in life: run notebooks

For #1, Jupyter needs to be on the instance if it is going to provide FS navigation.

For #2, if a "foreign" Jupyter can use the GPU, that would be sophisticated in terms of architecture. But what does that mean? Jupyter is using some web-ui embedded in it to provide UI? This sounds more like a v2.x.x thing where the notebook server would call on some brain_trust API to crunch data.

JohnTigue commented 1 year ago

Also, I cannot believe I only thought of this now, but the terminal inside Jupyter is the more appropriate way to get folks like Anders access to the command line. Derp.