Storage (GitHub issue)

JohnTigue commented 1 year ago

This is a hub issue for topics related to storage within the Brain Trust system.

For model storage, see #56.
For individual storage of generated images, see #73.
And what about S3 versus EFS?
Et cetera.

JohnTigue commented 1 year ago

In Building Serverless, Modern Applications Using Amazon S3 or Amazon EFS, a PM gives the contextualizing intro for the first 8 minutes, then a coder gets into the nitty-gritty of S3 and EFS.

JohnTigue commented 1 year ago

In the above talk, they use EFS, not S3, to store the models and support libraries (at ~25 min). That's for Lambda, not ECS, and there are no GPUs on Lambda. But the surprise for me is that they didn't use S3, which is how it would have been done a few years ago.

JohnTigue commented 1 year ago

Some model file-naming advice from the SD community: Model Naming Convention Proposal / Protip

JohnTigue commented 1 year ago

Another "storage" solution is Amazon ElastiCache for fault-tolerant storage of session data. This is another way for the EC2 instances to be stateless (ergo, nothing gets lost when they go down). See the AWS video on state management.
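A minimal sketch of keeping session state off the instance, assuming a Redis-compatible ElastiCache endpoint and the `redis` Python client (whose `Redis` class provides the `setex` and `get` calls used here). `SessionStore` and the `session:` key prefix are hypothetical; the backend is passed in so the logic can run against any object with those two methods.

```python
import json
from typing import Any, Optional, Protocol


class KeyValueBackend(Protocol):
    """The two calls this sketch needs; redis.Redis provides both."""

    def setex(self, name: str, time: int, value: str) -> Any: ...
    def get(self, name: str) -> Optional[Any]: ...


class SessionStore:
    """Keeps per-user session state outside the EC2 instance (hypothetical helper)."""

    def __init__(self, backend: KeyValueBackend, ttl_seconds: int = 3600,
                 prefix: str = "session:") -> None:
        self.backend = backend
        self.ttl_seconds = ttl_seconds
        self.prefix = prefix

    def save(self, session_id: str, data: dict) -> None:
        # Serialize and write with an expiry so abandoned sessions clean themselves up.
        self.backend.setex(self.prefix + session_id, self.ttl_seconds, json.dumps(data))

    def load(self, session_id: str) -> Optional[dict]:
        raw = self.backend.get(self.prefix + session_id)
        if raw is None:
            return None
        if isinstance(raw, bytes):  # redis-py returns bytes unless decode_responses=True
            raw = raw.decode("utf-8")
        return json.loads(raw)
```

Usage would look like `SessionStore(redis.Redis(host="<elasticache-endpoint>", port=6379))`, where the host is a placeholder. Because every instance reads and writes the same cache, any instance can pick up a session after another one goes down.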

JohnTigue commented 1 year ago

See also #56.

JohnTigue commented 1 year ago

Mounting a file system is the easiest way to work with current-gen SD web UIs. But this might actually be a weakness; doing storage in a cloud-native fashion would be cooler.

So, an intermediate solution (i.e. work with current gen solutions but architect for the future) would be to use S3 for storage but make it look like part of the "normal" file system. E.g. s3fs-fuse.

Longer term, explicitly using cloud object stores might be a better way to go. But that's for another day.
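A sketch of the s3fs-fuse intermediate option, assuming s3fs-fuse is installed on the instance and the instance profile has read access to the bucket. `s3fs_mount_command` and `mount` are hypothetical helpers; `iam_role=auto` and `allow_other` are standard s3fs/FUSE mount options, and the `bucket:/prefix` form mounts only part of a bucket.

```python
import subprocess
from typing import List, Optional


def s3fs_mount_command(bucket: str, mountpoint: str, prefix: Optional[str] = None,
                       use_iam_role: bool = True, allow_other: bool = True) -> List[str]:
    """Build an s3fs-fuse command line that mounts a bucket (or a prefix of it)."""
    source = f"{bucket}:/{prefix.strip('/')}" if prefix else bucket
    cmd = ["s3fs", source, mountpoint]
    options = []
    if use_iam_role:
        # Pick up credentials from the EC2 instance profile instead of a password file.
        options.append("iam_role=auto")
    if allow_other:
        # Let users other than the one who mounted (e.g. the web UI's user) see the files.
        options.append("allow_other")
    for opt in options:
        cmd += ["-o", opt]
    return cmd


def mount(bucket: str, mountpoint: str, **kwargs) -> None:
    # Requires s3fs-fuse installed and FUSE available on the host.
    subprocess.run(s3fs_mount_command(bucket, mountpoint, **kwargs), check=True)
```

For example, `mount("my-models", "/mnt/models")` (bucket name hypothetical) would let a web UI read models from `/mnt/models` as if they were local. When mounting as a non-root user, `allow_other` also needs `user_allow_other` enabled in `/etc/fuse.conf`.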

JohnTigue commented 1 year ago

This is an interesting option: Saving images in Automatic1111 to a folder per session?

JohnTigue commented 1 year ago

This is interesting. A built-in feature of Auto1111 is saving files to a new folder every day. Nice. So, how do we get that to work with individual storage for a group of users, not just one user? See Automatic1111: Saving to a Directory.
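A sketch of extending the one-folder-per-day idea to many users, assuming each user's gens go under a shared root as `<root>/<user>/<YYYY-MM-DD>/`. The layout and `gen_output_dir` are hypothetical, not an Auto1111 setting; the user ID is sanitized so it cannot escape the root.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional

# Anything other than letters, digits, underscore, dot, or dash gets replaced.
_UNSAFE = re.compile(r"[^A-Za-z0-9_.-]")


def gen_output_dir(root: Path, user_id: str, day: Optional[date] = None) -> Path:
    """Return <root>/<user>/<YYYY-MM-DD>, creating it if needed (hypothetical layout)."""
    day = day or date.today()
    safe_user = _UNSAFE.sub("_", user_id)  # no "/" survives, so no nested paths
    if safe_user in {"", ".", ".."}:
        raise ValueError(f"unusable user id: {user_id!r}")
    path = root / safe_user / day.isoformat()
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Pointing each user's session at the directory this returns gives the same per-day grouping Auto1111 does for one user, without users' gens mixing.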

JohnTigue commented 1 year ago

Looks like ComfyUI can also be set to use the same models library, "Config file to set the search paths for models:" https://github.com/comfyanonymous/ComfyUI/blob/master/extra_model_paths.yaml.example

JohnTigue commented 1 year ago

InvokeAI Stable Diffusion Toolkit Docs, InvokeAI Web Server:

A gallery section on the left that contains a history of the images you have generated. These images are read and written to the directory specified at launch time in --outdir.

JohnTigue commented 1 year ago

See https://github.com/ManyHands/hypnowerk/issues/74#issuecomment-1507660589 for why things are going to need to evolve a bit on this front.

JohnTigue commented 1 year ago

OK, it looks like this is the way forward: Using data volumes in tasks. It is an AWS ECS-only feature, but AWS-only is the current design context anyway.

It looks like they tried to make it fit Docker's native style. If someone eventually wanted to make Hypnowerk work in a non-AWS Docker context, it looks like it would be a matter of defining volumes: in a docker-compose.yml. Feels like the right choice for now and for potential futures. See ECS Documentation: Docker volumes for a detailed how-to.
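With ECS, a volume is declared once at the task level and then mounted into each container that needs it. The sketch below builds arguments for boto3's `ecs.register_task_definition`, assuming an EFS-backed volume (EFS being one of the volume types ECS tasks support). The function name, volume name, paths, and memory size are placeholders, not anything Hypnowerk defines.

```python
from typing import Any, Dict


def task_definition_kwargs(family: str, image: str, file_system_id: str,
                           container_path: str = "/data/models",
                           root_directory: str = "/models",
                           memory_mib: int = 8192) -> Dict[str, Any]:
    """Arguments for boto3's ecs.register_task_definition with one EFS-backed volume."""
    volume_name = "model-library"  # hypothetical volume name
    return {
        "family": family,
        "containerDefinitions": [
            {
                "name": family,
                "image": image,
                "memory": memory_mib,
                "essential": True,
                # Where the task-level volume appears inside this container.
                "mountPoints": [
                    {"sourceVolume": volume_name, "containerPath": container_path,
                     "readOnly": True},
                ],
            }
        ],
        "volumes": [
            {
                "name": volume_name,
                # The EFS file system (and directory within it) backing the volume.
                "efsVolumeConfiguration": {
                    "fileSystemId": file_system_id,
                    "rootDirectory": root_directory,
                    "transitEncryption": "ENABLED",
                },
            }
        ],
    }
```

Registering it would be `boto3.client("ecs").register_task_definition(**task_definition_kwargs("invokeai", "<image>", "<file-system-id>"))`. The model mount is read-only here so a UI cannot alter the shared library; gens would need a separate, writable volume. In a plain Docker setup, the same shape is a named volume listed under the service's `volumes:` in docker-compose.yml.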

JohnTigue commented 1 year ago

We really should split storage of gens and models, I think. Or maybe there should be two model locations: the server's collection of models and a personal one? Meh. Nonetheless, a single storage location for gens and models seems wrong.

JohnTigue commented 1 year ago

Karma's VersesTest.rar file has been added to the S3 bucket, hypnowerk-archiv.

JohnTigue commented 1 year ago

As per the third comment back, AWS did nice work shoehorning data volumes into the Docker way of doing things. We've got a situation where we really want an ECS cluster with data volumes for Invoke and Jupyter, but we also separately need an Automatic1111 deploy running. It would be nice if both had access to the same model library (many models are ~6 GiB each). See #56 for more details.

JohnTigue commented 1 year ago

I guess there need to be three chunks of storage:

System-wide: including the Model Library (#56)
Group-wide: things like Karma's VersesTest.rar and other corp-internal goodies
Individual-wide: a personal storage bin, one for each user
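One way to keep the three scopes apart, sketched under the assumption that they all live in one S3 bucket and are separated by key prefix. The `Scope` enum, the `storage_prefix` helper, and the `system/`, `group/`, `user/` layout are hypothetical; separate buckets or separate EFS directories would serve equally well.

```python
from enum import Enum


class Scope(Enum):
    SYSTEM = "system"  # e.g. the shared Model Library
    GROUP = "group"    # corp-internal material shared by a team
    USER = "user"      # one personal bin per user


def storage_prefix(scope: Scope, owner: str = "") -> str:
    """Map a storage scope to a key prefix in a single bucket (hypothetical layout)."""
    if scope is Scope.SYSTEM:
        if owner:
            raise ValueError("system-wide storage has no owner")
        return "system/"
    # Group and user scopes need exactly one path segment naming the owner.
    if not owner or "/" in owner:
        raise ValueError(f"{scope.value} storage needs a single-segment owner, got {owner!r}")
    return f"{scope.value}/{owner}/"
```

A gen would then be written with something like `s3.put_object(Bucket="<bucket>", Key=storage_prefix(Scope.USER, "karma") + "image.png", Body=data)`. Prefix-per-scope also maps cleanly onto IAM policies that restrict each user to their own `user/<name>/` keys.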

JohnTigue commented 1 year ago

Actually, if we config'd the Auto1111 instances to put the gen'd images on the EBS volume, this would be a form of persistent storage wherein the gens would be available across reboots.
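A minimal sketch of that reconfiguration, assuming the EBS volume is mounted at some path on the instance and that the web UI reads its output folders from `config.json`. The option names in `OUTDIR_KEYS` are assumed to match what Automatic1111 writes there and should be checked against the deployed version; `point_outputs_at` is a hypothetical helper.

```python
import json
from pathlib import Path

# Output-folder option names assumed to exist in Automatic1111's config.json.
OUTDIR_KEYS = {
    "outdir_txt2img_samples": "txt2img-images",
    "outdir_img2img_samples": "img2img-images",
    "outdir_extras_samples": "extras-images",
    "outdir_txt2img_grids": "txt2img-grids",
    "outdir_img2img_grids": "img2img-grids",
}


def point_outputs_at(config_path: Path, volume_root: Path) -> dict:
    """Rewrite the output folders in config.json so gens land on the mounted volume."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    for key, subdir in OUTDIR_KEYS.items():
        target = volume_root / subdir
        target.mkdir(parents=True, exist_ok=True)  # make sure the folder exists on the volume
        config[key] = str(target)
    # Other settings in the file are left untouched.
    config_path.write_text(json.dumps(config, indent=4))
    return config
```

This is best run while the web UI is stopped, since the UI also writes `config.json` when settings change.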

JohnTigue commented 1 year ago

A-ha! EFS is what I've been missing, according to S3, EBS, EFS Explained:

AWS EFS is a shared, elastic file storage system that grows and shrinks as you add and remove files. It offers a traditional file storage paradigm, with data organized into directories and subdirectories. EFS is useful for SaaS applications and content management systems. You can mount EFS onto several EC2 instances at the same time.

JohnTigue commented 1 year ago

Another bit of SD on Docker prior art: universonic/stable-diffusion-webui - Docker Image | Docker Hub

Where to Store Data: It is recommended to create a data directory on the host system (outside the container) and mount this to a directory visible from inside the container.
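The same host-directory pattern can be sketched with the Docker SDK for Python (the `docker` package), whose `containers.run` accepts a `volumes` mapping of host path to `{"bind": ..., "mode": ...}`. `data_volume` is a hypothetical helper, and the container-side path has to come from the image's own documentation; it is not assumed here.

```python
from pathlib import Path
from typing import Dict


def data_volume(host_dir: Path, container_dir: str,
                read_only: bool = False) -> Dict[str, Dict[str, str]]:
    """Bind-mount spec in the shape docker-py's containers.run(volumes=...) accepts."""
    host_dir = host_dir.resolve()  # Docker expects an absolute host path
    host_dir.mkdir(parents=True, exist_ok=True)  # create the host-side data dir first
    return {str(host_dir): {"bind": container_dir, "mode": "ro" if read_only else "rw"}}
```

Usage would be along the lines of `docker.from_env().containers.run("universonic/stable-diffusion-webui", detach=True, volumes=data_volume(Path("./sd-data"), "<container data dir>"))`. Because the data lives on the host, replacing or upgrading the container leaves models and gens in place.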

MountaintopLotus / braintrust

Storage #66
