buzsh / SwiftDiffusion

SwiftUI Stable Diffusion implementation using CoreML and PyTorch
GNU General Public License v3.0

refactor: v1.9.0 + scheduler, idle RAM management, Observation rewrite #88

Open buzsh opened 3 months ago

buzsh commented 3 months ago

Goals

In the existing setup, implementing new interface features from the A1111 backend requires several moving parts and scattered modifications across the codebase. With this change, we merge all of these parts into one model. This also brings us closer to the goal of direct API → SwiftUI translation for plugin components.

The current setup for all Stable Diffusion clients works like this: load the selected model into RAM, use it to generate the current prompt, and leave the model (weights, prompt dependencies, etc.) in memory until it is either overridden by another model or the process is shut down. This benefits lower-end hardware, since it avoids reloading the model on each new prompt generation, saving anywhere from 30-90s between prompts.
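The keep-resident behavior described above can be sketched as a simple cache: reload only when the requested model differs from the one already in memory. The type and method names here are hypothetical illustrations, not SwiftDiffusion's actual API.

```swift
import Foundation

/// Sketch of the caching behavior described above (hypothetical names):
/// the loaded model stays resident until replaced or the process exits.
final class ModelCache {
  private(set) var loadedModelName: String?

  /// Returns immediately if `name` is already resident; otherwise invokes
  /// `load` (30-90s on real hardware) and evicts the previous model.
  func ensureLoaded(_ name: String, load: (String) -> Void) {
    guard loadedModelName != name else { return } // cache hit: no reload
    load(name)                                    // cache miss: pay load cost
    loadedModelName = name                        // previous model overridden
  }
}
```

Note that the eviction policy is implicit: there is never more than one resident model, which is exactly why idle memory stays pinned at the size of the last-used model.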

However, on higher-end hardware (especially the M3 Pro/Max), loading an SDXL model into RAM takes at most 2-3s. Yet these clients will reserve 30-50GB of active memory for as long as the process is running, all to save that user a second or two (2-3s in the worst case). Alternatively, you can restart the Python process and reload the previous model, which results in only ~5GB of idle memory usage and adds a measly 1-2s to each generation queue.

As such, I propose two separate strategies that I plan to implement (as options) within SwiftDiffusion:

| Setup | Idle RAM usage (SDXL) | Added time (per queue) |
| --- | --- | --- |
| `default` (current) | 30-50GB+ | 0s |
| `restartWithLoad` | 5-6GB | 1-2s |
| `startOnQueue` | 1-2GB | 2-3s |
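The trade-offs in the table above could be modeled as a user-selectable option. This is a minimal sketch under assumed names (`ModelLoadStrategy` and its cases mirror the table; the eventual SwiftDiffusion API may differ):

```swift
import Foundation

/// Hypothetical model of the three idle-RAM strategies from the table.
enum ModelLoadStrategy: String, CaseIterable {
  /// Keep the model resident in RAM between queues (current behavior).
  case keepLoaded = "default"
  /// Restart the Python process after each queue, then reload the model.
  case restartWithLoad
  /// Start the process (and load the model) only when a queue begins.
  case startOnQueue

  /// Rough added time per generation queue, in seconds (from the table).
  var addedSecondsPerQueue: ClosedRange<Int> {
    switch self {
    case .keepLoaded:      return 0...0
    case .restartWithLoad: return 1...2
    case .startOnQueue:    return 2...3
    }
  }
}
```

Exposing this as an enum keeps the setting serializable (via the raw string value) and makes it easy to surface all options in a SwiftUI picker via `CaseIterable`.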

After a generation queue has finished successfully:

On new generation queue:
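The two lifecycle points above (after a queue finishes, and when a new one begins) can be sketched as hooks. All names here are assumptions for illustration; the real process-management code will differ:

```swift
import Foundation

/// Hypothetical lifecycle hooks showing when each strategy restarts or
/// starts the backend process, relative to the generation queue.
struct IdleRAMPolicy {
  /// Restart the backend after a queue finishes (restartWithLoad: ~5-6GB idle).
  let restartAfterQueue: Bool
  /// Defer starting the backend until a new queue arrives (startOnQueue: ~1-2GB idle).
  let startBackendOnQueue: Bool

  /// Called after a generation queue has finished successfully.
  func didFinishQueue(restart: () -> Void, shutdown: () -> Void) {
    if restartAfterQueue {
      restart()   // free model memory now; reload costs 1-2s next queue
    } else if startBackendOnQueue {
      shutdown()  // free nearly everything; full start costs 2-3s next queue
    }
  }

  /// Called when a new generation queue begins.
  func willStartQueue(ensureRunning: () -> Void) {
    if startBackendOnQueue {
      ensureRunning() // pays the 2-3s process-start + model-load here
    }
  }
}
```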

Other Planned Improvements