buzsh / SwiftDiffusion

SwiftUI Stable Diffusion implementation using CoreML and PyTorch
GNU General Public License v3.0

refactor: v1.9.0 + scheduler, idle RAM management, Observation rewrite #88

Open buzsh opened 3 months ago

buzsh commented 3 months ago

Goals

In the existing setup, implementing new interface features from the A1111 backend requires several moving parts and scattered modifications across the codebase. With this change, we merge all of these parts into one model. This also brings us closer to the goal of direct API → SwiftUI translation for plugin components.

The current setup for all Stable Diffusion clients works like this: load the selected model into RAM, use it to generate the current prompt, and leave the model (weights, prompt dependencies, etc.) in memory until it is either overridden by another model or the process is shut down. This benefits lower-end hardware, since it avoids reloading the model on each new prompt generation, saving anywhere from 30-90s between prompts.
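The keep-resident behavior described above can be sketched as a simple cache: reload only when the requested model differs from the one already in memory. The type and method names here are hypothetical illustrations, not SwiftDiffusion's actual API.

```swift
import Foundation

/// Sketch of the caching behavior described above (hypothetical names):
/// the loaded model stays resident until replaced or the process exits.
final class ModelCache {
  private(set) var loadedModelName: String?

  /// Returns immediately if `name` is already resident; otherwise invokes
  /// `load` (30-90s on real hardware) and evicts the previous model.
  func ensureLoaded(_ name: String, load: (String) -> Void) {
    guard loadedModelName != name else { return } // cache hit: no reload
    load(name)                                    // cache miss: pay load cost
    loadedModelName = name                        // previous model overridden
  }
}
```

Note that the eviction policy is implicit: there is never more than one resident model, which is exactly why idle memory stays pinned at the size of the last-used model.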

However, on higher-end hardware (especially the M3 Pro/Max), loading an SDXL model into RAM takes at most 2-3s. Yet these clients will reserve 30-50GB of active memory for as long as the process is running, all to save that user a second or two (2-3s in the worst case). Alternatively, you can restart the Python process and reload the previous model, which results in only ~5GB of idle memory usage and adds a measly 1-2s to each generation queue.

As such, I propose two separate strategies that I plan to implement (as options) within SwiftDiffusion:

| Setup | Idle RAM usage (SDXL) | Added time (per queue) |
| --- | --- | --- |
| `default` (current) | 30-50GB+ | 0s |
| `restartWithLoad` | 5-6GB | 1-2s |
| `startOnQueue` | 1-2GB | 2-3s |
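The trade-offs in the table above could be modeled as a user-selectable option. This is a minimal sketch under assumed names (`ModelLoadStrategy` and its cases mirror the table; the eventual SwiftDiffusion API may differ):

```swift
import Foundation

/// Hypothetical model of the three idle-RAM strategies from the table.
enum ModelLoadStrategy: String, CaseIterable {
  /// Keep the model resident in RAM between queues (current behavior).
  case keepLoaded = "default"
  /// Restart the Python process after each queue, then reload the model.
  case restartWithLoad
  /// Start the process (and load the model) only when a queue begins.
  case startOnQueue

  /// Rough added time per generation queue, in seconds (from the table).
  var addedSecondsPerQueue: ClosedRange<Int> {
    switch self {
    case .keepLoaded:      return 0...0
    case .restartWithLoad: return 1...2
    case .startOnQueue:    return 2...3
    }
  }
}
```

Exposing this as an enum keeps the setting serializable (via the raw string value) and makes it easy to surface all options in a SwiftUI picker via `CaseIterable`.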

After a generation queue has finished successfully:

On new generation queue:
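The two lifecycle points above (after a queue finishes, and when a new one begins) can be sketched as hooks. All names here are assumptions for illustration; the real process-management code will differ:

```swift
import Foundation

/// Hypothetical lifecycle hooks showing when each strategy restarts or
/// starts the backend process, relative to the generation queue.
struct IdleRAMPolicy {
  /// Restart the backend after a queue finishes (restartWithLoad: ~5-6GB idle).
  let restartAfterQueue: Bool
  /// Defer starting the backend until a new queue arrives (startOnQueue: ~1-2GB idle).
  let startBackendOnQueue: Bool

  /// Called after a generation queue has finished successfully.
  func didFinishQueue(restart: () -> Void, shutdown: () -> Void) {
    if restartAfterQueue {
      restart()   // free model memory now; reload costs 1-2s next queue
    } else if startBackendOnQueue {
      shutdown()  // free nearly everything; full start costs 2-3s next queue
    }
  }

  /// Called when a new generation queue begins.
  func willStartQueue(ensureRunning: () -> Void) {
    if startBackendOnQueue {
      ensureRunning() // pays the 2-3s process-start + model-load here
    }
  }
}
```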

Other Planned Improvements