IndigoDosSantos / stable-cascade-one-click-installer

Easy setup for generating beautiful images with Stable Cascade.
GNU Affero General Public License v3.0

Optimization issues causing slow speed #32

Open Loko415 opened 6 months ago

Loko415 commented 6 months ago

Why are you using bf16? From my research this causes slower speeds, and it's usually used for training instead. Also, why are you offloading to the CPU? And why are you freeing CUDA memory each time? That just makes the model reload every run. I may be completely wrong, I don't know much about AI, but I find it strange. Can you explain this to me? How much VRAM does this model use? 12 GB? Maybe it's using too much VRAM. I only have a 3060 with 12 GB, so maybe it's spilling into swap/system RAM, and that's why it's slower than SDXL.

IndigoDosSantos commented 6 months ago

Why 🧠float16 (bf16)? I'm using bfloat16 because of its smaller memory footprint. Although bfloat16 offers less precision than float32 or half-precision float (fp16), it keeps the same exponent range as float32, so it can still represent very large and very small numbers. In diffusion models like Stable Diffusion and Stable Cascade, this compromise in precision is generally acceptable.
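The range difference can be checked with a few lines of plain Python (a sketch computing the largest finite value of an IEEE-style float from its exponent and fraction bit counts; no PyTorch needed):

```python
def max_finite(exp_bits: int, frac_bits: int) -> float:
    """Largest finite value of an IEEE-style float format."""
    bias = 2 ** (exp_bits - 1) - 1
    max_exp = (2 ** exp_bits - 2) - bias  # largest exponent that isn't inf/NaN
    return (2 - 2 ** -frac_bits) * 2.0 ** max_exp

print(max_finite(5, 10))   # fp16: 65504.0
print(max_finite(8, 7))    # bf16: ~3.39e38
print(max_finite(8, 23))   # fp32: ~3.40e38
```

bf16 trades mantissa bits for fp32's 8-bit exponent, which is why its range nearly matches fp32 while fp16 overflows past ~65k.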

Why CPU offloading? Offloading models to the CPU helps preserve valuable VRAM, especially for users lacking top-tier GPUs such as the 4090.

Why clear CUDA memory? Clearing CUDA memory after each process (prior and decoder) helps reduce VRAM consumption. Since Stable Cascade operates in a sequential manner, it's unnecessary to keep both the prior and decoder models in VRAM at the same time. This strategy lessens memory demands, particularly on systems with limited VRAM capacity.

VRAM Usage: My configuration necessitates approximately 11GB of VRAM with models offloaded.

Comparison to SDXL: Indeed, Stable Cascade doesn't match SDXL in speed. Due to its more compact size, SDXL can typically remain fully loaded in VRAM, avoiding the loading delays that come with offloading.

Btw, I found a few lines in the loading process that aren't needed and even slow model loading down further. I deleted them, and this is the impact, small as it is (but we take what we get, don't we? 😅):

(attached image: diagram_issue-32)

Loko415 commented 5 months ago

Very nice ☺️

Loko415 commented 5 months ago

There should be presets (high VRAM, low VRAM, etc.) in case someone's GPU can take it 👍
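A minimal sketch of what such a preset switch could look like. The preset names, their settings, and the 16 GB cutoff are all illustrative guesses, not anything from this repo; detecting VRAM could be done with `torch.cuda.get_device_properties(0).total_memory` in the real script:

```python
# Hypothetical preset table: low-VRAM cards offload and clear the cache,
# high-VRAM cards keep everything resident for speed.
PRESETS = {
    "low_vram":  {"cpu_offload": True,  "clear_cuda_cache": True},
    "high_vram": {"cpu_offload": False, "clear_cuda_cache": False},
}

def pick_preset(vram_gb: float) -> str:
    """Select a preset name from the detected VRAM size (cutoff is a guess)."""
    return "high_vram" if vram_gb >= 16 else "low_vram"

print(pick_preset(12))  # a 3060 12 GB would land on "low_vram"
print(pick_preset(24))  # a 4090 24 GB would land on "high_vram"
```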