Michlozz opened this issue 4 weeks ago
What resolution are you using?
Also, which model exactly are you using: fp16, fp8, nf4, or one of the GGUF variants?
fp8, at a resolution of 1024x1024
That is normal speed for fp8 on 8 GB of VRAM; try the GGUF-quantized models instead.
For me (RTX 2060 Super, 8 GB VRAM) it takes 39 seconds for a 1920x1080 image on the schnell Q4_0 model (4 steps).
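To see why GGUF quantization helps so much on an 8 GB card, here is a rough back-of-the-envelope sketch of the weight footprint at different precisions. It assumes the Flux transformer has roughly 12 billion parameters and uses the nominal bits-per-weight of each format (GGUF Q8_0 and Q4_0 store a per-block scale, so their effective rates are 8.5 and 4.5 bits); real VRAM usage is higher because of the text encoders, VAE, and activations.

```python
# Rough VRAM estimate for the diffusion-transformer weights alone.
# Assumptions: ~12B parameters; effective bits-per-weight per format
# (Q8_0/Q4_0 include a 16-bit scale per 32-weight block).

PARAMS = 12e9  # assumed parameter count

def weight_gib(bits_per_weight: float, params: float = PARAMS) -> float:
    """Approximate size of the weights in GiB."""
    return params * bits_per_weight / 8 / 2**30

for name, bits in [("fp16", 16.0), ("fp8", 8.0), ("Q8_0", 8.5), ("Q4_0", 4.5)]:
    print(f"{name:>5}: ~{weight_gib(bits):.1f} GiB")
```

Under these assumptions fp8 weights alone already fill most of an 8 GB card, forcing ComfyUI to swap blocks in and out of system RAM, while Q4_0 leaves headroom for activations, which is consistent with the speedup reported above.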
Your question
I got:
Total VRAM 8188 MB, total RAM 16011 MB
pytorch version: 2.3.1+cu121
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4060 Laptop GPU : cudaMallocAsync
Using pytorch cross attention
[Prompt Server] web root: C:\Users\ursus\Documents\stable-diffusion-webui\ComfyUI_windows_portable2\ComfyUI\web
Loading: ComfyUI-Manager (V2.50.1)
ComfyUI Revision: 2563 [2622c55a] | Released on '2024-08-18'
With schnell (4 steps) it sometimes takes more than 2 minutes per generation. With dev (20 steps) it can take more than 400 seconds per image. If I try to generate more than one image per run, it sometimes crashes.
What slows it down so much? My computer isn't that bad.
Logs
No response
Other
No response