phoenixor opened this issue 2 weeks ago
There isn't enough space in your VRAM to load both T5 and Flux simultaneously. If you modify the prompt, T5 has to be loaded again to recalculate the conditioning, and at that point the VAE and Flux are unloaded to free up VRAM. If you skip the KSampler step and only create conditioning repeatedly, there's no need to reload T5; and if you only run the steps after the KSampler using pre-created conditioning, the VAE reloads disappear as well.
The long loading times suggest that your RAM is insufficient and swapping is occurring. If both T5 and Flux were fully held in RAM, switching between them would be nearly instantaneous.
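To illustrate the idea of reusing pre-created conditioning, here is a minimal PyTorch-style sketch of the caching pattern. This is not ComfyUI's actual API; the encoder and sampler calls are hypothetical stand-ins for the text-encode and KSampler steps.

```python
# Minimal sketch (not ComfyUI's API): encode the prompt once, keep the
# conditioning in system RAM, and reuse it for every sampling run so the
# text encoder never has to be reloaded into VRAM.
import torch

def encode_once(text_encoder, prompt: str) -> torch.Tensor:
    """Run the text encoder a single time and park the result in system RAM."""
    with torch.no_grad():
        cond = text_encoder(prompt)      # hypothetical encoder call
    return cond.to("cpu")                # keep the small conditioning tensor out of VRAM

def sample_many(diffusion_model, cond_cpu: torch.Tensor, seeds):
    """Reuse the cached conditioning for any number of seeds; only the
    diffusion model needs to occupy VRAM while sampling."""
    for seed in seeds:
        torch.manual_seed(seed)
        cond = cond_cpu.to("cuda")       # tiny tensor, cheap to move back
        yield diffusion_model.sample(cond)  # hypothetical sampler call
```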
I'm getting this without the prompt being modified. As of the current build, with Flux it reloads every time: it loads, spins up, finishes, then unloads everything, and has to reload slowly again every single time I start it back up. I'm running the Windows build with a 4090, capping at about 18 of 24 GB of VRAM, and it still does it.
What is your --reserved-vram setting?
Whatever the default is.
This is the memory situation on my Ubuntu server while ComfyUI is running:

               total        used        free      shared  buff/cache   available
Mem:           125Gi        29Gi       7.8Gi       29Mi        88Gi        95Gi
Swap:          8.0Gi        28Mi       8.0Gi
What do I need to do to avoid reloading the AutoencodingEngine model?
It's Flux fp8. Is it correct that this happens simply from changing the seed?
No. It happens even when I leave the seed the same and have unfinished generations. The VRAM visibly drops to nearly nothing, then about half of it loads back almost instantly (roughly 10 GB). After that it spins up the next 12 GB until I'm at about 20-22 GB of VRAM total, and when it's done it dumps it all again.
The Flux LoRA I'm training and testing currently does use both UNET and CLIP_L blocks, so that probably matters. I didn't train the T5, but there may be some complexity here I don't understand. My guess is that it has some sort of problem when loading the CLIP + UNET because of the T5 load/unload optimizations. Forge doesn't have any problems with it, though.
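If it helps to confirm which blocks a LoRA file actually patches, here is a small sketch using the safetensors library. The file path is a placeholder, and the prefix grouping is only a heuristic, since key naming varies between trainers.

```python
# Inspect a LoRA .safetensors file and count tensors by key prefix.
# "my_flux_lora.safetensors" is a placeholder; point it at the LoRA in question.
from collections import Counter
from safetensors import safe_open

lora_path = "my_flux_lora.safetensors"  # placeholder path

with safe_open(lora_path, framework="pt", device="cpu") as f:
    prefixes = Counter("_".join(key.split("_")[:2]) for key in f.keys())

for prefix, count in prefixes.most_common():
    print(f"{prefix}: {count} tensors")

# Kohya-style keys typically group as "lora_unet" (diffusion blocks) and
# "lora_te1"/"lora_te2" (text encoders). If text-encoder prefixes appear,
# the LoRA patches CLIP as well, so applying it touches the text encoder.
```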
So I noticed it wasn't happening with ALL other models, primarily just this one. When I load the fp8 UNET with the fp16 t5xxl as standalone files, it doesn't need to reload every time I change seeds.
https://civitai.com/models/637170/flux1-compact-or-clip-and-vae-included
Unless something was patched to make it more stable, in which case I'll hold my tongue.
In the fp16 case, the diffusion model alone already reaches 23.8 GB, so it may be freeing up VRAM to load the VAE. There's also a shortage of VRAM due to consumption by other apps such as browsers, and ComfyUI tries to reserve a bit of extra VRAM to cover consumption that isn't accurately captured in its measurements.
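To put rough numbers on that, here is a back-of-envelope check using approximate public parameter counts (my own arithmetic, not figures measured from the files in this thread), so treat the output as order-of-magnitude only.

```python
# Approximate weight sizes at different precisions; 1 GB = 1e9 bytes here,
# and the parameter counts are rounded public figures.
GB = 1e9

models = {
    "Flux dev (~12B params)":        12e9,
    "T5-XXL encoder (~4.7B params)": 4.7e9,
    "Flux VAE (~80M params)":        8e7,
}

for name, params in models.items():
    print(f"{name:32s} fp16 ~ {params * 2 / GB:5.1f} GB   fp8 ~ {params / GB:5.1f} GB")

# At fp16 the diffusion model alone is ~24 GB, so on a 24 GB card something
# has to be unloaded before the VAE (or the text encoder) can be loaded.
```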
This isn't an fp16 problem. Even when loading both the fp8 UNET and t5xxl_fp8 I end up with the same problem: it unloads both, and it's still doing it. It's gotten to the point where I might as well just switch to Forge full time, since it actually works and this one is just patching itself into a laggier and laggier state.
I'm a fervent advocate for Comfy, but if it's just going to not do its job, and I'm being told it's a configuration problem after some random patch... come on. I have a 4090 and 64 GB of RAM; this shouldn't be happening. This hardware is basically the best Windows has to offer. The thing is sitting on an M.2 SSD, I have a 12-core CPU, I have more than enough power supplying it, and I'm running Windows 10.
Let's be realistic: why should I continue to advocate for and use this product if it has multiplied the inference time by 8?
This was fixed like 2 months ago. Try downloading a fresh version of the latest standalone package.
If you still have issues, at least post the logs.
Your question
I use an Ubuntu server. The hardware information is: cuda:0 Tesla T4 (cudaMallocAsync, 15 GB). I use the official sample workflow. The loader used is as follows.
Every time I run the workflow, it reloads the AutoencodingEngine model and takes 150 seconds to output the image. How can I fix this problem?
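For reference, a quick way to see how much VRAM is actually free on the T4 before and after a run is sketched below (it assumes PyTorch with CUDA is available in the ComfyUI environment). At the precisions discussed in this thread, the diffusion model plus the T5 encoder generally will not fit alongside the VAE in 15 GB, which is why models get unloaded and reloaded.

```python
# Print free vs. total VRAM on GPU 0 using PyTorch's CUDA memory query.
import torch

free_bytes, total_bytes = torch.cuda.mem_get_info(0)
print(f"GPU 0: {free_bytes / 1024**3:.1f} GiB free of {total_bytes / 1024**3:.1f} GiB total")
```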
Logs
No response
Other
No response