Open martjay opened 3 weeks ago
I read somewhere that ComfyAnon has decided to deprecate NF4 in favour of GGUF. I use NF4 myself, but haven't in the last few days (it was nice and much faster than GGUF).
GGUF is a lot better and more flexible, and it will be faster than NF4 once someone ports the llama.cpp CUDA kernels.
Your slowdown, however, is probably just because something else is using your VRAM.
I don't know. I saw that before generation even started, almost 7 GB of VRAM was already in use. So I don't know if the change happened after updating ComfyUI.
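One quick way to see what is occupying VRAM before ComfyUI even starts is `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader`. A minimal sketch that parses that CSV output (the sample line below is illustrative, not taken from this issue):

```python
# Parse one line of `nvidia-smi --query-gpu=memory.used,memory.total
# --format=csv,noheader` output, e.g. "7012 MiB, 24576 MiB".
def parse_vram_csv(line: str) -> tuple[int, int]:
    """Return (used_mib, total_mib) from a line like '7012 MiB, 24576 MiB'."""
    used, total = (field.strip().split()[0] for field in line.split(","))
    return int(used), int(total)

sample = "7012 MiB, 24576 MiB"  # illustrative sample output, not real data
used, total = parse_vram_csv(sample)
print(f"{used} / {total} MiB in use ({100 * used / total:.0f}%)")
```

If a large chunk is already used before the first generation, the slowdown is likely another process (browser, second UI, etc.) rather than the model format.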
Any idea when the llama.cpp CUDA kernels will be ported, or is it just a waiting game?
Xformers was causing the issue for me. I only noticed it a day or two after NF4 released. I thought it was a bug with NF4, but I had issues across a lot of other workflows. Any time I saw "Using xformers attention in VAE" it was like sitting at the DMV waiting for my number to be called.
I am only speaking from my own experience and troubleshooting, but hopefully one of the experts here can confirm or deny my solution. Xformers removed support for PyTorch versions older than 2.2.0, so I updated PyTorch and then installed the most current xFormers, v0.0.27.post2. My gens are 4x faster than before.
Again, I have little understanding, and using Comfy is my only experience with any type of coding, but I hope this helps or points one of the gurus in the right direction for you.
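The fix described above boils down to making sure PyTorch is at 2.2.0 or newer before installing a current xFormers build. A minimal sketch of that check, with a hand-rolled version comparison (the helper names are my own, not from any library):

```python
def version_tuple(v: str) -> tuple[int, ...]:
    """'2.1.2+cu121' -> (2, 1, 2); ignores local-build suffixes like '+cu121'."""
    core = v.split("+")[0]
    parts = []
    for piece in core.split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)

# Per the comment above: xFormers dropped support for PyTorch older than 2.2.0.
MIN_TORCH = (2, 2, 0)

def torch_new_enough(torch_version: str) -> bool:
    return version_tuple(torch_version) >= MIN_TORCH

print(torch_new_enough("2.1.2+cu121"))  # too old for current xFormers
print(torch_new_enough("2.3.1"))        # new enough
```

In practice you would feed this `torch.__version__`; mismatched Torch/xFormers builds are a common cause of silent slowdowns, since xFormers falls back or misbehaves instead of failing loudly.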
Hey friend, is there any way I can contact you, perhaps via Discord?
It's not just NF4; Dev and Schnell have the same problem.
Torch 2.3.1 (CUDA 12.1) + xFormers 0.0.27 does not work for me.
Expected Behavior
A few days ago, I used the flux nf4 model to generate an image in just 1 minute.
Actual Behavior
Steps to Reproduce
flux NF4 V2.json (workflow attached). My nodes have not changed, but the generation speed has become very slow.
Debug Logs
Other
No response