xueqing0622 opened 2 weeks ago
It's also easy to get an error in ClipTextEncoder.
https://github.com/user-attachments/assets/a21073e6-4c3a-454b-87f7-16282a21aba5
I compared Q8 with FP8 and there's not much difference; Q8 even has better quality. The Q_K_M and K versions all need more processing, so if you prefer speed over quality, go with the normal GGUF versions instead of the K versions.
thx for your answer!
For 12GB VRAM, would you recommend using "t5xxl_fp16.safetensors", which is 9.11GB? Model swapping is going to take place either way, and the difference in load/unload time is very small. I mean, if it can fit in VRAM, that would give the best quality of all.
I found that the DualClipLoader (GGUF) uses the CPU to run the clip process, which makes the overall speed unusually slow
even when I start ComfyUI with --highvram.
I think it's okay, since the T5 is loaded only once, at the beginning.
It's not only at the beginning; running the T5 on CPU is very slow. I forced it to load on the GPU and it runs much quicker (about 8s vs <0.1s).
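For reference, a minimal pure-Python sketch of the device-selection rule being asked for here. The function name and flags are hypothetical, not ComfyUI's actual API; the idea is just to prefer the GPU for the text encoder when VRAM allows, instead of defaulting it to CPU:

```python
def pick_text_encoder_device(cuda_available: bool, highvram: bool) -> str:
    """Hypothetical device-selection rule for the CLIP/T5 text encoder.

    Mirrors the behaviour forced manually above: prefer the GPU when
    CUDA is available and --highvram was requested, otherwise fall
    back to CPU.
    """
    if cuda_available and highvram:
        return "cuda"
    return "cpu"

print(pick_text_encoder_device(True, True))   # cuda
print(pick_text_encoder_device(False, True))  # cpu
```

With this rule, the slow CPU path would only be taken when there is genuinely no GPU (or the user didn't ask for --highvram), which matches the speedup observed when forcing the T5 onto the GPU.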
I can reproduce the CPU thing when I run it on my second GPU but haven't had the time to look into it, I'll try to on the weekend.
Using the t5_v1.1-xxl GGUF takes more than double the time of t5xxl_fp8_e4m3fn in ClipTextEncoder:

- t5xxl_fp8_e4m3fn: 8s
- t5_v1.1-xxl GGUF: 19s
- GPU: 3060 12GB
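For what it's worth, the quoted timings work out to a bit more than double; a quick check using only the numbers reported above:

```python
fp8_seconds = 8.0    # t5xxl_fp8_e4m3fn encode time reported above
gguf_seconds = 19.0  # t5_v1.1-xxl GGUF encode time reported above

slowdown = gguf_seconds / fp8_seconds
print(f"GGUF is {slowdown:.2f}x slower")  # 2.38x, i.e. more than double
```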