Using t5_v1.1-xxl GGUF takes more than twice as long as t5xxl_fp8_e4m3fn in ClipTextEncoder

city96 / ComfyUI-GGUF

GGUF Quantization support for native ComfyUI models

Apache License 2.0


Issue #83 (Open), opened by xueqing0622 2 weeks ago

xueqing0622 commented 2 weeks ago

Using t5_v1.1-xxl GGUF in ClipTextEncoder takes more than twice as long as t5xxl_fp8_e4m3fn:

t5xxl_fp8_e4m3fn: 8s
t5_v1.1-xxl GGUF: 19s

GPU: 3060 12GB

xueqing0622 commented 2 weeks ago

It also errors out easily in ClipTextEncoder.

al-swaiti commented 2 weeks ago

Attached video (output.mp4): https://github.com/user-attachments/assets/a21073e6-4c3a-454b-87f7-16282a21aba5

I compared Q8 with FP8 and there isn't much difference between them, plus Q8 has better quality. The K_M and other K versions all need more processing, so if you prefer speed over quality, go with the regular GGUF versions instead of the K versions.
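To make the Q8 vs K-quant point concrete: in the ggml/GGUF Q8_0 format, weights are stored in blocks of 32, each holding one float16 scale `d` followed by 32 signed 8-bit values, so dequantizing is a single multiply per weight (`w = d * q`). K-quants instead pack 256 weights into a super-block with extra packed sub-block scales (and, for some types, mins), which means more unpacking work per block. The stdlib-only sketch below illustrates Q8_0 decoding only; it is not the plugin's actual code, and `dequantize_q8_0` and `quantize_q8_0` are hypothetical names.

```python
from __future__ import annotations

import struct

# Q8_0 block layout (ggml): one little-endian float16 scale + 32 int8 quants = 34 bytes.
QK8_0 = 32
BLOCK_FMT = "<e32b"
BLOCK_SIZE = struct.calcsize(BLOCK_FMT)  # 34 bytes


def dequantize_q8_0(raw: bytes) -> list[float]:
    """Decode a buffer of consecutive Q8_0 blocks into Python floats."""
    if len(raw) % BLOCK_SIZE != 0:
        raise ValueError("buffer length is not a multiple of the Q8_0 block size")
    out: list[float] = []
    for d, *qs in struct.iter_unpack(BLOCK_FMT, raw):
        # One multiply per weight: w = d * q
        out.extend(d * q for q in qs)
    return out


def quantize_q8_0(values: list[float]) -> bytes:
    """Simple reference quantizer: scale each block so its largest |value| maps to 127."""
    if len(values) % QK8_0 != 0:
        raise ValueError("number of values must be a multiple of 32")
    blocks = []
    for i in range(0, len(values), QK8_0):
        block = values[i:i + QK8_0]
        amax = max(abs(v) for v in block)
        d = amax / 127.0
        inv = 1.0 / d if d else 0.0
        qs = [round(v * inv) for v in block]  # ints in [-127, 127]
        blocks.append(struct.pack(BLOCK_FMT, d, *qs))  # d is stored as float16
    return b"".join(blocks)
```

The per-weight cost here is one table-free multiply, which is why Q8_0 tends to be cheap to decode; the exact speed gap to the K variants depends on the implementation and hardware.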

xueqing0622 commented 2 weeks ago

@al-swaiti Thanks for your answer!

ViratX commented 1 week ago

Replying to al-swaiti's Q8 vs FP8 comparison above:

With 12GB of VRAM, would you recommend using "t5xxl_fp16.safetensors", which is 9.11GB? Model swapping is going to happen either way, and the difference in load/unload time is very small. If it fits in VRAM, it should give the best quality of all.
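As a rough check on whether a given T5 file fits in 12GB, weight memory scales with bits per weight. The sketch below back-calculates a parameter count from the 9.11 figure quoted for t5xxl_fp16.safetensors, assuming that figure is GiB at 2 bytes per weight, and then estimates other formats using nominal ggml bits-per-weight values (Q8_0 = 8.5, Q5_K = 5.5, Q4_K = 4.5). These are estimates only: real files can differ because some tensors may be kept at higher precision, and the numbers ignore activations and anything else resident in VRAM, such as the diffusion model. The helper names are hypothetical.

```python
from __future__ import annotations

GIB = 1024 ** 3

# Nominal bits per weight for each storage format (ggml block layouts for the quants).
BITS_PER_WEIGHT = {
    "fp16": 16.0,
    "fp8_e4m3fn": 8.0,
    "Q8_0": 8.5,   # 32 int8 values + one fp16 scale per 32 weights
    "Q5_K": 5.5,   # 176 bytes per 256-weight super-block
    "Q4_K": 4.5,   # 144 bytes per 256-weight super-block
}


def params_from_file_size(size_gib: float, bits_per_weight: float) -> float:
    """Estimate the number of weights stored in a file of the given size."""
    return size_gib * GIB * 8 / bits_per_weight


def weight_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Estimate weight storage in GiB for n_params at the given bits per weight."""
    return n_params * bits_per_weight / 8 / GIB


if __name__ == "__main__":
    # Assumption: the 9.11 quoted in the thread is GiB for the fp16 file.
    n = params_from_file_size(9.11, BITS_PER_WEIGHT["fp16"])
    print(f"~{n / 1e9:.2f}B weights")
    for fmt, bpw in BITS_PER_WEIGHT.items():
        print(f"{fmt:>11}: {weight_size_gib(n, bpw):5.2f} GiB")
```

Under these assumptions the fp16 figure implies roughly 4.89 billion weights, so fp8 comes to about 4.56 GiB and Q8_0 to about 4.84 GiB of weights.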

wujohns commented 4 days ago

I found that DualClipLoader (GGUF) runs the CLIP text encoding on the CPU, which makes the overall speed unusually slow.

This happens even when I start ComfyUI with --highvram.

al-swaiti commented 4 days ago

I think that's fine, since you only load T5 once, at the beginning.

wujohns commented 3 days ago

It's not just at the beginning: running T5 on the CPU is very slow every time. When I forced it to load onto the GPU, it ran much faster (about 8s vs. <0.1s).
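The disagreement above is partly about one-time load cost versus per-call cost. A fair comparison times the first call separately from steady-state calls and, for GPU work, waits for queued kernels to finish before stopping the clock, because CUDA kernel launches are asynchronous. The stdlib-only helper below is a minimal sketch under those assumptions; `benchmark` is a hypothetical name, and nothing here is part of ComfyUI or ComfyUI-GGUF.

```python
from __future__ import annotations

import statistics
import time
from typing import Callable, Optional


def benchmark(
    fn: Callable[[], object],
    *,
    warmup: int = 1,
    runs: int = 5,
    sync: Optional[Callable[[], None]] = None,
) -> tuple[float, float]:
    """Return (first_call_seconds, median_steady_state_seconds) for fn()."""
    if warmup < 1 or runs < 1:
        raise ValueError("warmup and runs must be >= 1")

    def timed() -> float:
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # e.g. torch.cuda.synchronize, so queued GPU work is counted
        return time.perf_counter() - start

    # The first call counts as warm-up and includes one-time costs
    # such as loading weights or moving them to another device.
    first = timed()
    for _ in range(warmup - 1):  # any extra warm-up calls are discarded
        timed()
    steady = [timed() for _ in range(runs)]
    return first, statistics.median(steady)
```

For example, with PyTorch you could compare `benchmark(encode_prompt, sync=torch.cuda.synchronize)` with the encoder on the GPU against `benchmark(encode_prompt)` with it on the CPU, where `encode_prompt` is a hypothetical zero-argument wrapper around the text-encode call. A large first-call time with a small median points to load cost; a large median points to the per-call slowdown described above.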

city96 commented 3 days ago

I can reproduce the CPU issue when I run it on my second GPU, but I haven't had time to look into it yet. I'll try to over the weekend.