You most likely disabled all the key checks and converted the entire checkpoint, including the CLIP and VAE, instead of just the UNet. Extract just the SDXL UNet in the diffusers format and save it to safetensors if you really want to convert it, though you most likely won't gain any real benefit from it with a non-transformer model.
```python
import diffusers
from safetensors.torch import save_file

# Load only the UNet from the full checkpoint, then save its weights.
model = diffusers.UNet2DConditionModel.from_single_file("some_model_path.safetensors")
save_file(model.state_dict(), "some_unet_path.safetensors")
```
Here I already converted it to GGUF after I separated the UNet. I used https://github.com/leejet/stable-diffusion.cpp/blob/master/docs/quantization_and_gguf.md to do the conversion; the issue is that the resulting model is not supported by the node. @city96
Ah yeah, they're using a different format from ours for the conv2d stuff. We have some initial SDXL support going, but we're storing the original shape as a separate key; I'll have to look at how they're doing this and see if we can support it, though it's not a massive priority atm or anything. Reopening and changing the title.
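Conceptually the shape-as-separate-key idea looks something like this (a sketch only; the actual key naming and writer setup in ComfyUI-GGUF differ):

```python
import numpy as np
import gguf

writer = gguf.GGUFWriter("unet.gguf", arch="sdxl")

name = "model.input_blocks.1.0.in_layers.2.weight"  # a conv2d weight
w = np.zeros((320, 320, 3, 3), dtype=np.float32)    # placeholder data

# Flatten the 4D conv weight to 2D so standard quant kernels apply,
# and remember the original shape under a separate key.
# The ".orig_shape" suffix is illustrative, not the real scheme.
writer.add_tensor(name, w.reshape(w.shape[0], -1))
writer.add_tensor(name + ".orig_shape", np.array(w.shape, dtype=np.int32))

writer.write_header_to_file()
writer.write_kv_data_to_file()
writer.write_tensors_to_file()
writer.close()
```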
Pinging @blepping since he worked on our SDXL implementation here https://github.com/city96/ComfyUI-GGUF/pull/63 in case this is something he wants to look into.
i actually looked at stable-diffusion.cpp and was all set to say "hey, let's use this for converting and skip the having-to-patch-llama.cpp stuff", but it seemed like they did some stuff differently (including key names). definitely would be good to be compatible though, maybe it's as simple as having a key conversion table. i'll take a closer look when i get a chance.
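something like this is what i mean by a key conversion table (purely hypothetical names - the real stable-diffusion.cpp keys would need checking against their converter):

```python
# hypothetical key conversion table; the actual stable-diffusion.cpp
# key names would need to be verified.
SDCPP_TO_COMFY = {
    "unet.": "model.diffusion_model.",  # illustrative prefix swap only
}

def convert_key(name: str) -> str:
    # rewrite a stable-diffusion.cpp tensor name into the expected form
    for old, new in SDCPP_TO_COMFY.items():
        if name.startswith(old):
            return new + name[len(old):]
    return name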
progress in #80 - that implementation seems to work. more testing would be helpful. theoretically it should work for Flux too (stable-diffusion.cpp claims to support it). i didn't test that; it may require adding more ops if they quantize layer types that ComfyUI-GGUF currently doesn't, which was the case with SD15 at least.
https://github.com/user-attachments/assets/be6ef0b0-06a6-48eb-bf67-2edfe6c6548d
I extracted the UNet from my SDXL model, then applied convert.py to convert it to BF16. It works perfectly.
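For reference, that conversion step is roughly the following, assuming ComfyUI-GGUF's `tools/convert.py` (flag names may differ between versions; check the repo):

```
python convert.py --src some_unet_path.safetensors
```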
@city96 @blepping, I tried to quantize it using llama.cpp and I get this error:
I converted the same UNet model successfully to Q8_0 using https://github.com/leejet/stable-diffusion.cpp/blob/master/docs/quantization_and_gguf.md, but the GGUF loader gives me this error: Error occurred when executing UnetLoaderGGUF:
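For reference, the stable-diffusion.cpp conversion command from that doc is roughly as follows (check the linked doc for the exact, current syntax):

```
sd -M convert -m some_unet_path.safetensors -o unet-q8_0.gguf -v --type q8_0
```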
This is the result of using the model I quantized with your method, after patching llama.cpp via https://github.com/leejet/stable-diffusion.cpp/blob/master/docs/flux.md - the time was 8 min.
Same result using his way of quantization.
Comparison of Q8 quantization between his method and yours (both supported by the GGUF loader): 2 steps take 10 s in ComfyUI (I used my special merge).
https://github.com/user-attachments/assets/4e31c42f-0d90-4dfa-9054-38ba3909d647
The closest solution: apply a patch to llama.cpp to support Stable Diffusion and other UNet-type models (ComfyUI is very fast). The long-term solution: build an application that supports the GGUF types, mixing stable-diffusion.cpp's technique with the speed of ComfyUI.
We don't have bf16 dequantization kernels, which is the reason you're seeing those times.
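For context, the BF16-to-FP32 conversion itself is trivial; a minimal NumPy sketch of what a dequantization step does is below. The slowness comes from falling back to an on-the-fly path instead of a native GPU kernel, not from the math:

```python
import numpy as np

# BF16 is just the top 16 bits of an FP32 value, so widening and
# shifting recovers the full FP32 number.
def bf16_to_fp32(raw_u16: np.ndarray) -> np.ndarray:
    return (raw_u16.astype(np.uint32) << 16).view(np.float32)

bits = np.array([0x3F80, 0x4000], dtype=np.uint16)  # 1.0 and 2.0 in BF16
print(bf16_to_fp32(bits))  # [1. 2.]
```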
Please post logs as a collapsible markdown (details) block to avoid spam in discussions/issues. I recommend you edit your posts above.
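For example:

```
<details>
<summary>Full log</summary>

(paste the log here)

</details>
```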
https://github.com/user-attachments/assets/17ac8866-fdc6-478b-8093-90a9fc8b508c
https://github.com/user-attachments/assets/b8e0fdce-1cb5-4104-b6d4-607673bad869
I will try other versions now. Success after patching and quantizing! @city96 @blepping
A problem with SD3:
as far as i know, SD3 was never supported. so far there's only support for Flux, SD 1.5 and SDXL. i think SD3 is similar to Flux in terms of architecture so maybe it would be easy to add.
Any line of code for the cpp patch, @blepping?
I bypassed the SD3 problem; for now I have a GGUF BF16 model, but it's not supported by cpp, so I tried to re-edit the patch (the last time I used C was 17 years ago) and reached this code with an error. The code (ignore the spelling and the forgotten replacements):
https://github.com/user-attachments/assets/7c5172f5-1a79-45b9-bd59-3f4842e9a99c
Why is SD3 important? I patched it before to create images in 4 steps. Half of the community is interested in video creation and almost all of them are using SD1.5; it would be amazing to use SD3 for that.
I have SD3 and Cascade support working, but I will have to add an exception to keep 4D tensors as actual 4D without reshaping, since there are very few of them and adding our key logic on top seems pointless in those cases (better to keep it more standard).
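A rough sketch of that exception (hypothetical helper, not the project's actual code):

```python
import numpy as np

# Keep the few genuine 4D tensors in their standard layout instead of
# reshaping them and adding a shape key; names here are illustrative.
def prepare_tensor(w: np.ndarray, keep_4d: bool) -> np.ndarray:
    if w.ndim == 4 and keep_4d:
        return w                          # leave the standard 4D layout intact
    if w.ndim > 2:
        return w.reshape(w.shape[0], -1)  # flatten for the 2D quant path
    return w
```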
There's a big list of models that need to be converted to GGUF: SVD, Kolors, ControlNet, IPAdapter! This is like translating English to another language (GGUF).
> svd

Mediocre 3 GB model, probably pointless. (Also, pretty sure most of the VRAM requirements with video models are from inference, not the model weights.)

> kolors

Not supported in ComfyUI natively.

> controlnet

This one would make more sense, depends on how it's applied internally.

> ipadapter

Relatively small, overhead makes it pointless.
It's calculated by the summation of all models inside the workflow, like this one where I used more than one model (base + IPAdapter + ControlNet).
you don't have to use all GGUF models. it basically only makes sense to quantize large models, so quantizing (or in other words using GGUF format) for controlnet is going to reduce quality without actually providing a benefit.
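back-of-envelope numbers make the point (model sizes here are illustrative):

```python
# Bytes saved going FP16 -> Q8_0 (~8.5 bits/weight, since each
# 32-weight block carries one extra FP16 scale).
def q8_0_savings_gb(params_billions: float) -> float:
    fp16_gb = params_billions * 2.0    # 16 bits per weight
    q8_gb = params_billions * 34 / 32  # 34 bytes per 32 weights
    return fp16_gb - q8_gb

print(q8_0_savings_gb(12.0))   # ~11.25 GB saved at Flux scale (~12B params)
print(q8_0_savings_gb(1.25))   # ~1.2 GB saved at ControlNet scale
```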
If I were you, I would look at GGUF more than Flux; Flux is a temporary model! From my experience of using ControlNet and LoRAs, the quality is not affected that much. Also, from my experience of community feedback, people prefer smaller size over quality. If you measure the percentage of people who can't use LoRAs or ControlNet with the Flux model because of its size, it will be 99% because of out-of-memory errors! One day I uploaded a 360 MB VAE model, and some users asked me to upload another version at 150 MB! I think users think of MB the way they think of money!
Absolutely not. We're not going to create that amount of massive overhead for other developers by allowing people to quantize literally anything.
Think about your VAE example for even a second. It's a 360MB static VRAM cost from the model weights. You've reduced it to 150MB. The runtime/inference VRAM cost didn't change - VAE decoding a 1920x1080 image will still use roughly 6.2GB of VRAM on top of the weights.
LoRAs also make no sense in our case since we're storing them in system RAM and loading them in weight by weight. The only thing quantizing them would do is slow down inference speed even further.
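A rough sketch of why (hypothetical names, not the project's actual code):

```python
# The LoRA patch is applied to the dequantized weight at use time, so a
# quantized LoRA would only add another dequantization step per layer.
def patched_weight(w_q, dequantize, lora_down, lora_up, scale):
    w = dequantize(w_q)                       # already the expensive part
    return w + scale * (lora_up @ lora_down)  # low-rank update in full precision
```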
Flux may be temporary, but large transformer-based models are here to stay, and they will continue to be the main focus of this project.
But the RAM holds the model, and it plays a critical role in this process. In ComfyUI they distribute the model processing between the CPU and VRAM; big models need big RAM. I missed this, because of out-of-memory (RAM) errors! Whatever, I'll go back to teaching physics in school; I haven't found any job in this domain... bad luck!
The RAM usage can still be cut in half because currently ComfyUI makes a copy of the quantized weights when a LoRA is applied even when there's no need for it, so a lot of that is just wasted atm. Ideally, a 300MB LoRA would take 300MB of system RAM and very little extra VRAM to use.
should be able to close this now - support for stable-diffusion.cpp SD15, SDXL and Flux models was merged.
Thanks for the support, see you!
I made an SDXL GGUF model; unfortunately, the node does not support it!