JorgeR81 opened 1 month ago
I also think the models by the author of Realistic Vision are great, but this repository should only quantize models released officially and won’t quantize third-party models.
This is an introduction to the repository: GGUF Quantization support for native ComfyUI models.
this repository should only quantize models released officially and won’t quantize third-party models.
Yeah, I agree that third-party models should probably not be in the main page, next to the official versions.
I was thinking more about users who make their own conversions and host them on their own Hugging Face pages. In that case, they could post the links in this thread, so that we can find them more easily.
For instance, this user made their own GGUF conversions of another popular Civitai model. https://huggingface.co/sherlockbt/acorn-is-spinning-flux-guff/tree/main
And I would also agree that we should talk with the finetune's creator when sharing a conversion. Sometimes they just don't know how to do the conversion (as some creators have said in Civitai comments), and they would appreciate the help in promoting their work, so that more people can use their models.
In this case, the creator thanked this user for making the Q4_K_M version and is also sharing it on their own page. https://civitai.com/models/673188?modelVersionId=978207
I think only really having base models is the right call, since those also act as a kind of "reference" to show how the converted models will work at certain quants/settings.
For collecting resources and things, would the github discussions page be useful @JorgeR81 ? I could enable it for the repo and it might be better than an issue since it can have replies to top level comments.
One thing I could try and experiment with is setting up a CI workflow similar to how the llama.cpp binaries are built (i.e. directly on GitHub), so the releases section would have something like a tools.zip with the pre-built llama-quantize binary + convert script included, but that still leaves random edge cases like the diffusers VS reference state dict format (where you'd have to load the model in comfy first and save it).
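Roughly, the user-side flow with such a tools.zip would be the usual two-step convert-then-quantize. A Python sketch of that flow (the script path, output file naming, and flags here are assumptions, not the exact CLI):

```python
# Hypothetical wrapper around the two-step conversion flow:
# 1) turn the .safetensors checkpoint into a high-precision GGUF with the
#    repo's convert script, 2) re-quantize it with the llama-quantize binary.
import subprocess
from pathlib import Path

def convert_and_quantize(src: Path, qtype: str = "Q4_K_S") -> Path:
    """Convert a .safetensors diffusion model to a quantized GGUF (sketch)."""
    stem = src.with_suffix("").name
    f16_gguf = f"{stem}-F16.gguf"          # assumed output name of the convert step
    out_gguf = f"{stem}-{qtype}.gguf"

    # Step 1: state dict -> GGUF (convert script assumed to live in tools/)
    subprocess.run(["python", "tools/convert.py", "--src", str(src)], check=True)

    # Step 2: re-quantize with the (patched) llama.cpp quantize binary
    subprocess.run(["./llama-quantize", f16_gguf, out_gguf, qtype], check=True)
    return Path(out_gguf)

if __name__ == "__main__":
    convert_and_quantize(Path("flux1-dev.safetensors"), "Q8_0")
```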
Another thing that should be possible is allowing people to make the legacy quants (*_0/*_1) directly in ComfyUI, but the K quants would probably require using ggml.dll + some ctypes interface.
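For context, the legacy quants are simple enough to do in plain Python, which is why doing them inside ComfyUI seems feasible. A minimal NumPy sketch of Q8_0 (32-value blocks with one fp16 scale each; padding and the exact on-disk block layout are glossed over):

```python
import numpy as np

QK8_0 = 32  # GGML block size for Q8_0

def quantize_q8_0(x: np.ndarray):
    """Quantize a 1-D float array into (scales, int8 values), Q8_0 style.
    Assumes len(x) is a multiple of 32; real code would pad the tail."""
    blocks = x.astype(np.float32).reshape(-1, QK8_0)
    # one scale per block: max |x| mapped onto the int8 range
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    d = amax / 127.0
    d[d == 0] = 1.0                       # avoid div-by-zero for all-zero blocks
    q = np.clip(np.round(blocks / d), -127, 127).astype(np.int8)
    return d.astype(np.float16), q

def dequantize_q8_0(d, q):
    return (q.astype(np.float32) * d.astype(np.float32)).reshape(-1)

if __name__ == "__main__":
    w = np.random.randn(64).astype(np.float32)
    d, q = quantize_q8_0(w)
    print("max abs error:", np.abs(w - dequantize_q8_0(d, q)).max())
```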
Lastly, I could try and spin up something like this but for image models, though I'd have to optimize the hell out of it because the free huggingface tier gives you like 16GBs of memory lmao: https://huggingface.co/spaces/ggml-org/gguf-my-repo
(Technically I could just spin that up locally or just automate conversion, but my upload speed is like 15mbps so I have to spin up a VPS for every quant and I'm on my last $2 of runpod credits lmfao)
For collecting resources and things, would the github discussions page be useful @JorgeR81 ?
Yes, that would be the ideal place !
Another thing that should be possible is allowing people to make the legacy quants (*_0/*_1) directly in ComfyUI, but the K quants would probably require using ggml.dll + some ctypes interface.
This could be a good solution. For FP16 models, we would probably use Q8_0 anyway. For FP8 models, we could use Q4_1, if Q4_K_S is not possible. But could we do it with only 8 GB VRAM and 32 GB RAM?
By the way, the user I mentioned named the converted models Q4_K_M and Q5_K_M: https://huggingface.co/sherlockbt/acorn-is-spinning-flux-guff/tree/main
But they are probably Q4_K_S and Q5_K_S conversions, right?
The Q*_K_M logic in the C++ code actually works for the most part lol. It's slightly better than the Q*_K_S quants but needs more work to have an actual meaningful effect.
I never got the use_more_bits logic working (which leaves the first and last block(s) in a higher precision, which I think is the main thing that makes Q*_K_M quants better. Main problem there is that we have 2 sets of blocks unlike the decoder-only LLMs lcpp was designed for, so we'd need to have 2 separate n_layers/i_layers variables tracking that. I guess I could just make a regex specific to flux and SD3 for that part and call it a day.).
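As a rough illustration of that regex idea (the double_blocks/single_blocks and joint_blocks key patterns are assumed from the flux / SD3 state dicts, and this helper is just a sketch of the use_more_bits rule, not the actual lcpp logic):

```python
import re

# Per-architecture block-key patterns (assumed state dict naming)
BLOCK_PATTERNS = {
    "flux": re.compile(r"^(double_blocks|single_blocks)\.(\d+)\."),
    "sd3":  re.compile(r"^joint_blocks\.(\d+)\."),
}

def wants_more_bits(key: str, arch: str, block_counts: dict,
                    n_first: int = 1, n_last: int = 1) -> bool:
    """Return True if this tensor sits in one of the first/last blocks of its
    group and should be kept at a higher precision (the use_more_bits idea)."""
    m = BLOCK_PATTERNS[arch].match(key)
    if not m:
        return False                      # not a transformer block tensor
    group = m.group(1) if arch == "flux" else "joint_blocks"
    idx = int(m.groups()[-1])             # block index within its group
    total = block_counts[group]
    return idx < n_first or idx >= total - n_last

# Example: flux-dev has 19 double blocks and 38 single blocks
counts = {"double_blocks": 19, "single_blocks": 38}
print(wants_more_bits("double_blocks.0.img_attn.qkv.weight", "flux", counts))  # True
print(wants_more_bits("single_blocks.20.linear1.weight", "flux", counts))      # False
```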
Also, discussions should now be enabled if you want to start one for listing quantized models. I have this list for base models on HF in case you want to include it, I'll try to keep it updated: https://huggingface.co/collections/city96/gguf-image-model-quants-67199ef97bf1d9ca49033234
OK, I've opened a discussion thread for model sharing: https://github.com/city96/ComfyUI-GGUF/discussions/144
I'm leaving this issue open, in case you want to discuss your other possible solutions here. (Maybe change the title? Or just open a new one.)
By the way, the user I mentioned named the converted models Q4_K_M and Q5_K_M: https://huggingface.co/sherlockbt/acorn-is-spinning-flux-guff/tree/main
But they are probably Q4_K_S and Q5_K_S conversions, right?
I'm sherlockbt, and I converted this model using the code from this repository, on a system with 8GB VRAM and 32GB RAM. 😄
Hi, @EvilBT. Thanks!
Let me know if you also decide to convert any more models. You could share a link in the discussions here: (#144)
Can we have a thread to share and request GGUF conversions of the best Flux finetunes?
This model is built on Flux Dev (de-distill), but has good results with "just" 30 steps! It's from the creator of the Realistic Vision (SD1.5) and RealVisXL models! https://huggingface.co/SG161222/Verus_Vision_1.0b https://civitai.com/models/883426/verus-vision-10b
If you've made a GGUF version, can you share a link here, please?
I don't think this creator plans to make GGUF versions, because his other Flux models have been released for almost a month, and there are no GGUF versions yet. https://civitai.com/models/788550/realflux-10b https://huggingface.co/SG161222/RealFlux_1.0b_Schnell https://huggingface.co/SG161222/RealFlux_1.0b_Dev