JorgeR81 opened 1 month ago
I also think the models by the author of Realistic Vision are great, but this repository should only quantize models released officially and won’t quantize third-party models.
This is an introduction to the repository: GGUF Quantization support for native ComfyUI models.
this repository should only quantize models released officially and won’t quantize third-party models.
Yeah, I agree that third-party models should probably not be in the main page, next to the official versions.
I was thinking more about users who make their own conversions and host them on their own Hugging Face pages. In that case, they could post the links in this thread, so that we can find them more easily.
For instance, this user made their own GGUF conversions of another popular Civitai model. https://huggingface.co/sherlockbt/acorn-is-spinning-flux-guff/tree/main
And I would also agree that we should talk with the finetune's creator when sharing a conversion. Sometimes they just don't know how to do the conversion (as some creators have said in Civitai comments), and they would appreciate the help in promoting their work, so that more people can use their models.
In this case, the creator thanked this user for making the Q4_K_M version and is also sharing it on their own page. https://civitai.com/models/673188?modelVersionId=978207
I think only really having base models is the right call, since those also act as a kind of "reference" to show how the converted models will work at certain quants/settings.
For collecting resources and things, would the github discussions page be useful @JorgeR81 ? I could enable it for the repo and it might be better than an issue since it can have replies to top level comments.
One thing I could try and experiment with is setting up a CI workflow similar to how the llama.cpp binaries are built (i.e. directly on GitHub), so the releases section would have something like a tools.zip with the pre-built llama-quantize binary + convert script included, but that still leaves random edge cases like the diffusers VS reference state dict format (where you'd have to load the model in comfy first and save it).
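Roughly, the user-side flow with such a tools.zip would be the usual two-step convert-then-quantize. A Python sketch of that flow (the script path, output file naming, and flags here are assumptions, not the exact CLI):

```python
# Hypothetical wrapper around the two-step conversion flow:
# 1) turn the .safetensors checkpoint into a high-precision GGUF with the
#    repo's convert script, 2) re-quantize it with the llama-quantize binary.
import subprocess
from pathlib import Path

def convert_and_quantize(src: Path, qtype: str = "Q4_K_S") -> Path:
    """Convert a .safetensors diffusion model to a quantized GGUF (sketch)."""
    stem = src.with_suffix("").name
    f16_gguf = f"{stem}-F16.gguf"          # assumed output name of the convert step
    out_gguf = f"{stem}-{qtype}.gguf"

    # Step 1: state dict -> GGUF (convert script assumed to live in tools/)
    subprocess.run(["python", "tools/convert.py", "--src", str(src)], check=True)

    # Step 2: re-quantize with the (patched) llama.cpp quantize binary
    subprocess.run(["./llama-quantize", f16_gguf, out_gguf, qtype], check=True)
    return Path(out_gguf)

if __name__ == "__main__":
    convert_and_quantize(Path("flux1-dev.safetensors"), "Q8_0")
```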
Another thing that should be possible is allowing people to make the legacy quants (*_0/*_1) directly in ComfyUI, but the K quants would probably require using ggml.dll + some ctypes interface.
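For context, the legacy quants are simple enough to do in plain Python, which is why doing them inside ComfyUI seems feasible. A minimal NumPy sketch of Q8_0 (32-value blocks with one fp16 scale each; padding and the exact on-disk block layout are glossed over):

```python
import numpy as np

QK8_0 = 32  # GGML block size for Q8_0

def quantize_q8_0(x: np.ndarray):
    """Quantize a 1-D float array into (scales, int8 values), Q8_0 style.
    Assumes len(x) is a multiple of 32; real code would pad the tail."""
    blocks = x.astype(np.float32).reshape(-1, QK8_0)
    # one scale per block: max |x| mapped onto the int8 range
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    d = amax / 127.0
    d[d == 0] = 1.0                       # avoid div-by-zero for all-zero blocks
    q = np.clip(np.round(blocks / d), -127, 127).astype(np.int8)
    return d.astype(np.float16), q

def dequantize_q8_0(d, q):
    return (q.astype(np.float32) * d.astype(np.float32)).reshape(-1)

if __name__ == "__main__":
    w = np.random.randn(64).astype(np.float32)
    d, q = quantize_q8_0(w)
    print("max abs error:", np.abs(w - dequantize_q8_0(d, q)).max())
```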
Lastly, I could try and spin up something like this but for image models, though I'd have to optimize the hell out of it because the free huggingface tier gives you like 16GBs of memory lmao: https://huggingface.co/spaces/ggml-org/gguf-my-repo
(Technically I could just spin that up locally or just automate conversion, but my upload speed is like 15mbps so I have to spin up a VPS for every quant and I'm on my last $2 of runpod credits lmfao)
For collecting resources and things, would the github discussions page be useful @JorgeR81 ?
Yes, that would be the ideal place !
Another thing that should be possible is allowing people to make the legacy quants (*_0/*_1) directly in ComfyUI, but the K quants would probably require using ggml.dll + some ctypes interface.
This could be a good solution. For FP16 models, we would probably use Q8_0 anyway. For FP8 models, we could use Q4_1, if Q4_K_S is not possible. But could we do it with only 8 GB VRAM and 32 GB RAM?
By the way, the user I mentioned named the converted models Q4_K_M and Q5_K_M: https://huggingface.co/sherlockbt/acorn-is-spinning-flux-guff/tree/main
But they are probably Q4_K_S and Q5_K_S conversions, right?
The Q*_K_M logic in the C++ code actually works for the most part lol. It's slightly better than the Q*_K_S quants but needs more work to have an actual meaningful effect.
I never got the use_more_bits logic working (which leaves the first and last block(s) in a higher precision, which I think is the main thing that makes Q*_K_M quants better. Main problem there is that we have 2 sets of blocks unlike the decoder-only LLMs lcpp was designed for, so we'd need to have 2 separate n_layers/i_layers variables tracking that. I guess I could just make a regex specific to flux and SD3 for that part and call it a day.).
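As a rough illustration of that regex idea (the double_blocks/single_blocks and joint_blocks key patterns are assumed from the flux / SD3 state dicts, and this helper is just a sketch of the use_more_bits rule, not the actual lcpp logic):

```python
import re

# Per-architecture block-key patterns (assumed state dict naming)
BLOCK_PATTERNS = {
    "flux": re.compile(r"^(double_blocks|single_blocks)\.(\d+)\."),
    "sd3":  re.compile(r"^joint_blocks\.(\d+)\."),
}

def wants_more_bits(key: str, arch: str, block_counts: dict,
                    n_first: int = 1, n_last: int = 1) -> bool:
    """Return True if this tensor sits in one of the first/last blocks of its
    group and should be kept at a higher precision (the use_more_bits idea)."""
    m = BLOCK_PATTERNS[arch].match(key)
    if not m:
        return False                      # not a transformer block tensor
    group = m.group(1) if arch == "flux" else "joint_blocks"
    idx = int(m.groups()[-1])             # block index within its group
    total = block_counts[group]
    return idx < n_first or idx >= total - n_last

# Example: flux-dev has 19 double blocks and 38 single blocks
counts = {"double_blocks": 19, "single_blocks": 38}
print(wants_more_bits("double_blocks.0.img_attn.qkv.weight", "flux", counts))  # True
print(wants_more_bits("single_blocks.20.linear1.weight", "flux", counts))      # False
```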
Also, discussions should now be enabled if you want to start one for listing quantized models. I have this list for base models on HF in case you want to include it, I'll try to keep it updated: https://huggingface.co/collections/city96/gguf-image-model-quants-67199ef97bf1d9ca49033234
OK, I've opened a discussion thread for model sharing: https://github.com/city96/ComfyUI-GGUF/discussions/144
I'm leaving this issue open, in case you want to discuss your other possible solutions here. (Maybe change the title? Or just open a new one.)
By the way, the user I mentioned named the converted models Q4_K_M and Q5_K_M: https://huggingface.co/sherlockbt/acorn-is-spinning-flux-guff/tree/main
But they are probably Q4_K_S and Q5_K_S conversions, right?
I'm sherlockbt, and I converted this model using the code from this repository, on a system with 8GB VRAM and 32GB RAM. 😄
Hi, @EvilBT. Thanks!
Let me know if you also decide to convert any more models. You could share a link in the discussions here: (#144)
Can we have a thread to share and request GGUF conversions of the best Flux finetunes?
This model is built on Flux Dev (de-distill), but has good results with "just" 30 steps! It's from the creator of the Realistic Vision (SD1.5) and RealVisXL models! https://huggingface.co/SG161222/Verus_Vision_1.0b https://civitai.com/models/883426/verus-vision-10b
If you've made a GGUF version, can you share a link here, please?
I don't think this creator plans to make GGUF versions, because his other Flux models have been released for almost a month, and there are no GGUF versions yet. https://civitai.com/models/788550/realflux-10b https://huggingface.co/SG161222/RealFlux_1.0b_Schnell https://huggingface.co/SG161222/RealFlux_1.0b_Dev