Acly / krita-ai-diffusion

Streamlined interface for generating images with AI in Krita. Inpaint and outpaint with optional text prompt, no tweaking required.
https://www.interstice.cloud
GNU General Public License v3.0

Quick and dirty hack to test Flux UNET models loading (GGUF & NF4 supported, LoRA too) #1104


Danamir commented 2 weeks ago

The development of full Flux support will take some time, since the better way to load it may be as a separate UNET. That complicates the UI: the CLIP, T5 and VAE will have to be defined separately, and UNET models will need their own detection.

In the meantime, here is a dirty hack to test loading UNET models with separate CLIP, T5 and VAE. The trick is to expose each UNET in the regular checkpoint list by placing a dummy .safetensors file in the models/checkpoints folder, named `__unet__<name>.safetensors`, where any remaining double underscore stands in for a subdirectory separator under models/unet.
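A minimal sketch of that first step (the file names are examples, not from the original post; a commenter below found the dummy should be made from a non-Flux model so it passes the plugin's model scan):

    # Create a dummy checkpoint entry that maps to models/unet/flux1-dev-Q4_K_S.gguf.
    # Only the *name* matters to the patched loader below: "__unet__" marks it,
    # any remaining "__" becomes a subdirectory separator, and a trailing
    # ".gguf.safetensors" is turned back into ".gguf".
    import shutil

    shutil.copy(
        "ComfyUI/models/checkpoints/zavychromaxl_v90.safetensors",  # any non-Flux model
        "ComfyUI/models/checkpoints/__unet__flux1-dev-Q4_K_S.gguf.safetensors",
    )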

Then we alter the checkpoint loading code to look for the `__unet__` part and, if found, replace the simple checkpoint loader by the necessary loaders. To do so, replace the `def load_checkpoint(self, checkpoint: str):` method by this code:

    def load_checkpoint(self, checkpoint: str):
        if "__unet__" in checkpoint:
            # -Configuration-
            clip_l_name = "clip_l.safetensors"
            t5xxl_name = "t5-v1_1-xxl-encoder-Q8_0.gguf"  # GGUF T5XXL
            # t5xxl_name = "t5xxl_fp8_e4m3fn.safetensors"  # Standard T5XXL
            vae_name = "ae.sft"

            # UNET loading
            import os
            unet_name = os.path.basename(checkpoint)
            unet_name = unet_name.replace("__unet__", "")
            unet_name = os.path.join(*unet_name.split("__"))  # handle unet subdirectories

            if ".gguf" in unet_name:
                unet_name = unet_name.replace(".gguf.safetensors", ".gguf")
                model_output = self.add("UnetLoaderGGUF", 1, unet_name=unet_name)
            elif "nf4" in unet_name:
                model_output = self.add("UNETLoaderNF4", 1, unet_name=unet_name)
            else:
                model_output = self.add("UNETLoader", 1, unet_name=unet_name)

            # CLIP loading
            model_type = "flux" if "flux" in unet_name else "sdxl"  # detect model type

            if clip_l_name.endswith(".gguf") or t5xxl_name.endswith(".gguf"):
                clip_output = self.add("DualCLIPLoaderGGUF", 1, clip_name1=clip_l_name, clip_name2=t5xxl_name, type=model_type)
            else:
                clip_output = self.add("DualCLIPLoader", 1, clip_name1=clip_l_name, clip_name2=t5xxl_name, type=model_type)

            # VAE loading
            vae_output = self.add("VAELoader", 1, vae_name=vae_name)

            return (model_output, clip_output, vae_output)

        return self.add_cached("CheckpointLoaderSimple", 3, ckpt_name=checkpoint)

Be sure to edit the `-Configuration-` part of the code to point to your relevant separate CLIP, T5 and VAE files.
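To make the naming convention concrete, here is how the code above resolves a dummy checkpoint name (the file name is a hypothetical example):

    import os

    checkpoint = "__unet__flux__flux1-dev-Q4_K_S.gguf.safetensors"
    unet_name = os.path.basename(checkpoint).replace("__unet__", "")
    unet_name = os.path.join(*unet_name.split("__"))             # "flux/flux1-dev-Q4_K_S.gguf.safetensors" (separator is OS-dependent)
    unet_name = unet_name.replace(".gguf.safetensors", ".gguf")  # "flux/flux1-dev-Q4_K_S.gguf"
    print(unet_name)  # resolved relative to ComfyUI/models/unet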

You'll also have to install the custom nodes needed for loading GGUF or NF4 UNETs, or GGUF CLIP (i.e. https://github.com/city96/ComfyUI-GGUF and https://github.com/DenkingOfficial/ComfyUI_UNet_bitsandbytes_NF4).

At the moment, LoRAs are working with the GGUF UNET models. There is still a bug affecting lowram loading, so if it is not working, try a smaller quantization. For example, on my system with 8GB VRAM, LoRAs do not work with the Q8 version but work with the Q6 and Q4 versions.

The NF4 version does not work with LoRA.

coastbluet commented 2 weeks ago

Thanks very much for this, but I didn't understand: which file or files do I need to alter with the code you wrote?

guijiaosir commented 2 weeks ago

I don't know what you mean... Are you saying it is possible to make the FLUX model use SDXL LoRAs?

Danamir commented 2 weeks ago

Thanks very much for this, but I didn't understand: which file or files do I need to alter with the code you wrote?

comfy_workflow.py, line 304. Or search the project for `def load_checkpoint`. But I invite you to alter the code if and only if you know how to roll back the changes when needed. 😅

Danamir commented 2 weeks ago

I don't know what you mean... Are you saying it is possible to make the FLUX model use SDXL LoRAs?

Not at all, SDXL LoRAs are totally incompatible with Flux.

I was only talking about the many Flux LoRAs available right now, which only work with certain model versions on ComfyUI (Forge does its own patching).

derdelush commented 2 weeks ago

When I edit comfy_workflow.py and restart Krita, the AI Image Generation plugin disappears. I only replaced lines 304-305 with the above code.

Nevermind, indentation error...

I do keep getting `name 'os' is not defined` though.

Nevermind, it's late; I added `import os` at the top of the comfy_workflow.py file.

Danamir commented 2 weeks ago

Nevermind, it's late; I added `import os` at the top of the comfy_workflow.py file.

Damn, you're right, I forgot to mention it. I'll edit the first post.

derdelush commented 2 weeks ago

No worries, and thank you, it works great. I can generate images, then inpaint with SDXL.

Acly commented 2 weeks ago

I added auto-detection for diffusion models (aka UNET) a few days ago: https://github.com/Acly/comfyui-tooling-nodes/commit/f5ec9d830cc78ec766616627fc1951cdb4897413

The plan would be to make diffusion models appear in the same list as checkpoints. And clip/vae can be detected & loaded automatically, similar to how it works for SD3.

But this doesn't cover GGUF/NF4 quants yet, which require custom nodes. I'll probably wait a bit longer to see if one of them is integrated a bit more deeply and some of the incompatibilities are smoothed out.

Danamir commented 2 weeks ago

Nice feature, if the checkpoints and UNETs can be unified.

For now the fastest is NF4, but it does not support LoRAs and the memory gain does not scale well at higher resolutions. GGUF is much more stable and seems like the logical choice, but it can be slower, particularly on the first generation or after a prompt change. And there still is this nasty bug affecting LoRAs with lowram loading.

Of course this mostly affects systems with "low" VRAM (can't believe that 8GB is low nowadays 😅); those with more VRAM are less dependent on such optimizations.

Wait-and-see seems to be the perfect approach in this case; my patch is for the (impatient) enthusiasts only. 😬

redclif43 commented 2 weeks ago

How do I implement this in the Krita plugin, rather than in a ComfyUI workflow?

yunuskoc005 commented 2 weeks ago

Thank you for the hack. I just managed to try it in a Kaggle notebook.

I first had a connection problem, I think because I was trying "zrok" and "pinggy" for remote access. That issue was solved when I tried ngrok.

Then installing the models (https://github.com/Acly/krita-ai-diffusion/wiki/ComfyUI-Setup) took me some time. I tried to use the script download_models.py from krita-ai-diffusion/scripts/download_models.py, but it seems main does not include the recent release of the "ai_diffusion" folder and "ai_diffusion.desktop". After deleting those and replacing them with the recent release (https://github.com/Acly/krita-ai-diffusion/releases), it worked nicely (only disk space issues).

After that I managed to get ComfyUI and the local Krita plugin communicating, but I couldn't see Flux models in the model list. I tried many combinations of Flux models (the flux dev .safetensors full checkpoint (BFL), the GGUF version of Flux schnell, different encoders both GGUF and normal) but couldn't make it work.

The part below is wrong; I make an edit about this at the bottom.

Then I noticed, or understood (or I suppose I understood, at least :) ), that the error was about "Acly/comfyui-tooling-nodes". When I put in a Flux model .safetensors to pass the safety check, "comfyui-tooling-nodes" gave some errors about "ComfyUI/comfy/supported_models_base.py". I think this is because it is a Flux model, since I noticed that the error only manifests when there is a Flux model in the models/checkpoints folder.

Later I made the dummy "unet.safetensors" file as @Danamir mentioned, but not from a Flux model, instead from an SDXL model (zavychromaxl_v90), and finally it worked. I managed to run "flux1-schnell-Q6_K.gguf" with a "t5-v1_1-xxl-encoder-Q6_K.gguf" and a normal clip_l.

Sorry if I've written unnecessarily long here; as you may notice, I'm not very proficient in these issues :), but I wanted to write in a bit of detail in case somebody else has similar problems (or just wait a few more days and the contributors of this project will probably solve them all).

Thank you all again.

Edit 1: I noticed I made a few mistakes and have quoted that part in the post. Probably the model I managed to run was zavychromaxl_v90 itself, the model I used as the dummy file, not the GGUF file (it wasn't able to produce clean images in 5 steps like schnell, and the iteration speed was faster than Flux schnell should be).

Today I tried the Flux model from the Comfy-Org github page (https://comfyanonymous.github.io/ComfyUI_examples/flux/#simple-to-use-fp8-checkpoint-version) linked into the /checkpoints folder, and it is running now. Probably the issue was that I used the wrong Flux model. I haven't tried the GGUF model yet.

waynemadsen commented 2 weeks ago

I'd be interested in seeing if the Q4_K_S GGUF dev works. I've been using it in ComfyUI without Krita to get my base images, then moving everything over to Krita afterwards to touch up and refine. Have you noticed any increase in speed by running the UNET in Krita as opposed to vanilla Comfy?

Danamir commented 2 weeks ago

I'd be interested in seeing if the Q4_K_S GGUF dev works

Yes, it's working. It's the one I've been using (along with the NF4 version) because it's the only one lightweight enough for me to load LoRAs.

On my system GGUF Q4_K_S is a tad slower than the NF4 version, but faster than the Q6, Q8, fp8 & fp16 versions.

The fidelity is pretty good, but if I could load it with LoRAs I would rather use the Q6 version.

Danamir commented 2 weeks ago

Note: LoRA loading is now fixed for all GGUF models.

sfisher commented 1 week ago

Thanks for this. I got it working for generating new images.

~~Is it expected that inpainting doesn't work? (I get `!!! Exception during processing !!! Given groups=1, weight of size [320, 5, 3, 3], expected input[1, 17, 130, 130] to have 5 channels, but got 17 channels instead`)~~

Oh, nevermind, I see from another ticket that in order to use Flux for inpainting you have to uncheck "seamless."

A few tips for others in case they're helpful.

derdelush commented 1 week ago

Oh, nevermind, I see from another ticket that in order to use Flux for inpainting you have to uncheck "seamless."

I inpainted by making sure my selections were square; I'll try this when I have a chance, thanks.

You can use mklink if you run CMD or PowerShell as an administrator. `import os` worked for me; I put it just before the `from` imports at the top.
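In case it helps, the same kind of link can also be created from Python instead of the mklink command (paths here are hypothetical examples; on Windows, creating symlinks requires administrator rights or Developer Mode):

    # Link a model stored elsewhere into ComfyUI's model folders so it shows
    # up in the lists without duplicating a multi-GB file.
    import os

    os.symlink(
        r"D:\models\flux\flux1-dev-Q4_K_S.gguf",
        r"C:\ComfyUI\models\unet\flux1-dev-Q4_K_S.gguf",
    )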