MinusZoneAI / ComfyUI-Flux1Quantize-MZ

Unofficial quantized Flux1 models (flux1 unofficial quantize model)

Please support LoRA #4

Open wardensc2 opened 1 month ago

wardensc2 commented 1 month ago

Hi @minuszoneAI

Can you add a node that supports loading LoRA when using the Marlin model?

Thank you in advance

wailovet commented 1 month ago

Sorry, the GGUF version of the model is good enough; I no longer have the motivation to continue developing the Marlin model.

wardensc2 commented 1 month ago

> Sorry, the GGUF version of the model is good enough; I no longer have the motivation to continue developing the Marlin model.

But the VRAM requirement is still very high with GGUF compared with the new Marlin format.

Ph0rk0z commented 1 month ago

I was going to look into how to add LoRA; it's probably not impossible. I'd rather figure out some other kernels besides Marlin (Ampere only, not easily ported). Only one model has been uploaded so far. GGUF quality is great, but its speed is nowhere near this. Your node actually beat using the full model.

wailovet commented 1 month ago

> I was going to look into how to add LoRA; it's probably not impossible. I'd rather figure out some other kernels besides Marlin (Ampere only, not easily ported). Only one model has been uploaded so far. GGUF quality is great, but its speed is nowhere near this. Your node actually beat using the full model.

It should be possible to implement LoRA, especially with the GGUF implementation as a reference.
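
For illustration only, here is a minimal sketch (not code from this repository) of one way LoRA could sit on top of a quantized linear layer: keep the low-rank matrices in float and add their contribution next to the quantized matmul, so the packed weights never need to be re-quantized. The layer name, the `packed_weight` blob, and the `dequantize` callable are hypothetical placeholders for whatever format and kernel the node actually uses.

```python
import torch
import torch.nn.functional as F

class QuantLinearWithLoRA(torch.nn.Module):
    """Hypothetical layer: quantized base weight plus a separate LoRA delta."""

    def __init__(self, packed_weight, dequantize, lora_down=None, lora_up=None, alpha=1.0):
        super().__init__()
        self.packed_weight = packed_weight  # opaque quantized blob (placeholder)
        self.dequantize = dequantize        # fn: blob -> float weight [out, in] (placeholder)
        self.lora_down = lora_down          # [rank, in] float tensor or None
        self.lora_up = lora_up              # [out, rank] float tensor or None
        self.alpha = alpha

    def forward(self, x):
        # Base path: a real node would call the fast quantized matmul kernel
        # directly instead of dequantizing the full weight like this.
        w = self.dequantize(self.packed_weight).to(x.dtype)
        y = F.linear(x, w)
        # LoRA path: low-rank delta computed in float and added to the output,
        # so the packed weights stay untouched.
        if self.lora_down is not None and self.lora_up is not None:
            scale = self.alpha / self.lora_down.shape[0]
            y = y + (x @ self.lora_down.t() @ self.lora_up.t()) * scale
        return y
```

Keeping the LoRA matrices separate rather than merging them into the quantized weights avoids re-quantization, which is presumably why the GGUF implementation is a useful reference here.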

Regarding this node: the NF4 and GGUF models had just been released by the community when development was finished, so at the time I felt I had done useless work and stopped all development. A lot of code was temporarily embedded directly in ComfyUI and has not been cleaned up, so it may take some time to resume development. Moreover, given that GGUF and NF4 versions exist, is this node still valuable?

Ph0rk0z commented 1 month ago

NF4 has bad quality. GGUF has good quality, but it's slow. The value in this node is leveraging kernels inside ComfyUI for faster inference; I think you are the first to do that.
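
For readers wondering what "leveraging kernels inside ComfyUI" looks like in practice, a rough sketch follows, under the assumption that the node works by swapping linear layers for kernel-backed ones after the model is loaded; `make_quant_layer` is a hypothetical factory, not an API from this repository.

```python
import torch

def swap_linears_for_quant(module: torch.nn.Module, make_quant_layer):
    """Hypothetical helper: recursively replace nn.Linear modules with a layer
    backed by a fast quantized kernel. `make_quant_layer` is a placeholder
    factory that packs the float weight into the quantized format."""
    for name, child in module.named_children():
        if isinstance(child, torch.nn.Linear):
            setattr(module, name, make_quant_layer(child))
        else:
            swap_linears_for_quant(child, make_quant_layer)
    return module
```

Once the linears are swapped, every matmul in the diffusion model goes through the quantized kernel, which is where the speed advantage over plain GGUF dequantization would come from.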