wardensc2 opened 1 month ago

Hi @minuszoneAI,

Can you add a node that supports loading LoRAs when using the Marlin model?

Thank you in advance.
Sorry, the GGUF version of the model is good enough; I no longer have the motivation to continue developing the Marlin model.
But the VRAM requirement is still very high with GGUF compared with the new Marlin format.
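For rough intuition (back-of-the-envelope only, assuming a ~12B-parameter transformer like Flux): packed 4-bit weights are about the same size in either format, so any VRAM gap between GGUF and Marlin would come from runtime overhead, such as on-the-fly dequantization buffers, rather than from weight storage itself:

```python
# Weight-only footprint; ignores activations, quantization scales,
# and any temporary dequantization buffers.
params = 12e9                                     # assumed parameter count
print(f"fp16 : {params * 2.0 / 2**30:.1f} GiB")   # ~22.4 GiB
print(f"4-bit: {params * 0.5 / 2**30:.1f} GiB")   # ~5.6 GiB
```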
I was going to look into how to add LoRA; it's probably not impossible. I'd rather figure out some other kernels besides Marlin (Ampere only, not easily ported). We only have one model uploaded so far. GGUF quality is great, but its speed is nothing like this. Your node actually beats using the full model.
It should be possible to implement LoRA, especially with the GGUF implementation as a reference.
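One way this could work, sketched minimally below: leave the packed Marlin weight untouched and run the LoRA delta as a separate low-rank side branch next to the quantized GEMM. Everything here (the wrapper name, `base_linear`, the tensor layouts) is an illustrative assumption, not this node's actual API:

```python
import torch
import torch.nn as nn

class LoRAOnQuantized(nn.Module):
    """Hypothetical wrapper: the quantized base layer is called as-is and the
    LoRA update is added as a low-rank side branch, so the packed weights
    never need to be dequantized or re-packed."""

    def __init__(self, base_linear: nn.Module, lora_down: torch.Tensor,
                 lora_up: torch.Tensor, alpha: float = 1.0):
        super().__init__()
        self.base = base_linear                # assumed Marlin-backed linear
        self.lora_down = nn.Parameter(lora_down, requires_grad=False)  # (rank, in_features)
        self.lora_up = nn.Parameter(lora_up, requires_grad=False)      # (out_features, rank)
        self.scale = alpha / lora_down.shape[0]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.base(x)                       # fast quantized GEMM path
        # standard LoRA update in the activation dtype: y += scale * (x @ A^T) @ B^T
        delta = (x @ self.lora_down.t()) @ self.lora_up.t()
        return y + self.scale * delta
```

For comparison, ComfyUI-GGUF (as far as I can tell) applies LoRA patches while dequantizing each layer's weights on the fly; the side branch above trades that for two extra small GEMMs per call but never touches the packed weights.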
Regarding this node: the NF4 and GGUF models had just been released by the community when development was completed. At the time I felt I had done useless work, so I interrupted all development. A lot of code was temporarily embedded into ComfyUI and has not been cleaned up, so it may take some time to resume development. Moreover, with GGUF and NF4 versions already existing, is this node still valuable?
NF4 has bad quality. GGUF has good quality, but it's slow. The value in this node is leveraging fast kernels inside ComfyUI for quicker inference. I think you're the first to do that.
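To make the speed comparison concrete, a micro-benchmark of a single linear layer is enough; here's a minimal sketch. The fp16 baseline runs as-is; the quantized call is left commented out because the real kernel binding and its signature are not shown in this thread:

```python
import torch
import torch.nn.functional as F

def time_op(fn, iters=50):
    """Rough CUDA timing helper: warm up, then average over `iters` calls."""
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    for _ in range(5):                       # warmup
        fn()
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters   # milliseconds per call

x = torch.randn(1, 4096, device="cuda", dtype=torch.float16)
w = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)

print(f"fp16 linear: {time_op(lambda: F.linear(x, w)):.3f} ms")
# print(f"4-bit GEMM: {time_op(lambda: marlin_gemm(x, w_packed, scales)):.3f} ms")  # hypothetical binding
```

At batch size 1 this GEMM is memory-bound, so a 4-bit kernel that reads a quarter of the bytes can plausibly come out well ahead of the fp16 baseline; that is the gain being described here.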