Open blepping opened 3 months ago
It works with SDXL models, but when I add LoRA, I get the following error:
TypeError: All input tensors need to be on the same GPU, but found some tensors to not be on a GPU: [(torch.Size([1, 204800]), device(type='cuda', index=0)), (torch.Size([6400]), device(type='cpu')), (torch.Size([320, 1280]), device(type='cuda', index=0))]
Can you fix this?
and sometimes, with lora too:
File "V:\comfyu_py311\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl return forward_call(*args, kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "V:\comfyu_py311\python_embeded\Lib\site-packages\torch\nn\modules\container.py", line 217, in forward input = module(input) ^^^^^^^^^^^^^ File "V:\comfyu_py311\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl return self._call_impl(*args, *kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "V:\comfyu_py311\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl return forward_call(args, kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "V:\comfyu_py311\ComfyUI\custom_nodes\ComfyUI_bitsandbytes_NF4-fork__init.py", line 171, in forward return functional_linear_4bits(x, self.weight, self.bias) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "V:\comfyu_py311\ComfyUI\custom_nodes\ComfyUI_bitsandbytes_NF4-fork\init__.py", line 12, in functional_linear_4bits out = bnb.matmul_4bit(x, weight.t(), bias=bias, quant_state=weight.quant_state) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "V:\comfyu_py311\python_embeded\Lib\site-packages\bitsandbytes\autograd_functions.py", line 566, in matmul_4bit assert quant_state is not None ^^^^^^^^^^^^^^^^^^^^^^^ AssertionError
@bananasss00
It works with SDXL models, but when I add LoRA, I get the following error:
even if you got past that point, it still wouldn't work. i actually spent some time trying to figure out how to get LoRAs working but wasn't successful. The Comfy LoRA loader logic is pretty complicated; it ends up trying to add the non-quantized LoRA weights to the quantized model weights, or something along those lines.
however, you can use the CLIP part of LoRAs without an issue if you also load the model normally for the prompt encoding part. i also don't recommend using the CLIP output from the BNB nodes: there's really no reason to use quantized CLIP (at least with SDXL and SD1x). in other words: use a normal model loader and its CLIP output to encode the prompt to CONDITIONING - you can apply whatever LoRAs you want, but only the CLIP part will have an effect. then you can use the MODEL output from the BNB loader node for sampling. hope that makes sense.
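to illustrate the wiring, here's a rough sketch of that graph in ComfyUI's API prompt format. treat it as an example only: the BNB loader node name ("CheckpointLoaderNF4") and the checkpoint/LoRA file names are assumptions, swap in whatever your install actually uses, and VAE decode/save is omitted.

```python
# sketch of the graph as a ComfyUI API-format prompt (illustration only:
# "CheckpointLoaderNF4" and the file names are assumptions, adjust to your setup)
prompt = {
    # normal loader: only its CLIP output gets used
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "sdxl_base.safetensors"}},
    # BNB loader: only its MODEL output gets used
    "2": {"class_type": "CheckpointLoaderNF4",
          "inputs": {"ckpt_name": "sdxl_base_nf4.safetensors"}},
    # LoRA applied to the normal loader; only the CLIP half has any effect here
    "3": {"class_type": "LoraLoader",
          "inputs": {"model": ["1", 0], "clip": ["1", 1],
                     "lora_name": "my_lora.safetensors",
                     "strength_model": 1.0, "strength_clip": 1.0}},
    # prompt encoding uses the non-quantized CLIP (with the LoRA applied)
    "4": {"class_type": "CLIPTextEncode",
          "inputs": {"clip": ["3", 1], "text": "a photo of a cat"}},
    "5": {"class_type": "CLIPTextEncode",
          "inputs": {"clip": ["3", 1], "text": ""}},
    "6": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    # sampling uses the quantized MODEL from the BNB loader
    "7": {"class_type": "KSampler",
          "inputs": {"model": ["2", 0], "positive": ["4", 0], "negative": ["5", 0],
                     "latent_image": ["6", 0], "seed": 0, "steps": 20, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0}},
}
```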
@blepping I'm encountering an error when running ComfyUI with fp8 parameters.
I fixed it by https://gist.github.com/bananasss00/bdfc67da5e1642e4e94796574d5826a0#file-__init__-py-L77
but the model loads slowly the first time. Is there a better fix for fp8 mode?
I'm encountering an error when running ComfyUI with fp8 parameters.
i don't think doing that makes any sense. you need to pick one type of quantization.
likely what you're doing is loading the model, then casting it down to 8 bit (which is a lossy process), then converting it back up to 16 bit and quantizing it down to 4 bit. it's converting the model multiple times, which is naturally going to be slow. the information that got thrown away converting it to 8 bit cannot be recovered, even if you cast it back to 16 bit later on.
it's basically like if you had a PNG file, JPEG compressed it with medium compression, converted it back to PNG, then JPEG compressed it with high compression - it's going to be both slower and worse quality than if you just took the original high quality source and used JPEG high compression on it.
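if you want to see the effect concretely, here's a minimal sketch comparing quantizing the original fp16 weights straight to NF4 against routing them through fp8 first. it assumes a CUDA build of bitsandbytes with quantize_4bit/dequantize_4bit and a PyTorch recent enough to have torch.float8_e4m3fn; the numbers are illustrative, not from the actual models.

```python
# minimal sketch: each extra lossy cast adds error that NF4 can't undo.
# assumes a CUDA build of bitsandbytes and a PyTorch with torch.float8_e4m3fn.
import torch
import bitsandbytes.functional as F

w = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")  # stand-in weights

# path A: fp16 -> NF4 (quantized once)
qa, state_a = F.quantize_4bit(w, quant_type="nf4")
wa = F.dequantize_4bit(qa, quant_state=state_a)

# path B: fp16 -> fp8 -> fp16 -> NF4 (the "already FP8" scenario)
w8 = w.to(torch.float8_e4m3fn).to(torch.float16)
qb, state_b = F.quantize_4bit(w8, quant_type="nf4")
wb = F.dequantize_4bit(qb, quant_state=state_b)

print("direct fp16 -> NF4 error:", (w - wa).abs().mean().item())
print("via fp8 -> NF4 error:    ", (w - wb).abs().mean().item())  # typically larger
```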
@blepping I have already compressed all my SD15 and SDXL models to FP8 for space efficiency, so the quality will not degrade further in my case.
ComfyUI is running continuously in FP8 mode. Therefore, when there is a need to load Flux or another model in NF4/FP4, there should be an option to simply add a node without having to restart ComfyUI.
I have already compressed all my SD15 and SDXL models to FP8 for space efficiency, so the quality will not degrade further in my case.
you haven't avoided the issue i mentioned, you just did the first step of it in the past. by converting an FP8 model to NF4 or whatever, you are getting hit with quality loss multiple times instead of just once.
ComfyUI is running continuously in FP8 mode.
like i said, i don't recommend doing that if you're going to be using the BNB quantization too.
Therefore, when there is a need to load Flux or another model in NF4/FP4, there should be an option to simply add a node without having to restart ComfyUI.
that would be ideal, but sadly it's beyond my capabilities. i don't even know how this stuff interacts with Comfy's internal casting/quantization if you have FP8 mode enabled. you could try creating an issue here or maybe in the main Comfy repo asking for that feature, though i'm not sure how good the chances are of it actually being implemented.
https://gist.github.com/bananasss00/301d666f20370412a571570ec304527c
these are my changes, maybe you want to add them too.
these are my changes, maybe you want to add them too.
looks useful, though i'd probably take a slightly different approach implementing it. i don't think there's much chance of this pull getting merged though, so probably the only way to make it available would be to publish it as a different node.
@bananasss00 any idea how to solve this? I get it when i try to load a lora:
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\bitsandbytes\functional.py", line 432, in is_on_gpu
    raise TypeError(
TypeError: All input tensors need to be on the same GPU, but found some tensors to not be on a GPU: [(torch.Size([28311552, 1]), device(type='cuda', index=0)), (torch.Size([1, 3072]), device(type='cuda', index=0)), (torch.Size([1, 18432]), device(type='cuda', index=0)), (torch.Size([884736]), device(type='cpu')), (torch.Size([16]), device(type='cpu'))]
any idea how to solve this? I get it when i try to load a lora
you can't use LoRAs with NF4. i recommend switching to GGUF if possible (which does allow using LoRAs): https://github.com/city96/ComfyUI-GGUF
@blepping Thanks for the response. I thought Forge had solved this and that the code @bananasss00 posted does that too.
Thanks for the response. I thought Forge had solved this and that the code bananasss00 posted does that too.
i think bananasss' changes just added some improvements to the node (like letting you choose which parts to load); they didn't add core functionality.
there's nothing that makes NF4 and LoRAs inherently impossible so Forge may have implemented it. that won't help you here though unless someone wants to try to port the changes. not too much enthusiasm for that since it's basically been abandoned/deprecated here.
BTW, i contributed changes to ComfyUI-GGUF to support SD1x and SDXL models as well as stable-diffusion.cpp quantized models. so even though the documentation says "don't use with SD1/SDXL", it's out of date. just mentioning that in case that's what was holding you back from switching.
@blepping Huge! thanks for your contributions, I will check sd1/sdxl quants, didn't know it was possible.
could you post some sd1/sdxl gguf model download links if you have any?
could you post some sd1/sdxl gguf model download links if you have any?
sorry, i don't have any. it's a pretty recent thing. you'd need to convert them yourself. the easiest way is probably with https://github.com/leejet/stable-diffusion.cpp
mostly just putting this here for example/discussion purposes.
these changes seem to work but i really don't know if the implementation is correct. this pull adds a CheckpointLoaderBNB node that lets you select BNB fp4 or nf4 quantization (the default option works like before, so it will fail for un-prequantized models).
i also looked at implementing BNB's int8, but it seems like the matmul for int8 doesn't support the operations Comfy wants to use.
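for reference, here's roughly how that fp4/nf4 choice maps onto bitsandbytes. this is just a sketch using the standard bnb 4-bit layer API, not the actual node code; the quantize_linear helper is made up for illustration.

```python
# sketch only (not the node's code): the fp4/nf4 choice is just the quant_type
# argument on bitsandbytes' 4-bit linear layer; the weights get quantized when
# the module is moved to the GPU.
import torch
import torch.nn as nn
import bitsandbytes as bnb

def quantize_linear(fp16_layer: nn.Linear, quant_type: str = "nf4") -> nn.Module:
    # hypothetical helper: wrap an existing fp16 Linear in a bnb Linear4bit
    q = bnb.nn.Linear4bit(
        fp16_layer.in_features,
        fp16_layer.out_features,
        bias=fp16_layer.bias is not None,
        compute_dtype=torch.float16,
        quant_type=quant_type,  # "nf4" or "fp4"
    )
    q.load_state_dict(fp16_layer.state_dict())
    return q.cuda()  # moving to the GPU triggers the actual 4-bit quantization

layer = quantize_linear(nn.Linear(1280, 320).half(), quant_type="nf4")
x = torch.randn(1, 1280, dtype=torch.float16, device="cuda")
print(layer(x).shape)  # torch.Size([1, 320])
```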