city96 / ComfyUI-GGUF

GGUF Quantization support for native ComfyUI models
Apache License 2.0

Updating comfy dependencies causes double VRAM usage on Q8? #97

Open tiko13 opened 2 months ago

tiko13 commented 2 months ago

I had been using comfy quite happily with some pretty old packages, but today I updated the dependencies and now the Q8 quant suddenly takes as much VRAM as fp16. I cannot figure out why that would be; maybe the bf16 weight dtype has something to do with it?:

Warning torch.load doesn't support weights_only on this pytorch version, loading unsafely.
C:\ComfyUI\custom_nodes\ComfyUI-GGUF\nodes.py:79: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at ..\torch\csrc\utils\tensor_numpy.cpp:212.)
  torch_tensor = torch.from_numpy(tensor.data) # mmap
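(Side note: the NumPy warning above is harmless for inference, since the mmap'd weights are only read, but it can be silenced by copying the array before handing it to PyTorch. A minimal sketch of that idea; tensor.data here just stands in for the read-only array returned by the GGUF reader and is not the exact nodes.py code:

    import numpy as np
    import torch

    # tensor.data is a read-only, memory-mapped NumPy array from the GGUF file.
    # Copying it gives PyTorch a writable buffer and avoids the warning, at the
    # cost of materializing the data in RAM instead of keeping it mmap-backed.
    writable = np.array(tensor.data, copy=True)
    torch_tensor = torch.from_numpy(writable)
)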

ggml_sd_loader:
 1                             476
 8                             304
model weight dtype torch.bfloat16, manual cast: None
model_type FLUX
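For reference, the ggml_sd_loader lines are GGML tensor-type IDs followed by how many tensors use that type: type 1 is F16 and type 8 is Q8_0, so this Q8_0 file still contains 476 unquantized F16 tensors alongside the 304 quantized ones. The mapping can be checked with the gguf Python package (assuming it is installed; the counts below are simply copied from the log above):

    from gguf import GGMLQuantizationType

    # Map the numeric type IDs printed by ggml_sd_loader to their names.
    for type_id, count in {1: 476, 8: 304}.items():
        print(GGMLQuantizationType(type_id).name, count)
    # F16 476
    # Q8_0 304
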
tiko13 commented 2 months ago

Okay, so this seems to be caused by the --highvram parameter. With normal VRAM management the usage never surpasses 15 GB, so it is probably something related to high-VRAM loading.
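One way to confirm this is to watch allocated VRAM around the sampling step: with --highvram, ComfyUI keeps models resident on the GPU instead of offloading them after use, so any extra copies also stay in VRAM. A rough monitoring snippet, purely illustrative and not part of ComfyUI or this node pack:

    import torch

    # Report current and peak VRAM use on the first CUDA device.
    # Run before and after a generation to compare --highvram vs. normal mode.
    if torch.cuda.is_available():
        allocated = torch.cuda.memory_allocated(0) / 1024**3
        peak = torch.cuda.max_memory_allocated(0) / 1024**3
        print(f"allocated: {allocated:.2f} GiB, peak: {peak:.2f} GiB")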

city96 commented 2 months ago

I haven't tested with --highvram so that could definitely be it lol. I've also seen people reporting issues with pytorch 2.4.x on regular comfy with flux, so that could be adding to it?