city96 / ComfyUI-GGUF

GGUF Quantization support for native ComfyUI models
Apache License 2.0

Error and full system crash running Flux-GGUF #57

Open falcon204 opened 2 months ago

falcon204 commented 2 months ago

Having this issue since updating; everything was working perfectly yesterday. Now, as soon as I hit Queue and it starts to run, I either get this error each time or my system completely crashes and restarts.

```
[2024-08-21 16:53] Requested to load FluxClipModel_
[2024-08-21 16:53] Loading 1 new model
[2024-08-21 16:54] loaded completely 0.0 5180.35888671875 True
[2024-08-21 16:54] !!! Exception during processing !!! Allocation on device
[2024-08-21 16:54] Traceback (most recent call last):
  File "G:\ComfyUI_windows_portable\ComfyUI\execution.py", line 316, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "G:\ComfyUI_windows_portable\ComfyUI\execution.py", line 191, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "G:\ComfyUI_windows_portable\ComfyUI\execution.py", line 168, in _map_node_over_list
    process_inputs(input_dict, i)
  File "G:\ComfyUI_windows_portable\ComfyUI\execution.py", line 157, in process_inputs
    results.append(getattr(obj, func)(**inputs))
  File "G:\ComfyUI_windows_portable\ComfyUI\nodes.py", line 65, in encode
    output = clip.encode_from_tokens(tokens, return_pooled=True, return_dict=True)
  File "G:\ComfyUI_windows_portable\ComfyUI\comfy\sd.py", line 126, in encode_from_tokens
    o = self.cond_stage_model.encode_token_weights(tokens)
  File "G:\ComfyUI_windows_portable\ComfyUI\comfy\text_encoders\flux.py", line 57, in encode_token_weights
    t5_out, t5_pooled = self.t5xxl.encode_token_weights(token_weight_pairs_t5)
  File "G:\ComfyUI_windows_portable\ComfyUI\comfy\sd1_clip.py", line 41, in encode_token_weights
    o = self.encode(to_encode)
  File "G:\ComfyUI_windows_portable\ComfyUI\comfy\sd1_clip.py", line 229, in encode
    return self(tokens)
  File "G:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "G:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "G:\ComfyUI_windows_portable\ComfyUI\comfy\sd1_clip.py", line 201, in forward
    outputs = self.transformer(tokens, attention_mask_model, intermediate_output=self.layer_idx, final_layer_norm_intermediate=self.layer_norm_hidden_state, dtype=torch.float32)
  File "G:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "G:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "G:\ComfyUI_windows_portable\ComfyUI\comfy\text_encoders\t5.py", line 241, in forward
    return self.encoder(x, *args, **kwargs)
  File "G:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "G:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "G:\ComfyUI_windows_portable\ComfyUI\comfy\text_encoders\t5.py", line 213, in forward
    x, past_bias = l(x, mask, past_bias, optimized_attention)
  File "G:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "G:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "G:\ComfyUI_windows_portable\ComfyUI\comfy\text_encoders\t5.py", line 190, in forward
    x = self.layer[-1](x)
  File "G:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "G:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "G:\ComfyUI_windows_portable\ComfyUI\comfy\text_encoders\t5.py", line 66, in forward
    forwarded_states = self.DenseReluDense(forwarded_states)
  File "G:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "G:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "G:\ComfyUI_windows_portable\ComfyUI\comfy\text_encoders\t5.py", line 46, in forward
    hidden_gelu = self.act(self.wi_0(x))
  File "G:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "G:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "G:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-GGUF\ops.py", line 146, in forward
    weight, bias = self.get_weights(x.dtype)
  File "G:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-GGUF\ops.py", line 125, in get_weights
    weight = self.get_weight(self.weight, dtype)
  File "G:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-GGUF\ops.py", line 117, in get_weight
    weight = dequantize_tensor(tensor, dtype)
  File "G:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-GGUF\dequant.py", line 19, in dequantize_tensor
    return out.to(dtype) if out.dtype != dtype else out # why is .to() not a no-op?
torch.OutOfMemoryError: Allocation on device

[2024-08-21 16:54] Got an OOM, unloading all loaded models.
[2024-08-21 16:54] Prompt executed in 87.08 seconds
```

falcon204 commented 2 months ago

Just a little update on what I've done so far. I completely trashed my whole setup of ComfyUI and started from scratch, just in case something I had installed with another node was causing issues. Still getting the same full crash of my computer. It was working perfectly before 8-20-24; now I can't run Flux-GGUF even on a clean install with only GGUF and Manager installed as custom nodes. Thought it might be my system, so I ran a full graphics card benchmark and also a memory benchmark; both passed with no issues. Did a CPU one as well, no issues. So I'm completely stumped, because I can run any other models perfectly. And like I said, I was running Flux-GGUF for a while making images until I updated, then crash after crash. 🤷‍♂️

falcon204 commented 2 months ago

Here's the log of the current event. Can someone help figure out what is all of a sudden wrong? Right after the last part of the log is when my computer crashes and reboots.

```
[2024-08-22 05:47] Starting server

[2024-08-22 05:47] To see the GUI go to: http://127.0.0.1:8188
[2024-08-22 05:47] FETCH DATA from: C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager\extension-node-map.json [DONE]
[2024-08-22 05:48] got prompt
[2024-08-22 05:48] Using pytorch attention in VAE
[2024-08-22 05:48] Using pytorch attention in VAE
[2024-08-22 05:48] C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-GGUF\nodes.py:35: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\utils\tensor_numpy.cpp:212.)
  torch.from_numpy(tensor.data), # mmap
[2024-08-22 05:48] ggml_sd_loader:
[2024-08-22 05:48]  1 472
[2024-08-22 05:48]  8 304
[2024-08-22 05:48]
[2024-08-22 05:48] model weight dtype torch.bfloat16, manual cast: torch.float16
[2024-08-22 05:48] model_type FLOW
[2024-08-22 05:48] ggml_sd_loader:
[2024-08-22 05:48]  8 169
[2024-08-22 05:48]  0 50
[2024-08-22 05:48]
[2024-08-22 05:48] C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-GGUF\dequant.py:8: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  data = torch.tensor(tensor.data)
[2024-08-22 05:48] C:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: clean_up_tokenization_spaces was not set. It will be set to True by default. This behavior will be depracted in transformers v4.45, and will be then set to False by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
[2024-08-22 05:48] Requested to load FluxClipModel_
[2024-08-22 05:48] Loading 1 new model
[2024-08-22 05:49] loaded completely 0.0 5180.35888671875 True
[2024-08-22 05:49] C:\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\attention.py:407: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
  out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0, is_causal=False)
```

city96 commented 2 months ago

Looks like a recent ComfyUI update probably broke mmap and massively increased system memory usage.

falcon204 commented 2 months ago

Welp, my computer is 🔥 .. I now get a complete crash even on video games. I can watch videos fine and put my system through benchmarks, but as soon as I try running ComfyUI or a game my system crashes. Dump says:

```
General information
Bug Check Code        0x00000116
Bug Check String      VIDEO_TDR_ERROR
Parameter 1           0xFFFF8304D91AA010
Parameter 2           0xFFFFF80660CE1120
Parameter 3           0xFFFFFFFFC000009A
Parameter 4           0x0000000000000004
Crash date            August 22, 2024 4:16 AM
Architecture          x64
Major version         15
Minor version         22621
Number of processors  6
Dump file size        131 kB
Crash source          dxgkrnl.sys+30B00E
Process path          C:\WINDOWS\system32\drivers\dxgkrnl.sys
Description           DirectX Graphics Kernel
Version               10.0.22621.3958
Company               Microsoft Corporation
Size                  4.52 MB

Call Stack
Stack address  dxgkrnl.sys+30B00E
Stack address  nvlddmkm.sys+1621120
Stack address  dxgkrnl.sys+2BCE32
Stack address  nvlddmkm.sys+1621120
Stack address  dxgkrnl.sys+2B53A9
Stack address  dxgkrnl.sys+30A8A0
```

Did all the steps to fix a VIDEO_TDR_ERROR but I'm still having the issue. So not sure if it was related to Comfy, or whether my PSU, CPU, or graphics card went poof 🔥 🥹 I tell yah, what luck I have, smh.

city96 commented 2 months ago

@falcon204 Try power limiting your card, either with something like MSI Afterburner or just `nvidia-smi -pl 150` in an admin command prompt (set it to e.g. half of whatever your card uses normally). If it's fine like that, it's possible it's a PSU issue. You could also grab HWiNFO64 and look at voltages under load, though if it crashes due to transient spikes that won't help you much.
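For reference, a minimal sketch of what that looks like from an admin command prompt (the 150 W value is only an example; check your own card's default and maximum limits first):

```
:: show the card's current, default, and max enforceable power limits
nvidia-smi -q -d POWER

:: cap the board power at 150 W for testing
nvidia-smi -pl 150
```

The limit set this way typically isn't persistent, so it reverts after a reboot or driver reload.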

falcon204 commented 2 months ago

> @falcon204 Try power limiting your card, either with something like MSI Afterburner or just `nvidia-smi -pl 150` in an admin command prompt (set it to e.g. half of whatever your card uses normally). If it's fine like that, it's possible it's a PSU issue. You could also grab HWiNFO64 and look at voltages under load, though if it crashes due to transient spikes that won't help you much.

Thanks @city96 for taking the time to help, I really appreciate it. Tried what you said and went with 75 instead of 150, and still the same issues. Checked using HWiNFO64 and had my screen looking like NASA lol. Looked at everything and didn't see anything that sticks out as alarming voltages or temps. Yet when I tried to run a CPUID PowerMAX test I couldn't see anything, because as soon as I start the GPU test it crashes; the CPU test passes perfectly. But anywhoo, thanks for taking the time to help. Sucks, I was really enjoying ComfyUI and your node, and Flux is absolutely amazing. Even though I'm now out, I'll be cheering you on from the sidelines 😃

city96 commented 2 months ago

Most likely a hardware error then, sorry to hear. I guess you can try cleaning the card / PCIe slot / power connectors in a last-ditch effort, but it sounds like a replacement / RMA will be your best shot.

falcon204 commented 2 months ago

@city96 Fixed my issue, it was driver related: DirectX and my graphics card drivers. One thing I was wondering about, though: I'm getting a UserWarning message in the terminal saying the following

```
ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-GGUF\dequant.py:8: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  data = torch.tensor(tensor.data)
```

ComfyUI runs fine and the images come out fine, but I'm always seeing this warning.

city96 commented 2 months ago

You can ignore that warning; I'll probably change the code eventually so pytorch stops complaining.
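For anyone curious, the warning is raised because `torch.tensor()` is being called on something that is already a tensor. A minimal sketch of the warning-free patterns PyTorch suggests (illustrative only, not necessarily the change that will land in dequant.py):

```python
import torch

src = torch.zeros(4, 4)  # stand-in for the mmap'd GGUF tensor data

# pattern that triggers the UserWarning (copy-constructing from a tensor):
#   data = torch.tensor(src)

# warning-free alternatives:
data_copy = src.clone().detach()   # explicit copy, detached from autograd
data_view = torch.as_tensor(src)   # no copy when dtype/device already match
```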