comfyanonymous / ComfyUI

The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.
https://www.comfy.org/
GNU General Public License v3.0

Both Flux Schnell and Flux Dev crash ComfyUI #4258

Open aesxsc opened 2 months ago

aesxsc commented 2 months ago

Expected Behavior

To not crash.

Actual Behavior

ComfyUI crashes 5-10 seconds after I click Queue Prompt while using Flux Schnell. This does not happen with the Flux Dev model.

Steps to Reproduce

Use the Flux Schnell model. (screenshot attached)

Debug Logs

Total VRAM 16376 MB, total RAM 32658 MB
pytorch version: 2.4.0+cu121
C:\Users\xscri\stable-diffusion-webui\venv\lib\site-packages\xformers\ops\fmha\flash.py:211: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_fwd")
C:\Users\xscri\stable-diffusion-webui\venv\lib\site-packages\xformers\ops\fmha\flash.py:344: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_bwd")
xformers version: 0.0.27.post2
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4070 Ti SUPER : cudaMallocAsync
Using xformers cross attention
[Prompt Server] web root: C:\Users\xscri\ComfyUI\web
Successfully imported spandrel_extra_arches: support for non commercial upscale models.
C:\Users\xscri\stable-diffusion-webui\venv\lib\site-packages\kornia\feature\lightglue.py:44: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @torch.cuda.amp.custom_fwd(cast_inputs=torch.float32)

Import times for custom nodes:
   0.0 seconds: C:\Users\xscri\ComfyUI\custom_nodes\websocket_image_save.py

Starting server

To see the GUI go to: http://127.0.0.1:8188
got prompt
*crash*

Other

I'm using the venv from Stable Diffusion WebUI.

Nvidia GRD 560.81

aesxsc commented 2 months ago

Now Flux Dev doesn't work either. They both crash ComfyUI.

GabrielLanghans commented 2 months ago

They both crash for me. I'm also using the venv from Stable Diffusion WebUI...

(screenshot)

GabrielLanghans commented 2 months ago

I've created a new venv and installed everything from scratch and got the same error.

TingTingin commented 2 months ago

have you tried downgrading torch?

GabrielLanghans commented 2 months ago

What's the recommended version?

aesxsc commented 2 months ago

They both crash for me. I'm also using the venv from Stable Diffusion WebUI...

(screenshot)

We both have 4070 Ti SUPERs; maybe that's the problem? I also tried on Arch Linux. It crashes the whole DE.

rabidcopy commented 2 months ago

Possibly not enough RAM/swap/pagefile to load the model at the given precision? Usually when a process is "Killed" in Linux, it's to prevent an out of memory situation that would lock the system up. I'd suggest looking into creating or enlarging a swap file. https://wiki.archlinux.org/title/Swap#Swap_file_creation

Try 8GB and see if that's enough, then go up by 4GB until the python process isn't killed on loading.
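For reference, the swap-file steps from that wiki page boil down to something like the following (a sketch, run as root; the 16G size is only an illustration, not a recommendation from this thread):

```shell
# Create a 16 GiB swap file (adjust the size as needed).
fallocate -l 16G /swapfile
chmod 600 /swapfile        # swap must not be readable by other users
mkswap /swapfile           # format it as swap space
swapon /swapfile           # enable it immediately
# Make it permanent across reboots:
echo '/swapfile none swap defaults 0 0' >> /etc/fstab
```

Check the result with `swapon --show` or `free -h`.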

aesxsc commented 2 months ago

It used to work just fine a few days ago. Also, I have a 32GB swapfile + 32GB RAM, so I don't think that's the case. On Windows I have the pagefile set to Auto, which I don't think matters either.

rabidcopy commented 2 months ago

Then I can only suggest running git reflog and going back through commits until it works again. It should be fairly easy to determine at which commit the issues started.
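A sketch of that workflow, plus `git bisect` as a faster alternative once you know one good and one bad commit (the commit hashes below are placeholders):

```shell
cd ComfyUI
git log --oneline -20            # list recent commits
git checkout <suspect-commit>    # try an older commit, then test

# Or let git do the searching:
git bisect start
git bisect bad HEAD              # the current commit crashes
git bisect good <known-good>     # an older commit that worked
# git now checks out commits for you; test each one and run
#   git bisect good   or   git bisect bad
# until it prints the first bad commit, then:
git bisect reset
```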

rabidcopy commented 2 months ago

Alternatively, the problem may come from being on a commit just before https://github.com/comfyanonymous/ComfyUI/commit/b334605a6631c12bbe7b3aff6d77526f47acdf42, as that commit addresses OOMs caused by erroneous model loading.

aesxsc commented 2 months ago

I pulled the latest commit; it still happens. Also, I couldn't find exactly which commit broke it.

aesxsc commented 2 months ago

The portable version is broken too.

mcDandy commented 2 months ago

Both FP16 and FP8? I have only FP16 downloaded. It does not even attempt to load the checkpoint into RAM. (--lowvram)

aesxsc commented 2 months ago

I have only FP16 too. Haven't tried FP8.

aesxsc commented 2 months ago

Other Stable Diffusion models don't crash Comfy, only Flux models crash it.

mcDandy commented 2 months ago

Same problem. Disabling custom nodes does nothing, so I copied the output with the nodes active, which contains environment info.

[START] Security scan
[DONE] Security scan
## ComfyUI-Manager: installing dependencies done.
** ComfyUI startup time: 2024-08-09 17:42:52.116158
** Platform: Windows
** Python version: 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
** Python executable: F:\stability\Data\Packages\ComfyUI\venv\Scripts\python.exe
** ComfyUI Path: F:\stability\Data\Packages\ComfyUI
** Log path: F:\stability\Data\Packages\ComfyUI\comfyui.log

Prestartup times for custom nodes:
   4.0 seconds: F:\stability\Data\Packages\ComfyUI\custom_nodes\ComfyUI-Manager

Total VRAM 12282 MB, total RAM 32468 MB
pytorch version: 2.1.2+cu121
Set vram state to: LOW_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4080 Laptop GPU : cudaMallocAsync
Using pytorch cross attention
[Prompt Server] web root: F:\stability\Data\Packages\ComfyUI\web
Adding extra search path checkpoints F:\stability\Data\Models\StableDiffusion
Adding extra search path vae F:\stability\Data\Models\VAE
Adding extra search path loras F:\stability\Data\Models\Lora
Adding extra search path loras F:\stability\Data\Models\LyCORIS
Adding extra search path upscale_models F:\stability\Data\Models\ESRGAN
Adding extra search path upscale_models F:\stability\Data\Models\RealESRGAN
Adding extra search path upscale_models F:\stability\Data\Models\SwinIR
Adding extra search path embeddings F:\stability\Data\Models\TextualInversion
Adding extra search path hypernetworks F:\stability\Data\Models\Hypernetwork
Adding extra search path controlnet F:\stability\Data\Models\ControlNet
Adding extra search path controlnet F:\stability\Data\Models\T2IAdapter
Adding extra search path clip F:\stability\Data\Models\CLIP
Adding extra search path clip_vision F:\stability\Data\Models\InvokeClipVision
Adding extra search path diffusers F:\stability\Data\Models\Diffusers
Adding extra search path gligen F:\stability\Data\Models\GLIGEN
Adding extra search path vae_approx F:\stability\Data\Models\ApproxVAE
Adding extra search path ipadapter F:\stability\Data\Models\IpAdapter
Adding extra search path ipadapter F:\stability\Data\Models\InvokeIpAdapters15
Adding extra search path ipadapter F:\stability\Data\Models\InvokeIpAdaptersXl
Adding extra search path prompt_expansion F:\stability\Data\Models\PromptExpansion
[Crystools INFO] Crystools version: 1.16.6
[Crystools INFO] CPU: 13th Gen Intel(R) Core(TM) i9-13950HX - Arch: AMD64 - OS: Windows 10
[Crystools INFO] Pynvml (Nvidia) initialized.
[Crystools INFO] GPU/s:
[Crystools INFO] 0) NVIDIA GeForce RTX 4080 Laptop GPU
[Crystools INFO] NVIDIA Driver: 560.81
[inference_core_nodes.controlnet_preprocessors] | INFO -> Using ckpts path: F:\stability\Data\Packages\ComfyUI\custom_nodes\ComfyUI-Inference-Core-Nodes\src\inference_core_nodes\controlnet_preprocessors\ckpts
[inference_core_nodes.controlnet_preprocessors] | INFO -> Using symlinks: False
[inference_core_nodes.controlnet_preprocessors] | INFO -> Using ort providers: ['CUDAExecutionProvider', 'DirectMLExecutionProvider', 'OpenVINOExecutionProvider', 'ROCMExecutionProvider', 'CPUExecutionProvider', 'CoreMLExecutionProvider']
DWPose: Onnxruntime with acceleration providers detected
F:\stability\Data\Packages\ComfyUI\venv\lib\site-packages\diffusers\models\transformers\transformer_2d.py:34: FutureWarning: `Transformer2DModelOutput` is deprecated and will be removed in version 1.0.0. Importing `Transformer2DModelOutput` from `diffusers.models.transformer_2d` is deprecated and this will be removed in a future version. Please use `from diffusers.models.modeling_outputs import Transformer2DModelOutput`, instead.
  deprecate("Transformer2DModelOutput", "1.0.0", deprecation_message)
### Loading: ComfyUI-Manager (V2.48.6)
### ComfyUI Revision: 2504 [55ad9d5f] | Released on '2024-08-09'
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/alter-list.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/model-list.json
Use STYLE(weight_interpretation, normalization) at the start of a prompt to use advanced encodings
Weight interpretations available: comfy,perp
Normalization types available: none
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/github-stats.json
[comfyui_controlnet_aux] | INFO -> Using ckpts path: F:\stability\Data\Packages\ComfyUI\custom_nodes\comfyui_controlnet_aux\ckpts
[comfyui_controlnet_aux] | INFO -> Using symlinks: False
[comfyui_controlnet_aux] | INFO -> Using ort providers: ['CUDAExecutionProvider', 'DirectMLExecutionProvider', 'OpenVINOExecutionProvider', 'ROCMExecutionProvider', 'CPUExecutionProvider', 'CoreMLExecutionProvider']
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/custom-node-list.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/extension-node-map.json

Import times for custom nodes:
   0.0 seconds: F:\stability\Data\Packages\ComfyUI\custom_nodes\websocket_image_save.py
   0.0 seconds: F:\stability\Data\Packages\ComfyUI\custom_nodes\sd-dynamic-thresholding
   0.0 seconds: F:\stability\Data\Packages\ComfyUI\custom_nodes\comfyui-inpaint-nodes
   0.0 seconds: F:\stability\Data\Packages\ComfyUI\custom_nodes\ComfyMath
   0.0 seconds: F:\stability\Data\Packages\ComfyUI\custom_nodes\comfyui-tooling-nodes
   0.0 seconds: F:\stability\Data\Packages\ComfyUI\custom_nodes\ComfyUI_IPAdapter_plus
   0.0 seconds: F:\stability\Data\Packages\ComfyUI\custom_nodes\ComfyUI_ExtraModels
   0.1 seconds: F:\stability\Data\Packages\ComfyUI\custom_nodes\comfyui_controlnet_aux
   0.1 seconds: F:\stability\Data\Packages\ComfyUI\custom_nodes\comfyui-prompt-control
   0.4 seconds: F:\stability\Data\Packages\ComfyUI\custom_nodes\ComfyUI_TensorRT
   0.5 seconds: F:\stability\Data\Packages\ComfyUI\custom_nodes\ComfyUI-Crystools
   1.7 seconds: F:\stability\Data\Packages\ComfyUI\custom_nodes\ComfyUI-Manager
   1.9 seconds: F:\stability\Data\Packages\ComfyUI\custom_nodes\ComfyUI-Inference-Core-Nodes

Starting server

To see the GUI go to: http://127.0.0.1:8188
FETCH DATA from: F:\stability\Data\Packages\ComfyUI\custom_nodes\ComfyUI-Manager\extension-node-map.json [DONE]
got prompt

It does not continue. Output from nvidia-smi:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.81                 Driver Version: 560.81         CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4080 ...  WDDM  |   00000000:01:00.0  On |                  N/A |
| N/A   44C    P8              4W /  175W |    1031MiB /  12282MiB |      2%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      1892    C+G   C:\Windows\explorer.exe                     N/A      |
|    0   N/A  N/A      3588    C+G   ...n\126.0.2592.113\msedgewebview2.exe      N/A      |
|    0   N/A  N/A      8024    C+G   ...0.0_x64__cv1g1gvanyjgm\WhatsApp.exe      N/A      |
|    0   N/A  N/A      8332    C+G   ...ekyb3d8bbwe\PhoneExperienceHost.exe      N/A      |
|    0   N/A  N/A     10628    C+G   ...ft Office\root\Office16\OUTLOOK.EXE      N/A      |
|    0   N/A  N/A     10908    C+G   ...al\Discord\app-1.0.9157\Discord.exe      N/A      |
|    0   N/A  N/A     10912    C+G   ...CBS_cw5n1h2txyewy\TextInputHost.exe      N/A      |
|    0   N/A  N/A     11044    C+G   ....0_x64__kzh8wxbdkxb8p\DCv2\DCv2.exe      N/A      |
|    0   N/A  N/A     11528    C+G   ...5n1h2txyewy\ShellExperienceHost.exe      N/A      |
|    0   N/A  N/A     12700    C+G   ...nt.CBS_cw5n1h2txyewy\SearchHost.exe      N/A      |
|    0   N/A  N/A     12728    C+G   ...2txyewy\StartMenuExperienceHost.exe      N/A      |
|    0   N/A  N/A     14148    C+G   F:\stability\StabilityMatrix.exe            N/A      |
|    0   N/A  N/A     15608    C+G   ...n\126.0.2592.113\msedgewebview2.exe      N/A      |
|    0   N/A  N/A     15680    C+G   ...__8wekyb3d8bbwe\Notepad\Notepad.exe      N/A      |
|    0   N/A  N/A     16320    C+G   ...n\NVIDIA app\CEF\NVIDIA Overlay.exe      N/A      |
|    0   N/A  N/A     18692    C+G   ...ys\WinUI3Apps\PowerToys.Peek.UI.exe      N/A      |
|    0   N/A  N/A     18836    C+G   ...werToys\PowerToys.PowerLauncher.exe      N/A      |
|    0   N/A  N/A     19384    C+G   ...werToys\PowerToys.ColorPickerUI.exe      N/A      |
|    0   N/A  N/A     19800    C+G   ...__8wekyb3d8bbwe\WindowsTerminal.exe      N/A      |
|    0   N/A  N/A     22020    C+G   ...\cef\cef.win7x64\steamwebhelper.exe      N/A      |
|    0   N/A  N/A     22184    C+G   ...les\Microsoft OneDrive\OneDrive.exe      N/A      |
|    0   N/A  N/A     23308    C+G   ...m Files (x86)\Overwolf\Overwolf.exe      N/A      |
|    0   N/A  N/A     23604    C+G   ...rwolf\0.256.0.2\OverwolfBrowser.exe      N/A      |
|    0   N/A  N/A     24508    C+G   ...ress\CefSharp.BrowserSubprocess.exe      N/A      |
|    0   N/A  N/A     24684    C+G   ...les\Microsoft OneDrive\OneDrive.exe      N/A      |
|    0   N/A  N/A     25000    C+G   ...\cef\cef.win7x64\steamwebhelper.exe      N/A      |
|    0   N/A  N/A     26256    C+G   ...crosoft\Edge\Application\msedge.exe      N/A      |
|    0   N/A  N/A     30260    C+G   ...oogle\Chrome\Application\chrome.exe      N/A      |
+-----------------------------------------------------------------------------------------+
geroldmeisinger commented 2 months ago

duplicate of https://github.com/comfyanonymous/ComfyUI/issues/4198

aesxsc commented 2 months ago

Also, rarely, it's not only ComfyUI that crashes: the whole GPU goes down with it, taking Chrome and apparently the CUDA runtime too (nvidia-smi stops working).

OS: Windows 11 22635

Happens with Arch too.

mcDandy commented 2 months ago

Only ComfyUI crashes for me. Nothing else happens, not even elevated RAM or VRAM usage.

mcDandy commented 2 months ago

Ran ComfyUI through Nsight Systems.

Logs: https://drive.google.com/file/d/1mEIIkvHAykUCHl_cFJt3AmMENlt_zWtQ/view?usp=sharing https://drive.google.com/file/d/10HXsy0A96zMALsPYRib59UrvSUSWkh_J/view?usp=drive_link

mcDandy commented 2 months ago

The problem seems to be with the drivers: it does not work on 560.81 but worked on 560.70.

Edit: It is not ComfyUI or the Nvidia drivers. I downgraded both and it still does not work.

mcDandy commented 2 months ago

So it was fixed for me by moving my pagefile from C: to F: (both on the same SSD).

sabum6800 commented 2 months ago

I might have a solution for everyone (at least it worked out for me).

After struggling for weeks, I tried an even older driver version for my GPU, because I had seen that these kinds of problems mainly hit people with an RTX 4070 Ti SUPER 16GB. I was even about to send it back because I tried EVERYTHING...

However, with version 551.23 for this GPU (see picture) my problems got solved!!!! (screenshot)

I really hope this version fixes your problems too. :)

aesxsc commented 2 months ago

I might have a solution for everyone (at least it worked out for me).

After struggling for weeks, I tried an even older driver version for my GPU, because I had seen that these kinds of problems mainly hit people with an RTX 4070 Ti SUPER 16GB. I was even about to send it back because I tried EVERYTHING...

However, with version 551.23 for this GPU (see picture) my problems got solved!!!! (screenshot)

I really hope this version fixes your problems too. :)

I just got this card a week ago and installed 560.70. Then after a few days 560.81 was released and it still worked. After 1-2 days it completely stopped working on Flux models. I will try rolling back to 560.70 and see if that was the problem.

aesxsc commented 2 months ago

Switched back to 560.70; it doesn't work. Now I'll try 551.23 as suggested by @sabum6800 . Also, why this specific version? Did you try every version after it?

aesxsc commented 2 months ago

And... nope. Switched to 551.23; nothing really changed. It still crashes the same way.

aesxsc commented 2 months ago

Interestingly, it works on Arch Linux right now with the latest drivers and the latest commit.

cherryboio commented 2 months ago

I have a 2070 SUPER with 8GB VRAM and 32GB RAM, latest drivers (560.81). ComfyUI crashes after "got prompt" for Flux Dev, Schnell, and the FP8 versions. Other models work for me. I have no idea how to get the logs, though.

I tried the portable and manual installations of ComfyUI; both have the same issue.

bridgesense commented 2 months ago

Possibly not enough RAM/swap/pagefile to load the model at the given precision? Usually when a process is "Killed" in Linux, it's to prevent an out of memory situation that would lock the system up. I'd suggest looking into creating or enlarging a swap file. https://wiki.archlinux.org/title/Swap#Swap_file_creation

Try 8GB and see if that's enough, then go up by 4GB until the python process isn't killed on loading.

Thanks for mentioning this. I have a new Arch install myself and never allocated a swapfile. This fixed my issue that appears similar to this.

aesxsc commented 2 months ago

I have a 2070 SUPER with 8GB VRAM and 32GB RAM, latest drivers (560.81). ComfyUI crashes after "got prompt" for Flux Dev, Schnell, and the FP8 versions. Other models work for me. I have no idea how to get the logs, though.

I tried the portable and manual installations of ComfyUI; both have the same issue.

Yeah, it still doesn't work on Windows.

comfyanonymous commented 2 months ago

Try setting your page file to "system managed size".

aesxsc commented 2 months ago

It is already set to "system managed size". Also, the crash happens before the model is even loaded into RAM.

geroldmeisinger commented 2 months ago

As mentioned in https://github.com/comfyanonymous/ComfyUI/issues/4198 (of which this is a duplicate):

try starting without --lowvram. As to WHY this helps, I still don't know.
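For anyone unsure where that flag goes: the VRAM mode is selected at launch on ComfyUI's `main.py`. A sketch of the usual variants (the flags exist in ComfyUI's CLI; the descriptions paraphrase their documented behavior):

```shell
# Default: let ComfyUI pick a VRAM strategy automatically
python main.py

# Force a mode explicitly:
python main.py --normalvram   # force normal VRAM use if lowvram was auto-enabled
python main.py --lowvram      # split model loading to use less VRAM
```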

aesxsc commented 2 months ago

I'm starting without --lowvram, but it automatically switches to lowvram even though I use --normalvram.

geroldmeisinger commented 2 months ago

I think I found the problem: I may be affected by the Intel Raptor Lake instability and degradation issue (elevated operating voltage) after all.

Before you do anything, UPDATE YOUR BIOS OR YOU MAY DAMAGE YOUR CPU!

Update your BIOS first and make sure the update includes something like "Update microcode 0x129 to address sporadic Vcore elevation behavior" as announced by Intel.

(screenshot of the hang)

The following models are affected:

13th gen:
i9-13900KS
i9-13900K
i9-13900KF
i9-13900F
i9-13900
i7-13700K
i7-13700KF
i7-13790F
i7-13700F
i7-13700
i5-13600K
i5-13600KF

14th gen:
i9-14900KS
i9-14900K
i9-14900KF
i9-14900F
i9-14900
i7-14700K
i7-14700KF
i7-14790F
i7-14700F
i7-14700
i5-14600K
i5-14600KF

Solution

Load a low-voltage profile in UEFI (I never tried this before because I assumed the BIOS defaults were fine). With "e-core disabled" I am able to run Flux on --lowvram. (screenshot)

rvinuela commented 2 months ago

I have a Ryzen CPU, a 3080 Ti with 12 GB VRAM, 32 GB system RAM, and the same problem. I tried FP8, checkpoints, several workflows, upgrading ComfyUI, and upgrading PyTorch to 2.4, etc.; still the same issue. I noticed that it didn't try to load anything, nor did it use my VRAM.

Skiddoh commented 2 months ago

Maybe this gives a hint to the error; I don't have time to dive deep into where exactly it comes from, but after having the same issues I tried several things like updating/downgrading, adding Flux-related modules, and so on.

To me it seems the "weight_dtype" parameter of the "Load Diffusion Model" node should not be left at its default but set to e.g. fp8_e5m2 instead. After switching the dtype, it started to load instead of crashing right away. (screenshot)

I'm also on a 4070 Ti SUPER and have the latest version installed. (screenshot)

Maybe it's trying to convert/cast dtypes and the error lies there, but I can't tell without looking into it.

rakib91221 commented 2 months ago

Other Stable Diffusion models don't crash Comfy, only Flux models crash it.

SDXL + Flux, same problem with an RTX 3060 12GB.

themightyatom commented 1 month ago

RTX 3090, same problem. It hangs at "got prompt" and then nothing happens, with no error messages. I have an older version in StabilityMatrix that runs fine, but it does not support the latest Flux enhancements.

yar3333 commented 1 month ago

My case: Windows, RTX 3060, 12GB VRAM, 32GB RAM (+16GB swap), recent portable ComfyUI.

The UNET version (flux1-dev-fp8-e4m3fn.safetensors, 11GB, loaded via the "Load Diffusion Model" node) works fine. The checkpoint version (flux1-dev-fp8.safetensors, 17GB, loaded via the "Load Checkpoint" node) exits without errors after adding a prompt to the queue.

I increased the swap to 32GB. Now the checkpoint version works!

rvinuela commented 1 month ago

My case: Windows, RTX 3060, 12GB VRAM, 32GB RAM (+16GB swap), recent portable ComfyUI.

The UNET version (flux1-dev-fp8-e4m3fn.safetensors, 11GB, loaded via the "Load Diffusion Model" node) works fine. The checkpoint version (flux1-dev-fp8.safetensors, 17GB, loaded via the "Load Checkpoint" node) exits without errors after adding a prompt to the queue.

I increased the swap to 32GB. Now the checkpoint version works!

That worked: I increased the swap to 32 GB as you suggested, and it worked. What happens is that, for some reason, before loading the model into VRAM it actually consumed 44 GB of system RAM (which I did not have); then it loaded the model into 11 GB of VRAM and system RAM usage dropped to 12 GB. So that initial peak seems to be the problem, at least for me.

Thanks for your answer.
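To check whether you hit a similar load-time RAM spike on your own machine, a minimal stdlib-only monitor like the one below (my own sketch, Linux-only since it reads /proc/meminfo) can be left running while you queue a prompt; a sudden large drop in available RAM matches the peak described above:

```python
import time

def available_ram_gib() -> float:
    """Return MemAvailable from /proc/meminfo in GiB (Linux only)."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) / (1024 ** 2)  # value is in kB
    raise RuntimeError("MemAvailable not found in /proc/meminfo")

if __name__ == "__main__":
    # Print one sample per second while ComfyUI loads the model.
    for _ in range(3):
        print(f"available RAM: {available_ram_gib():.1f} GiB")
        time.sleep(1)
```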