comfyanonymous / ComfyUI

The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.
https://www.comfy.org/
GNU General Public License v3.0

High Memory Usage When Loading Flux Model in ComfyUI #4480

Open · Govee-Chan opened this issue 3 weeks ago

Govee-Chan commented 3 weeks ago

Feature Idea

Hello,

I am experiencing a significant memory usage issue when using the Flux model in ComfyUI. During the model loading phase, the memory consumption spikes to approximately 70GB. This seems excessively high and may not be feasible for many users.

Existing Solutions

No response

Other

No response

JorgeR81 commented 3 weeks ago

Same for me. I only have 32 GB RAM, and my system needs to use the page file while loading, even for the FP8 version. https://github.com/comfyanonymous/ComfyUI/issues/4239

I hope they can improve this.

The Q8_0 format looks almost as good as FP16, loads faster, and requires less than 32 GB RAM while loading. https://github.com/city96/ComfyUI-GGUF

The downside could be less compatibility with other features and fewer model finetunes, if the format does not gain popularity.

JorgeR81 commented 3 weeks ago

memory consumption spikes to approximately 70GB

So even users with 64 GB RAM need to use the page file!

By the way, is this with FP16? How much RAM for FP8?

DivineOmega commented 3 weeks ago

Reverting to commit 3e52e0364cf81764f58e5aa4f53f0b702f4d4a81 seems to have resolved the issue for me.

git checkout 3e52e0364cf81764f58e5aa4f53f0b702f4d4a81

JorgeR81 commented 3 weeks ago

Reverting to commit https://github.com/comfyanonymous/ComfyUI/commit/3e52e0364cf81764f58e5aa4f53f0b702f4d4a81 seems to have resolved the issue for me.

How much RAM do you need now?

I needed above 32 GB, even before this commit.

DivineOmega commented 3 weeks ago

Reverting to commit 3e52e03 seems to have resolved the issue for me.

How much RAM do you need now?

I needed above 32 GB, even before this commit.

I've not done exact measurements, but I have a 16 GB GeForce RTX 3060, and at that commit I am able to run Flux dev FP8 with at least one LoRA with no issues.

Edit: I have 32 GB RAM (realised you were asking about RAM, not VRAM)

JorgeR81 commented 3 weeks ago

Edit: I have 32 GB RAM (realised you were asking about RAM, not VRAM)

Yes, ComfyUI will use most of your available VRAM while generating (8 GB in my case), and the rest is offloaded to RAM. So while the KSampler is running I use about 15 GB of RAM, in FP8 mode.

The problem is that when the Flux model is loading, it uses a lot of RAM. With "only" 32 GB of RAM, if you don't get an OOM error, it's because your system is using the page file.

You can monitor RAM / VRAM usage in the Task Manager while generating an image.
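If anyone prefers logging this instead of watching Task Manager, here is a minimal sketch of a hypothetical helper script (not part of ComfyUI; it assumes the psutil package is installed and takes the PID of the ComfyUI Python process shown in Task Manager):

import sys
import time
import psutil

# Usage: python watch_ram.py <comfyui_pid>   (hypothetical helper, not part of ComfyUI)
proc = psutil.Process(int(sys.argv[1]))          # the ComfyUI Python process
while True:
    rss_gb = proc.memory_info().rss / 1024**3            # RAM used by ComfyUI
    sys_gb = psutil.virtual_memory().used / 1024**3      # RAM used system-wide
    print(f"ComfyUI: {rss_gb:.1f} GB | system: {sys_gb:.1f} GB")
    time.sleep(2)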

DivineOmega commented 3 weeks ago

It's also working fine for me at 83f343146ae1e8ccaf21da5b012bf59c78b97179.

DivineOmega commented 3 weeks ago

Okay. I've done some checks at different commits.

For me, the last commit which works is 14af129c5509d10504113a1520c45b0ebcf81f14.

Commits beyond this (starting at bb222ceddb232aafafa99cd4dec38b3719c29d7d) cause out of memory issues (torch.cuda.OutOfMemoryError: Allocation on device 0 would exceed allowed memory).

If I'm understanding correctly, the issue may be caused by the changes to the memory management code here: comfy/model_patcher.py. However, I'm not familiar with the code base, so I might be looking at this wrong.
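For anyone who wants to narrow this down on their own setup, one rough way to do it (standard git, run inside the ComfyUI folder) is to bisect between the last known-good commit mentioned above and the current head, testing a Flux generation at each step:

git bisect start
git bisect bad HEAD
git bisect good 14af129c5509d10504113a1520c45b0ebcf81f14
# git checks out a commit in between; launch ComfyUI, try a generation, then run one of:
git bisect good    # if it generated without an OOM
git bisect bad     # if it hit the OOM
# repeat until git names the first bad commit, then:
git bisect reset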

JorgeR81 commented 3 weeks ago

For me, the last commit which works is https://github.com/comfyanonymous/ComfyUI/commit/14af129c5509d10504113a1520c45b0ebcf81f14

So, you have a specific memory error and you can't generate images?

I think @Govee-Chan was just talking about Flux requiring a lot of RAM while loading, but still being able to generate images.

DivineOmega commented 3 weeks ago

For me, the last commit which works is 14af129

So, you have a specific memory error and you can't generate images?

I think @Govee-Chan was just talking about Flux requiring a lot of RAM while loading, but still being able to generate images.

Yes, beyond the commit I mentioned I get a standard CUDA out of memory error every other generation when using Flux. For full transparency, I'm using ComfyUI via SwarmUI.

JorgeR81 commented 3 weeks ago

I'm on the latest commit, with ComfyUI portable, and I don't have any errors. Maybe it's a SwarmUI issue?

Also, you mention your error happens "every other generation", so that means the model was already loaded.

But I think @Govee-Chan refers to when the model is loaded into RAM for the first time (on the first generation).

YureP commented 3 weeks ago

I have OOMs too after yesterday's commits. Not only with Flux but, strangely, even when using SD 1.5 checkpoints. I have an RTX 3060 with 12 GB VRAM and 80 GB system RAM, on Linux. Right now I'm using a commit from 2 days ago and have no problem generating with the 22.2 GB Flux-DEV (FP16, etc.) plus a LoRA.

D-Ogi commented 3 weeks ago

Same here. I use Flux only. The first generation is successful. The second fails even for 512x512 images. The third is successful again, and so on. RTX 4090, 64 GB RAM.

Chryseus commented 3 weeks ago

Getting OOM now after a few generations using the Q8 quant; it worked just fine a few days ago. 64 GB RAM, 4060 Ti 16 GB, Python 3.10.11, Windows 10, PyTorch 2.4.0 cu124, xformers 0.0.27.post2.

YureP commented 3 weeks ago

Just updated and tested, but I'm still always getting OOMs (tested only Flux). Returned to 14af129, which is working well.

comfyanonymous commented 3 weeks ago

Can you check if you still have those OOM issues on the latest commit?

dan4ik94 commented 3 weeks ago

Can you check if you still have those OOM issues on the latest commit?

I still have OOM problems every 2-3 generations. It happens mostly when I change the prompt: it becomes very slow, as if I'm loading the checkpoint for the first time, then OOM. (Flux Schnell, RTX 3060 12 GB, 64 GB RAM)


File "D:\ComfyUI_windows_portable\ComfyUI\comfy\float.py", line 57, in stochastic_rounding
    return manual_stochastic_round_to_float8(value, dtype)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI_windows_portable\ComfyUI\comfy\float.py", line 40, in manual_stochastic_round_to_float8
    sign * (2.0 ** (exponent - EXPONENT_BIAS)) * (1.0 + mantissa),
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~
torch.cuda.OutOfMemoryError: Allocation on device
Got an OOM, unloading all loaded models.

comfyanonymous commented 3 weeks ago

--reserve-vram 0.6

If you add this argument does it fix it? If it doesn't try increasing it by 0.1 until it works then tell me what the value is.
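For anyone unsure where the flag goes, it is a launch argument. Assuming a plain python main.py launch (portable users would add it to the command in their launcher .bat file, which is an assumption about that setup), it looks like:

python main.py --reserve-vram 0.6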

Govee-Chan commented 3 weeks ago

Can you check if you still have those OOM issues on the latest commit?

The latest commit seems to solve my problem; the ComfyUI process occupied 20% of RAM at the peak (I've got 64 GB in total, so ~13 GB seems normal). But I haven't tried it on my AWS instance, where I found the OOM originally. I suspect the issue is due to my instance having too little memory (16 GB), but theoretically, 16 GB should be sufficient to run it, right?

Thanks anyway, I will try --reserve-vram 0.6 on my instance and see if it works.

Govee-Chan commented 3 weeks ago

--reserve-vram 0.6

If you add this argument does it fix it? If it doesn't try increasing it by 0.1 until it works then tell me what the value is.

I have no problem with the VRAM; I suspect there might be an issue during the transfer from system memory to GPU memory while loading the model.

Foul-Tarnished commented 3 weeks ago

Is it related to PyTorch 2.4? I tried sdwebui-forge with PyTorch 2.4, and it also spikes to ~70 GB RAM usage.

Chryseus commented 3 weeks ago

I'm using PyTorch 2.4. RAM usage loading FP8 spikes to 38 GB; switching models after this goes up to 58 GB, so maybe there is something that can be done to improve model switching. The latest updates seem to have fixed the OOM issue, although I find it interesting how the VRAM usage creeps up over the first few runs of the text encoder; maybe something is not getting unloaded properly, or maybe this is intended behaviour.

JorgeR81 commented 3 weeks ago

switching models after this goes up to 58 GB

When you do switch, is it to the Flux FP16 version? I think the FP16 one requires more RAM while loading.

Chryseus commented 3 weeks ago

When you do switch, is it to the Flux FP16 version? I think the FP16 one requires more RAM while loading.

I've tried switching between FP8 and the Q8 quant, which are fairly similar in VRAM usage; Q8 is very slightly higher.

JorgeR81 commented 3 weeks ago

When I use Q8, I don't have RAM spikes while loading. It never goes above 32 GB.

But I never tried to use it after FP8.

SchrodingersCatwalk commented 3 weeks ago

--reserve-vram 0.6

If you add this argument does it fix it? If it doesn't try increasing it by 0.1 until it works then tell me what the value is.

OOM errors resolved at 0.7, NVIDIA GeForce RTX 3080 Laptop GPU, 16GB, Linux, normal VRAM mode

DivineOmega commented 3 weeks ago

The latest updates mostly worked fine for me, but after trying to use Flux with >= 1 Lora, I was receiving OOM errors. Setting --reserve-vram to 0.7 resolved this.

YureP commented 3 weeks ago

OK, for me, on commit d1a6bd6: at 0.6 I can generate using the full flux-dev model, but I get an OOM when using a LoRA (realism LoRA), and the same at 0.7. At 0.8 I can generate everything. I did a little stress test, generating several times with Flux, then with XL, back to Flux, alternating generation between the full model and the Q8, and so on, and had no OOMs. The max VRAM load is 11.99/12 GB, and the max system RAM load is 46/80 GB.

screan commented 3 weeks ago

Updated Comfy and now I'm getting OOM with a LoRA as well today; it worked fine yesterday.

The first generation works fine, then OOM after.

Foul-Tarnished commented 3 weeks ago

Q6_K is not even 0.4% worse than Q8 (in perplexity, for 13B LLMs), and you gain about 1 GB of VRAM.

ErixStrong commented 3 weeks ago

Just updated and tested, but I'm still always getting OOMs (tested only Flux). Returned to 14af129, which is working well.

How do I return to an older commit?

OK, I found out how!
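For reference, for anyone else wondering: reverting a git clone to that commit is just the following, run inside the ComfyUI folder (returning via "master" assumes that is the default branch):

git checkout 14af129c5509d10504113a1520c45b0ebcf81f14
# ...and to come back to the latest code later:
git checkout master
git pull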

dan4ik94 commented 3 weeks ago

--reserve-vram 0.6

If you add this argument does it fix it? If it doesn't try increasing it by 0.1 until it works then tell me what the value is.

I can confirm that reserving a portion of VRAM (0.7-1.0) helps; after 20 generations with 3 LoRAs, no more OOMs on the 3060. 🌞

RedDeltas commented 2 weeks ago

I had the same issue, and the --reserve-vram flag didn't work for me; I tried values 0.6-1.0 and it didn't resolve the issue. Reverting back to https://github.com/comfyanonymous/ComfyUI/commit/14af129c5509d10504113a1520c45b0ebcf81f14 did fix it for me, though.

ltdrdata commented 2 weeks ago

I had the same issue, and the --reserve-vram flag didn't work for me; I tried values 0.6-1.0 and it didn't resolve the issue. Reverting back to 14af129 did fix it for me, though.

try --disable-smart-memory
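As with --reserve-vram above, this is a launch argument, e.g. (again assuming a plain python main.py launch):

python main.py --disable-smart-memory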

tobias-varden commented 2 weeks ago

I also got OOM with c6812947e98eb384250575d94108d9eb747765d9, so I had to revert back to 6ab1e6fd4a2f7cc5945310f0ecfc11617aa9a2cb, which fixed the issue. I am using Flux FP8 together with two LoRAs.

btln commented 2 weeks ago

--disable-smart-memory fixed it for me. Thanks!

CasualDev242 commented 2 weeks ago

Same issue. I have 64 GB of RAM, which ought to be plenty, and as of the recent updates the RAM usage has skyrocketed to the point where ComfyUI uses up to 70-80% of my RAM and I have to shut off the app to prevent issues.

JorgeR81 commented 2 weeks ago

I think the full model is being upcast to FP32 while loading, so this would be about 45 GB (without the T5 encoder).

Would it be possible to upcast the Flux model block by block (instead of all at once), keeping RAM usage lower?
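To illustrate the idea (this is not ComfyUI's actual loading code, just a sketch of what block-by-block conversion would mean for peak RAM):

import torch

def upcast_all_at_once(state_dict):
    # Builds a second, full FP32 copy while the original still exists:
    # peak RAM is roughly the original checkpoint plus a complete FP32 model.
    return {k: v.to(torch.float32) for k, v in state_dict.items()}

def upcast_one_tensor_at_a_time(state_dict):
    # Each converted tensor replaces its lower-precision source before the next
    # one is touched, so the peak is roughly the original checkpoint plus a
    # single FP32 tensor.
    for k in list(state_dict.keys()):
        state_dict[k] = state_dict[k].to(torch.float32)
    return state_dict

The same pattern applies whether the unit is a single tensor or a whole transformer block; the point is that only one converted copy is alive at a time.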

CasualDev242 commented 1 week ago

Why is this marked as "feature" and not "bug"? I had to revert to an earlier commit, and can now use ComfyUI. I can't use current versions due to the absurd RAM usage.

JorgeR81 commented 1 week ago

Why is this marked as "feature" and not "bug"?

I actually opened this as a bug a while back, but it's still not fixed. https://github.com/comfyanonymous/ComfyUI/issues/4239

JorgeR81 commented 1 week ago

I had to revert to an earlier commit, and can now use ComfyUI

As far as I know, the high RAM usage (above 32 GB) while loading has been a problem since the beginning.

@CasualDev242, you may have a different issue. Do you always have high RAM usage or only while loading the Flux model?

CasualDev242 commented 1 week ago

I had to revert to an earlier commit, and can now use ComfyUI

As far as I know, the high RAM usage (above 32 GB) while loading has been a problem since the beginning.

@CasualDev242, you may have a different issue. Do you always have high RAM usage or only while loading the Flux model?

Like I mentioned, the bug only appears with recent commits, and yes, it's with Flux. I did not have high RAM usage prior to these commits. It hasn't been a problem since the beginning for me, since an earlier commit fixes it and it didn't use to occur. Loading the same Flux model and LoRAs with an earlier commit doesn't cause the absurd RAM usage (remember, I have 64 GB of RAM, and ComfyUI is using 70%+ of it; how is that not an issue with the code?).

comfyanonymous commented 1 week ago

If you are on Windows, it's perfectly normal for it to use up to 2x the memory of your largest safetensors file when loading. If you use the 22 GB file + the 10 GB T5, it might peak at 22 * 2 + 10, so 54 GB of RAM usage when loading, then drop down to 32 GB.

For the FP8 checkpoint it's going to peak at 17.2 * 2, so ~35 GB.

That's an issue with the safetensors library on Windows. Linux doesn't have this issue.
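A quick way to watch that transient peak on your own machine (a sketch, assuming the psutil package is installed; the file path is just an example):

import psutil
from safetensors.torch import load_file

proc = psutil.Process()
print(f"before: {proc.memory_info().rss / 1024**3:.1f} GB")
state_dict = load_file("flux1-dev.safetensors", device="cpu")  # example path
print(f"after:  {proc.memory_info().rss / 1024**3:.1f} GB")
# The ~2x spike described above is transient, so it shows up in Task Manager
# (or the Windows peak working set) while the call runs; the printed numbers
# only capture the steady state before and after loading.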

JorgeR81 commented 1 week ago

If you are on Windows, it's perfectly normal for it to use up to 2x the memory of your largest safetensors file when loading. If you use the 22 GB file + the 10 GB T5, it might peak at 22 * 2 + 10, so 54 GB of RAM usage when loading, then drop down to 32 GB.

For the FP8 checkpoint it's going to peak at 17.2 * 2, so ~35 GB.

That's an issue with the safetensors library on Windows. Linux doesn't have this issue.

So this is an issue with the safetensors file type.

I'm on Windows 10. This issue does not happen with the GGUF models (e.g. flux1-dev-Q8_0.gguf): https://huggingface.co/city96/FLUX.1-dev-gguf/tree/main

But there is also a full quality version there: flux1-dev-F16.gguf (22 GB). Do you think using this flux1-dev-F16.gguf model could fix the problem?

EDIT: Apparently not. With flux1-dev-F16.gguf, my RAM usage still goes from 3.8 GB to above 32 GB.

I also tried a native FP8 Flux model (11 GB), but it also requires above 32 GB of RAM while loading.