comfyanonymous / ComfyUI

The most powerful and modular stable diffusion GUI, api and backend with a graph/nodes interface.

KSampler OOM #1728

Open NotAshura opened 8 months ago

NotAshura commented 8 months ago

Error occurred when executing KSampler:

Allocation on device 0 would exceed allowed memory. (out of memory)
Currently allocated : 6.50 GiB
Requested : 40.00 MiB
Device limit : 8.00 GiB
Free (according to CUDA): 0 bytes
PyTorch limit (set by user-supplied memory fraction) : 17179869184.00 GiB

File "C:\AI TRW\ComfyUI_windows_portable\ComfyUI\execution.py", line 153, in recursive_execute output_data, output_ui = get_output_data(obj, input_data_all) File "C:\AI TRW\ComfyUI_windows_portable\ComfyUI\execution.py", line 83, in get_output_data return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True) File "C:\AI TRW\ComfyUI_windows_portable\ComfyUI\execution.py", line 76, in map_node_over_list results.append(getattr(obj, func)(slice_dict(input_data_all, i))) File "C:\AI TRW\ComfyUI_windows_portable\ComfyUI\nodes.py", line 1236, in sample return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise) File "C:\AI TRW\ComfyUI_windows_portable\ComfyUI\nodes.py", line 1206, in common_ksampler samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, File "C:\AI TRW\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Impact-Pack\modules\impact\hacky.py", line 22, in informative_sample raise e File "C:\AI TRW\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Impact-Pack\modules\impact\hacky.py", line 9, in informative_sample return original_sample(args, kwargs) File "C:\AI TRW\ComfyUI_windows_portable\ComfyUI\comfy\sample.py", line 97, in sample samples = sampler.sample(noise, positive_copy, negative_copy, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed) File "C:\AI TRW\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 785, in sample return sample(self.model, noise, positive, negative, cfg, self.device, sampler(), sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed) File "C:\AI TRW\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 690, in sample samples = sampler.sample(model_wrap, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar) File "C:\AI TRW\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 630, in sample samples = getattr(k_diffusionsampling, "sample{}".format(sampler_name))(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, extra_options) File "C:\AI TRW\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context return func(args, kwargs) File "C:\AI TRW\ComfyUI_windows_portable\ComfyUI\comfy\k_diffusion\sampling.py", line 707, in sample_dpmpp_sde_gpu return sample_dpmpp_sde(model, x, sigmas, extra_args=extra_args, callback=callback, disable=disable, eta=eta, s_noise=s_noise, noise_sampler=noise_sampler, r=r) File "C:\AI TRW\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context return func(*args, kwargs) File "C:\AI TRW\ComfyUI_windows_portable\ComfyUI\comfy\k_diffusion\sampling.py", line 559, in sample_dpmpp_sde denoised_2 = model(x_2, sigma_fn(s) * s_in, *extra_args) File "C:\AI TRW\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl return forward_call(args, kwargs) File "C:\AI TRW\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 323, in forward out = self.inner_model(x, sigma, cond=cond, uncond=uncond, cond_scale=cond_scale, cond_concat=cond_concat, model_options=model_options, seed=seed) File "C:\AI 
TRW\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl return forward_call(*args, kwargs) File "C:\AI TRW\ComfyUI_windows_portable\ComfyUI\comfy\k_diffusion\external.py", line 125, in forward eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), *kwargs) File "C:\AI TRW\ComfyUI_windows_portable\ComfyUI\comfy\k_diffusion\external.py", line 151, in get_eps return self.inner_model.apply_model(args, kwargs) File "C:\AI TRW\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 311, in apply_model out = sampling_function(self.inner_model.apply_model, x, timestep, uncond, cond, cond_scale, cond_concat, model_options=model_options, seed=seed) File "C:\AI TRW\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 289, in sampling_function cond, uncond = calc_cond_uncond_batch(model_function, cond, uncond, x, timestep, max_total_area, cond_concat, model_options) File "C:\AI TRW\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 265, in calc_cond_uncond_batch output = model_function(inputx, timestep, c).chunk(batch_chunks) File "C:\AI TRW\ComfyUI_windows_portable\ComfyUI\comfy\model_base.py", line 63, in apply_model return self.diffusion_model(xc, t, context=context, y=c_adm, control=control, transformer_options=transformer_options).float() File "C:\AI TRW\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl return forward_call(*args, *kwargs) File "C:\AI TRW\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\diffusionmodules\openaimodel.py", line 659, in forward h = forward_timestep_embed(module, h, emb, context, transformer_options, output_shape) File "C:\AI TRW\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\diffusionmodules\openaimodel.py", line 56, in forward_timestep_embed x = layer(x, context, transformer_options) File "C:\AI TRW\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl return forward_call(args, kwargs) File "C:\AI TRW\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\attention.py", line 529, in forward x = block(x, context=context[i], transformer_options=transformer_options) File "C:\AI TRW\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl return forward_call(*args, kwargs) File "C:\AI TRW\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\attention.py", line 359, in forward return checkpoint(self._forward, (x, context, transformer_options), self.parameters(), self.checkpoint) File "C:\AI TRW\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\diffusionmodules\util.py", line 123, in checkpoint return func(inputs) File "C:\AI TRW\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\attention.py", line 469, in _forward x = self.ff(self.norm3(x)) + x File "C:\AI TRW\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl return forward_call(args, kwargs) File "C:\AI TRW\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\attention.py", line 82, in forward return self.net(x) File "C:\AI TRW\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl return forward_call(*args, *kwargs) File "C:\AI TRW\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\container.py", line 217, in forward input = module(input) File "C:\AI 
TRW\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl return forward_call(args, *kwargs) File "C:\AI TRW\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\attention.py", line 62, in forward return x F.gelu(gate)

NotAshura commented 8 months ago

I keep running into this error but haven't found a fix yet. So I was hoping that with this ticket I could get some answers or clues as to how I can fix the issue.

ltdrdata commented 8 months ago

I keep running into this error but haven't found a fix yet. So I was hoping that with this ticket I could get some answers or clues as to how I can fix the issue.

If you are using SAMLoader, try switching to 'Prefer CPU.'

NotAshura commented 8 months ago

Thank you for the fast answer, but no, I'm not using a SAMLoader in that workflow. If you have any other ideas, please let me know. (I tried it with a SAMLoader, but it takes ages that way.)

narukaze132 commented 8 months ago

You could try using PyTorch's native memory management instead of CUDA's; perhaps CUDA's API isn't letting you access the full VRAM of the GPU for some reason. To do that, add --disable-cuda-malloc to the command used to execute main.py.
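On the Windows portable build, that usually means editing the launcher batch file. A minimal sketch, assuming the stock run_nvidia_gpu.bat (your filename and paths may differ):

    .\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --disable-cuda-malloc
    pause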

If that doesn't work, you might just not have enough VRAM for whatever you're doing. As a workaround, you could try ComfyUI_TiledKSampler, which samples the image in tiles and uses less VRAM as a result, though it does take more time. It's also fairly bad at keeping the image consistent, since it applies the prompt to every tile individually.

My personal favored approach is to create a smaller image first, then upscale the resulting latent to my desired resolution and apply ControlNet Tile to the positive conditioning (using the decoded original image as the input), which makes it much better at applying only relevant parts of the prompt to each tile.

NotAshura commented 8 months ago

Error occurred when executing KSampler:

CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 8.00 GiB total capacity; 6.54 GiB already allocated; 0 bytes free; 6.66 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

After using --disable-cuda-malloc, this is the error message I get :(

Maybe my VRAM is not enough. Is there any way to offload it?

NeedsMoar commented 8 months ago

Did anyone else happen to notice the more important part there... Torch's limit for the GPU is 16 EXABYTES?

17179869184.00 GiB

That's 2^34 GiB, i.e. 2^64 bytes, the full range of a 64-bit unsigned integer, so I can only guess there's a signed/unsigned conversion issue happening somewhere. Without the caching allocator enabled, the call that sets the max memory percentage to use doesn't do anything, so if you OOM you'll still hit the out-of-CUDA-memory error, just not the silly limit. The 6.66 GiB of VRAM is probably what's left after the amount taken for your monitor. This vaguely reminds me of when I hacked up a tiered ReFS storage space on a Windows 10 version that didn't really have support for them, and ended up with a drive that thought it had several petabytes available.
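If you want to see what the allocator actually thinks it has to work with, here's a minimal sketch using stock PyTorch calls (nothing ComfyUI-specific; run it with the embedded Python while nothing else is using the card):

    import torch

    # What CUDA itself reports: (free, total) bytes on the current device.
    free, total = torch.cuda.mem_get_info()
    print(f"CUDA free/total: {free / 2**30:.2f} / {total / 2**30:.2f} GiB")

    # What PyTorch's caching allocator has handed out vs. set aside.
    print(f"allocated: {torch.cuda.memory_allocated() / 2**30:.2f} GiB")
    print(f"reserved:  {torch.cuda.memory_reserved() / 2**30:.2f} GiB")

    # The "user-supplied memory fraction" in the error corresponds to this
    # call; a sane value caps PyTorch at (fraction * total device memory).
    torch.cuda.set_per_process_memory_fraction(1.0, device=0)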

Try using --lowvram or one of the other modes that offloads from the GPU more often if you're eating that much up (which isn't hard with SDXL). The other part looks like a bug, since CUDA has decent memory-use reporting and Comfy should have unloaded a model to make room for this. Even though it looks like there's enough left to allocate 20 MiB at that point in time, some nodes don't pass data around as cleanly as they could, multiple duplicates of the same latents get made on the card, and there's always allocation overhead.
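Also, your second error message literally names a knob to try: max_split_size_mb is a standard PyTorch caching-allocator option, set through an environment variable before launch. For the portable build's batch file it would look something like this (512 is just a common starting point, not a magic number):

    set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
    .\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build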

justanothernguyen commented 8 months ago

Something definitely broke; workflows that worked well before now just go OOM, and the requested VRAM doesn't even come close to exceeding the maximum amount available.

I'm trying to revert to an earlier commit that worked, but apparently Comfy has recently introduced breaking changes, so some of the custom nodes I'm using require reverting too.

It would be nice if the workflow contained the commit hash of the code it was generated on (ideally for all the custom nodes that were used too), but alas it doesn't.

brianjz commented 8 months ago

Similar issues here. Workflows that worked fine a week or so ago are now getting OOM with the most recent updates.

comfyanonymous commented 8 months ago

Move your custom_nodes folder somewhere else temporarily and try to reproduce the issue; if it can't be reproduced on the base install, it's one of the custom nodes that's leaking memory.
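On the portable build that's roughly the following (adjust the path to your install):

    cd /d "C:\AI TRW\ComfyUI_windows_portable\ComfyUI"
    ren custom_nodes custom_nodes.disabled
    mkdir custom_nodes

Then restart ComfyUI and rerun the workflow. If the OOM is gone, move the node packs back in halves to bisect down to the leaking one.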

dwgeneral commented 2 months ago

Move your custom_nodes folder somewhere else temporarily and try to reproduce the issue; if it can't be reproduced on the base install, it's one of the custom nodes that's leaking memory.

Yes, when I removed the ComfyUI-MuseTalk folder from custom_nodes, the OOM issue was gone. Furthermore, how do I fix this OOM issue in that custom node?

edwardsdigital commented 2 months ago

Fixing the MuseTalk node is probably a question to ask on the MuseTalk repo, since it's not part of the ComfyUI base package.