FizzleDorf / AIT

This was originally written by: https://github.com/hlky
Apache License 2.0

ControlNets unload every step (very slow performance) #40

Open asagi4 opened 1 year ago

asagi4 commented 1 year ago

The extension seems to do something that breaks controlnet usage even when no AITemplate loaders are in use.

The attached image ("AITemplate_break") contains a workflow (modified straight from a ComfyUI controlnet example, so it should be pretty minimal) that works when AIT is not in the custom_nodes folder and breaks when it is.

The error it fails with is:

Error occurred when executing KSampler:

Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)

  File "/home/sd/git/ComfyUI/execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "/home/sd/git/ComfyUI/execution.py", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "/home/sd/git/ComfyUI/execution.py", line 74, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "/home/sd/git/ComfyUI/nodes.py", line 1207, in sample
    return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
  File "/home/sd/git/ComfyUI/custom_nodes/AIT/AITemplate/AITemplate.py", line 175, in common_ksampler
    samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
  File "/home/sd/git/ComfyUI/custom_nodes/AIT/AITemplate/AITemplate.py", line 308, in sample
    samples = sampler.sample(noise, positive_copy, negative_copy, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "/home/sd/git/ComfyUI/comfy/samplers.py", line 727, in sample
    samples = getattr(k_diffusion_sampling, "sample_{}".format(self.sampler))(self.model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar)
  File "/home/sd/git/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/sd/git/ComfyUI/comfy/k_diffusion/sampling.py", line 539, in sample_dpmpp_sde
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "/home/sd/git/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sd/git/ComfyUI/comfy/samplers.py", line 317, in forward
    out = self.inner_model(x, sigma, cond=cond, uncond=uncond, cond_scale=cond_scale, cond_concat=cond_concat, model_options=model_options, seed=seed)
  File "/home/sd/git/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sd/git/ComfyUI/comfy/k_diffusion/external.py", line 125, in forward
    eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
  File "/home/sd/git/ComfyUI/comfy/k_diffusion/external.py", line 151, in get_eps
    return self.inner_model.apply_model(*args, **kwargs)
  File "/home/sd/git/ComfyUI/comfy/samplers.py", line 305, in apply_model
    out = sampling_function(self.inner_model.apply_model, x, timestep, uncond, cond, cond_scale, cond_concat, model_options=model_options, seed=seed)
  File "/home/sd/git/ComfyUI/comfy/samplers.py", line 283, in sampling_function
    cond, uncond = calc_cond_uncond_batch(model_function, cond, uncond, x, timestep, max_total_area, cond_concat, model_options)
  File "/home/sd/git/ComfyUI/comfy/samplers.py", line 235, in calc_cond_uncond_batch
    c['control'] = control.get_control(input_x, timestep_, c, len(cond_or_uncond))
  File "/home/sd/git/ComfyUI/comfy/controlnet.py", line 161, in get_control
    control = self.control_model(x=x_noisy.to(self.control_model.dtype), hint=self.cond_hint, timesteps=t, context=context.to(self.control_model.dtype), y=y)
  File "/home/sd/git/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sd/git/ComfyUI/comfy/cldm/cldm.py", line 283, in forward
    emb = self.time_embed(t_emb)
  File "/home/sd/git/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sd/git/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/home/sd/git/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sd/git/ComfyUI/comfy/ops.py", line 18, in forward
    return torch.nn.functional.linear(input, self.weight, self.bias)

AITemplate must move something to the wrong device at some point, but I haven't been able to figure out where it happens.
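(For anyone digging into where the mismatch happens: below is a minimal, generic PyTorch sketch, not AIT or ComfyUI code; the report_device_mismatches helper and the toy time_embed are made up for illustration. It lists which parameters of a module sit off the expected device, and shows that a CPU-resident Linear called with a CUDA input raises exactly this addmm error.)

import torch

def report_device_mismatches(module: torch.nn.Module, expected: torch.device) -> None:
    """Print every parameter/buffer of `module` that is not on `expected`."""
    for name, p in module.named_parameters():
        if p.device != expected:
            print(f"parameter {name} is on {p.device}, expected {expected}")
    for name, b in module.named_buffers():
        if b.device != expected:
            print(f"buffer {name} is on {b.device}, expected {expected}")

if __name__ == "__main__":
    # Toy stand-in for cldm's time_embed, deliberately left on the CPU.
    time_embed = torch.nn.Sequential(
        torch.nn.Linear(320, 1280), torch.nn.SiLU(), torch.nn.Linear(1280, 1280)
    )
    if torch.cuda.is_available():
        t_emb = torch.randn(2, 320, device="cuda")
        report_device_mismatches(time_embed, t_emb.device)
        try:
            time_embed(t_emb)
        except RuntimeError as e:
            # "Expected all tensors to be on the same device ..."
            print(e)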

FizzleDorf commented 1 year ago

Ah, just seeing it break everywhere now. I'll merge #41 so this won't be an issue. I'm just a little worried it feels unfinished. The TODO here is getting the ControlNets to not unload every step.

FizzleDorf commented 1 year ago

I renamed this issue to be in line with the current bug.

asagi4 commented 1 year ago

@FizzleDorf I have this patch locally to just completely skip unloading and things work at sane speeds:

diff --git a/AITemplate/AITemplate.py b/AITemplate/AITemplate.py
index 74f4664..b2a76b4 100644
--- a/AITemplate/AITemplate.py
+++ b/AITemplate/AITemplate.py
@@ -411,7 +411,6 @@ class ControlNet(ControlBase):
                 context = torch.cat(cond['c_crossattn'], 1)
                 y = cond.get('c_adm', None)
                 control = self.control_model(x=x_noisy, hint=self.cond_hint, timesteps=t, context=context, y=y)
-                comfy.model_management.unload_model_clones(self.control_model_wrapped)
         else:
             # AITemplate inference, returns the same as regular
             control = self.aitemplate_controlnet(x_noisy, t, cond, self.cond_hint)

But I'm pretty sure that's just a memory leak that papers over the performance problem, so I haven't made a PR for it.
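(To make the trade-off concrete, here is a toy, self-contained timing sketch, not ComfyUI code; the run_steps helper, module size, and step count are arbitrary illustrations. It compares keeping a model resident on the GPU against shuttling it back to the CPU after every call, which is roughly what the per-step unload costs; skipping the unload keeps the weights in VRAM instead.)

import time
import torch

def run_steps(model: torch.nn.Module, x: torch.Tensor, steps: int, unload_each_step: bool) -> float:
    """Time `steps` forward passes, optionally moving the model back to the CPU after each one."""
    start = time.perf_counter()
    for _ in range(steps):
        model.to(x.device)           # load onto the GPU (no-op if already resident)
        with torch.no_grad():
            model(x)
        if unload_each_step:
            model.to("cpu")          # simulate unloading the ControlNet every step
    torch.cuda.synchronize()
    return time.perf_counter() - start

if __name__ == "__main__":
    assert torch.cuda.is_available(), "toy benchmark needs a GPU"
    model = torch.nn.Sequential(*[torch.nn.Linear(4096, 4096) for _ in range(8)])
    x = torch.randn(16, 4096, device="cuda")
    print("resident:         ", run_steps(model, x, steps=20, unload_each_step=False))
    print("unload every step:", run_steps(model, x, steps=20, unload_each_step=True))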

TheMindExpansionNetwork commented 1 year ago

Same issue, let me know if you all find a workaround.

TheMindExpansionNetwork commented 1 year ago

(quoting asagi4's patch from the previous comment)

Probably a dumb question, but how do I patch this into my ComfyUI install?

I just want to start using this workflow, but this is the only issue:

Error occurred when executing KSampler:

Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)

File "K:\AI ART UNIVERSE\COMFY-UI\ComfyUI_windows_portable\ComfyUI\execution.py", line 151, in recursive_execute output_data, output_ui = get_output_data(obj, input_data_all) File "K:\AI ART UNIVERSE\COMFY-UI\ComfyUI_windows_portable\ComfyUI\execution.py", line 81, in get_output_data return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True) File "K:\AI ART UNIVERSE\COMFY-UI\ComfyUI_windows_portable\ComfyUI\execution.py", line 74, in map_node_over_list results.append(getattr(obj, func)(slice_dict(input_data_all, i))) File "K:\AI ART UNIVERSE\COMFY-UI\ComfyUI_windows_portable\ComfyUI\nodes.py", line 1211, in sample return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise) File "K:\AI ART UNIVERSE\COMFY-UI\ComfyUI_windows_portable\ComfyUI\custom_nodes\AIT\AITemplate\AITemplate.py", line 175, in common_ksampler samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, File "K:\AI ART UNIVERSE\COMFY-UI\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Impact-Pack\modules\impact\hacky.py", line 22, in informative_sample raise e File "K:\AI ART UNIVERSE\COMFY-UI\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Impact-Pack\modules\impact\hacky.py", line 9, in informative_sample return original_sample(*args, kwargs) File "K:\AI ART UNIVERSE\COMFY-UI\ComfyUI_windows_portable\ComfyUI\custom_nodes\AIT\AITemplate\AITemplate.py", line 308, in sample samples = sampler.sample(noise, positive_copy, negative_copy, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed) File "K:\AI ART UNIVERSE\COMFY-UI\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 676, in sample samples = uni_pc.sample_unipc(self.model_wrap, noise, latent_image, sigmas, sampling_function=sampling_function, max_denoise=max_denoise, extra_args=extra_args, noise_mask=denoise_mask, callback=callback, variant='bh2', disable=disable_pbar) File "K:\AI ART UNIVERSE\COMFY-UI\ComfyUI_windows_portable\ComfyUI\comfy\extra_samplers\uni_pc.py", line 880, in sample_unipc x = uni_pc.sample(img, timesteps=timesteps, skip_type="time_uniform", method="multistep", order=order, lower_order_final=True, callback=callback, disable_pbar=disable) File "K:\AI ART UNIVERSE\COMFY-UI\ComfyUI_windows_portable\ComfyUI\comfy\extra_samplers\uni_pc.py", line 730, in sample model_prev_list = [self.model_fn(x, vec_t)] File "K:\AI ART UNIVERSE\COMFY-UI\ComfyUI_windows_portable\ComfyUI\comfy\extra_samplers\uni_pc.py", line 421, in model_fn return self.data_prediction_fn(x, t) File "K:\AI ART UNIVERSE\COMFY-UI\ComfyUI_windows_portable\ComfyUI\comfy\extra_samplers\uni_pc.py", line 403, in data_prediction_fn noise = self.noise_prediction_fn(x, t) File "K:\AI ART UNIVERSE\COMFY-UI\ComfyUI_windows_portable\ComfyUI\comfy\extra_samplers\uni_pc.py", line 397, in noise_prediction_fn return self.model(x, t) File "K:\AI ART UNIVERSE\COMFY-UI\ComfyUI_windows_portable\ComfyUI\comfy\extra_samplers\uni_pc.py", line 329, in model_fn return noise_pred_fn(x, t_continuous) File "K:\AI ART UNIVERSE\COMFY-UI\ComfyUI_windows_portable\ComfyUI\comfy\extra_samplers\uni_pc.py", line 297, in noise_pred_fn output = model(x, t_input, model_kwargs) File "K:\AI ART UNIVERSE\COMFY-UI\ComfyUI_windows_portable\ComfyUI\comfy\k_diffusion\external.py", line 98, in 
predict_eps_discrete_timestep return (input - self(input, sigma, *kwargs)) / utils.append_dims(sigma, input.ndim) File "K:\AI ART UNIVERSE\COMFY-UI\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl return forward_call(args, kwargs) File "K:\AI ART UNIVERSE\COMFY-UI\ComfyUI_windows_portable\ComfyUI\comfy\k_diffusion\external.py", line 125, in forward eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), kwargs) File "K:\AI ART UNIVERSE\COMFY-UI\ComfyUI_windows_portable\ComfyUI\comfy\k_diffusion\external.py", line 151, in get_eps return self.inner_model.apply_model(*args, *kwargs) File "K:\AI ART UNIVERSE\COMFY-UI\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 311, in apply_model out = sampling_function(self.inner_model.apply_model, x, timestep, uncond, cond, cond_scale, cond_concat, model_options=model_options, seed=seed) File "K:\AI ART UNIVERSE\COMFY-UI\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 289, in sampling_function cond, uncond = calc_cond_uncond_batch(model_function, cond, uncond, x, timestep, max_total_area, cond_concat, model_options) File "K:\AI ART UNIVERSE\COMFY-UI\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 241, in calc_cond_uncond_batch c['control'] = control.get_control(inputx, timestep, c, len(cond_or_uncond)) File "K:\AI ART UNIVERSE\COMFY-UI\ComfyUI_windows_portable\ComfyUI\comfy\controlnet.py", line 162, in get_control control = self.control_model(x=x_noisy.to(self.control_model.dtype), hint=self.cond_hint, timesteps=t, context=context.to(self.control_model.dtype), y=y) File "K:\AI ART UNIVERSE\COMFY-UI\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl return forward_call(args, kwargs) File "K:\AI ART UNIVERSE\COMFY-UI\ComfyUI_windows_portable\ComfyUI\comfy\cldm\cldm.py", line 283, in forward emb = self.time_embed(t_emb) File "K:\AI ART UNIVERSE\COMFY-UI\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl return forward_call(*args, *kwargs) File "K:\AI ART UNIVERSE\COMFY-UI\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\container.py", line 217, in forward input = module(input) File "K:\AI ART UNIVERSE\COMFY-UI\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl return forward_call(args, **kwargs) File "K:\AI ART UNIVERSE\COMFY-UI\ComfyUI_windows_portable\ComfyUI\comfy\ops.py", line 18, in forward return torch.nn.functional.linear(input, self.weight, self.bias)

asagi4 commented 1 year ago

@TheMindExpansionNetwork just updating AIT should get rid of that exception. My patch just deals with the slowness.

If you want to use it, copy the patch as-is into a text file and apply it with "git apply" inside the AIT repository. Just be aware that your local copy of AIT will then differ from what's on GitHub, and the patch may interfere with updating the nodes if a conflicting change is introduced.