comfyanonymous / ComfyUI


AMD GPU --directml --lowvram error #939

lilly1987 opened this issue 1 year ago

My GPU is an RX 6600 and I want to use the lowvram option.

run

.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --directml --lowvram 

then I get:

!!! Exception during processing !!!
Traceback (most recent call last):
  File "C:\ComfyUI_windows_portable2\ComfyUI\execution.py", line 145, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "C:\ComfyUI_windows_portable2\ComfyUI\execution.py", line 75, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "C:\ComfyUI_windows_portable2\ComfyUI\execution.py", line 68, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "C:\ComfyUI_windows_portable2\ComfyUI\nodes.py", line 1082, in sample
    return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
  File "C:\ComfyUI_windows_portable2\ComfyUI\nodes.py", line 1052, in common_ksampler
    samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
  File "C:\ComfyUI_windows_portable2\ComfyUI\comfy\sample.py", line 75, in sample
    comfy.model_management.load_model_gpu(model)
  File "C:\ComfyUI_windows_portable2\ComfyUI\comfy\model_management.py", line 302, in load_model_gpu
    accelerate.dispatch_model(real_model, device_map=device_map, main_device=torch_dev)
  File "C:\ComfyUI_windows_portable2\python_embeded\lib\site-packages\accelerate\big_modeling.py", line 373, in dispatch_model
    attach_align_device_hook_on_blocks(
  File "C:\ComfyUI_windows_portable2\python_embeded\lib\site-packages\accelerate\hooks.py", line 527, in attach_align_device_hook_on_blocks
    attach_align_device_hook_on_blocks(
  File "C:\ComfyUI_windows_portable2\python_embeded\lib\site-packages\accelerate\hooks.py", line 527, in attach_align_device_hook_on_blocks
    attach_align_device_hook_on_blocks(
  File "C:\ComfyUI_windows_portable2\python_embeded\lib\site-packages\accelerate\hooks.py", line 497, in attach_align_device_hook_on_blocks
    add_hook_to_module(module, hook)
  File "C:\ComfyUI_windows_portable2\python_embeded\lib\site-packages\accelerate\hooks.py", line 155, in add_hook_to_module
    module = hook.init_hook(module)
  File "C:\ComfyUI_windows_portable2\python_embeded\lib\site-packages\accelerate\hooks.py", line 253, in init_hook
    set_module_tensor_to_device(module, name, self.execution_device)
  File "C:\ComfyUI_windows_portable2\python_embeded\lib\site-packages\accelerate\utils\modeling.py", line 165, in set_module_tensor_to_device
    new_value = old_value.to(device)
  File "C:\ComfyUI_windows_portable2\python_embeded\lib\site-packages\torch\cuda\__init__.py", line 239, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
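
The bottom of the trace shows what's going on: the --lowvram path dispatches the model through accelerate, whose hooks call .to() with a CUDA device, and this torch build has no CUDA support. A minimal sketch that reproduces the same final error on any torch build without CUDA (the device string is illustrative):

import torch

# On a torch build without CUDA support (CPU-only or DirectML),
# moving a tensor to a CUDA device trips torch.cuda._lazy_init,
# which raises the exact error seen at the end of the traceback.
try:
    torch.ones(1).to("cuda")
except AssertionError as e:
    print(e)  # Torch not compiled with CUDA enabled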

run

.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --directml --normalvram

then it runs, but uses 8 GB of VRAM (screenshots attached).

NeedsMoar commented 1 year ago

The Windows standalone build isn't set up for DirectML, if I recall; when I did the manual install I had to remove all the torch variants and then install just torch-directml to get it to pick things up correctly. As for whether it'll work on that card, I don't know.
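
If you want to try that swap yourself, it looks roughly like this with the portable build's embedded Python (a sketch, assuming the same folder layout as the commands above; the package names are the standard PyPI ones):

.\python_embeded\python.exe -m pip uninstall -y torch torchvision torchaudio
.\python_embeded\python.exe -m pip install torch-directml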

The DirectML pipeline isn't very optimized and can't use the lowvram setting anyway; it's hard-coded off in the source:

lowvram_available = False #TODO: need to find a way to get free memory in directml before this can be enabled by default.
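
To illustrate why: torch-directml gives you a usable device, but at the time it exposed no free-memory query comparable to CUDA's, which is what lowvram needs to size its offloading. A rough sketch, assuming torch-directml is installed:

import torch
import torch_directml

dml = torch_directml.device()   # a "privateuseone" torch.device
x = torch.ones(4, device=dml)   # tensor ops run fine on the DML device

# Sizing the lowvram offload needs something like CUDA's query:
#   free_bytes, total_bytes = torch.cuda.mem_get_info()
# torch-directml had no equivalent at the time, hence the
# hard-coded lowvram_available = False above.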

Since you're on a card without much VRAM and DirectML isn't particularly fast, unless you need something specific from the node-based workflow I'd try NOD.ai SHARK for basic image generation / inpainting / outpainting; it's much faster and you don't need to do anything weird to get it working on AMD.

SHARK's problem is that it flattens and pre-tunes model + LoRA + VAE combos and downloads base models for everything, so it eats disk space like a fat kid in a candy store if you're not keeping an eye on it. It's also the fastest thing for AMD right now; compiled ONNX for DirectML lags about 20% behind (and has many of the same pre-compilation problems).

That said, Comfy has far more features and customizability, so you might be stuck working with small images and models within the constraints of your card. Most of the UIs aren't planning on doing much about AMD until AMD gets off their butts and ports ROCm to Windows. Integrating IREE into a UI that wasn't built around it is non-trivial, and it's still too much of a moving target to consider as a backend for smaller projects, IMO.