okolenmi opened 1 year ago
Have you tried the newer tiled VAE decoder? I've had great success decoding much larger images using it with 4GB of VRAM.
If you are talking about tiled VAE in the A1111 UI... well, I haven't tried it. That UI is a VRAM hog anyway: I can generate the first 2-4 images at 1000×1000, but after about 20 generations I no longer have enough memory to generate even a 500×500 image. I have to restart the server manually to generate anything again; restarting from the UI doesn't help at all. It looks like a memory leak.
Hmmm... here's a new log in --normalvram mode. It looks different this time (single element in the queue), after trying to use a custom VAE:
```
100%|██████████████████████████████████████████████████████████████████████████████████| 22/22 [07:31<00:00, 20.50s/it]
making attention of type 'vanilla-pytorch' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla-pytorch' with 512 in_channels
Global Step: 840001
!!! Exception during processing !!!
Traceback (most recent call last):
  File "D:\ComfyUI_windows_portable_nightly_pytorch\ComfyUI\execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI_windows_portable_nightly_pytorch\ComfyUI\execution.py", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI_windows_portable_nightly_pytorch\ComfyUI\execution.py", line 74, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI_windows_portable_nightly_pytorch\ComfyUI\nodes.py", line 241, in decode
    return (vae.decode(samples["samples"]), )
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI_windows_portable_nightly_pytorch\ComfyUI\comfy\sd.py", line 626, in decode
    pixel_samples[x:x+batch_number] = torch.clamp((self.first_stage_model.decode(samples) + 1.0) / 2.0, min=0.0, max=1.0).cpu().float()
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: the launch timed out and was terminated
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Prompt executed in 473.44 seconds
Exception in thread Thread-1 (prompt_worker):
Traceback (most recent call last):
  File "threading.py", line 1038, in _bootstrap_inner
  File "threading.py", line 975, in run
  File "D:\ComfyUI_windows_portable_nightly_pytorch\ComfyUI\main.py", line 88, in prompt_worker
    comfy.model_management.soft_empty_cache()
  File "D:\ComfyUI_windows_portable_nightly_pytorch\ComfyUI\comfy\model_management.py", line 554, in soft_empty_cache
    torch.cuda.empty_cache()
  File "D:\ComfyUI_windows_portable_nightly_pytorch\python_embeded\Lib\site-packages\torch\cuda\memory.py", line 164, in empty_cache
    torch._C._cuda_emptyCache()
RuntimeError: CUDA error: the launch timed out and was terminated
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
Sorry for the confusion, I was talking about the node "VAEDecodeTiled" ( https://github.com/comfyanonymous/ComfyUI/blob/master/nodes.py#L243 )
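For anyone curious why the tiled node fits in less VRAM: it decodes the latent one fixed-size tile at a time, so peak memory is bounded by a single tile rather than the full image. Here is a minimal NumPy sketch of that idea; `decode_tile` is a hypothetical stand-in for the real VAE decoder (which maps latents to pixels at 8× spatial scale), and the actual `VAEDecodeTiled` node additionally overlaps and blends tile borders to hide seams:

```python
import numpy as np

def decode_tile(tile: np.ndarray) -> np.ndarray:
    # Stand-in for the real VAE decoder: latents map to pixels at 8x
    # spatial scale, so just upsample the tile by 8 in both dimensions.
    return np.repeat(np.repeat(tile, 8, axis=-2), 8, axis=-1)

def tiled_decode(latent: np.ndarray, tile: int = 64) -> np.ndarray:
    """Decode a (C, H, W) latent tile by tile so peak memory is bounded
    by one tile instead of the whole image. Illustrative sketch only."""
    c, h, w = latent.shape
    out = np.zeros((c, h * 8, w * 8), dtype=latent.dtype)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            patch = latent[:, y:y + tile, x:x + tile]
            ph, pw = patch.shape[-2:]
            out[:, y * 8:(y + ph) * 8, x * 8:(x + pw) * 8] = decode_tile(patch)
    return out
```

With a 64×64 latent tile, the decoder only ever materializes a 512×512 pixel patch at once, regardless of how large the final image is.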
Thank you! This one works very nicely (tested in --lowvram mode). This VAE decoder should be the recommended one for low-VRAM users.
Just a friendly reminder: 4GB is already below the minimum requirements. Track your memory usage with nvidia-smi or rocm-smi. Sometimes the checkpoint, LoRAs, and other assets take up too much of your VRAM.
Sometimes even I run into random problems when queuing images with ESRGANx4 under 10GB of RAM.
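To follow the nvidia-smi suggestion above programmatically, the CSV query mode is convenient. A small sketch, assuming a single NVIDIA GPU (the `--query-gpu` flags are standard nvidia-smi options; the helper names are mine):

```python
import subprocess

def parse_vram_csv(line: str) -> tuple[int, int]:
    """Parse one CSV line of nvidia-smi memory output into
    (used_mib, total_mib), e.g. "1523, 4096" -> (1523, 4096)."""
    used, total = (int(v.strip()) for v in line.split(","))
    return used, total

def vram_usage() -> tuple[int, int]:
    # Query the first GPU's memory use in MiB, without units or header.
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"], text=True)
    return parse_vram_csv(out.splitlines()[0])
```

Polling this between queue items makes it easy to see whether VRAM really is climbing from generation to generation.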
Could you tell me how to switch the VAE decoder to the tiled version? I'm new to this; I've tried everything and can't get it working. Thanks in advance!
I have a weak 4GB GPU, but it looks like that's almost enough to generate big images with this UI (A1111's UI can't even start processing something like this). After KSampler reaches 100% on a 1920×1080 image, I get the error messages below. [The latest (today's) test version of this UI] --normalvram.
--lowvram (queue with 3 elements)
About --normalvram mode: I don't know exactly how the image is processed in the VAE decoder, but if it were possible to free some memory after KSampler finishes, it would let everyone generate larger images than usual.
The failure seems to have different causes in normal and low-memory modes, so I can't say anything about --lowvram mode.