comfyanonymous / ComfyUI

The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
https://www.comfy.org/
GNU General Public License v3.0

I can't generate images after update. CUDA error: an illegal instruction was encountered #5683

Open · Lalimec opened this issue 5 days ago

Lalimec commented 5 days ago

Expected Behavior

I shouldn't be getting a CUDA error.

Actual Behavior

I am not able to use ComfyUI after the last update; it was working fine yesterday.

Steps to Reproduce

It is not tied to a specific workflow, but here is one that reproduces it: Flux.json

Debug Logs

(venv) ubuntu@129-146-162-177:~/cemil-test/ComfyUI$ python main.py --listen 0.0.0.0 --port 8334 --disable-all-custom-nodes
Total VRAM 40326 MB, total RAM 221449 MB
pytorch version: 2.4.1+cu124
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA A100-SXM4-40GB : cudaMallocAsync
Using pytorch cross attention
[Prompt Server] web root: /home/ubuntu/cemil-test/ComfyUI/web
Adding extra search path loras /home/ubuntu/cemil-test/ai-toolkit/output
/home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/kornia/feature/lightglue.py:44: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @torch.cuda.amp.custom_fwd(cast_inputs=torch.float32)
Skipping loading of custom nodes
Starting server

To see the GUI go to: http://0.0.0.0:8334
got prompt
model weight dtype torch.float16, manual cast: None
model_type EPS
Using pytorch attention in VAE
Using pytorch attention in VAE
/home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
Requested to load SD1ClipModel
Loading 1 new model
loaded completely 0.0 235.84423828125 True
Requested to load BaseModel
Loading 1 new model
loaded completely 0.0 1639.406135559082 True
 20%|█████████████████████████████████                                                                                                                                    | 4/20 [00:00<00:02,  5.86it/s]terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: misaligned address
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7bdf43f0ef86 in /home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7bdf43ebdd10 in /home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7bdf43fe9f08 in /home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: <unknown function> + 0x587d0 (0x7bdf43fef7d0 in /home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #4: <unknown function> + 0x5a4f4 (0x7bdf43ff14f4 in /home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #5: <unknown function> + 0x5db920 (0x7bdf41bdb920 in /home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/lib/libtorch_python.so)
frame #6: <unknown function> + 0x6abdf (0x7bdf43ef2bdf in /home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #7: c10::TensorImpl::~TensorImpl() + 0x21b (0x7bdf43eebc3b in /home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #8: c10::TensorImpl::~TensorImpl() + 0x9 (0x7bdf43eebde9 in /home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #9: <unknown function> + 0x11543d7 (0x7bdf2a5543d7 in /home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #10: <unknown function> + 0x2e0927b (0x7bdf2c20927b in /home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #11: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, std::optional<c10::ScalarType>, std::optional<c10::Layout>, std::optional<c10::Device>, std::optional<bool>, bool, std::optional<c10::MemoryFormat>) + 0xf5 (0x7bdf2b911f15 in /home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #12: <unknown function> + 0x2c47d23 (0x7bdf2c047d23 in /home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #13: at::_ops::_to_copy::call(at::Tensor const&, std::optional<c10::ScalarType>, std::optional<c10::Layout>, std::optional<c10::Device>, std::optional<bool>, bool, std::optional<c10::MemoryFormat>) + 0x1eb (0x7bdf2b9a197b in /home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #14: at::native::to(at::Tensor const&, std::optional<c10::ScalarType>, std::optional<c10::Layout>, std::optional<c10::Device>, std::optional<bool>, bool, bool, std::optional<c10::MemoryFormat>) + 0x11d (0x7bdf2b3ca2dd in /home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #15: <unknown function> + 0x300d631 (0x7bdf2c40d631 in /home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #16: at::_ops::to_dtype_layout::redispatch(c10::DispatchKeySet, at::Tensor const&, std::optional<c10::ScalarType>, std::optional<c10::Layout>, std::optional<c10::Device>, std::optional<bool>, bool, bool, std::optional<c10::MemoryFormat>) + 0x114 (0x7bdf2bad7e24 in /home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #17: <unknown function> + 0x2c47e5e (0x7bdf2c047e5e in /home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #18: at::_ops::to_dtype_layout::call(at::Tensor const&, std::optional<c10::ScalarType>, std::optional<c10::Layout>, std::optional<c10::Device>, std::optional<bool>, bool, bool, std::optional<c10::MemoryFormat>) + 0x200 (0x7bdf2bb50d10 in /home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #19: <unknown function> + 0x57023b (0x7bdf41b7023b in /home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/lib/libtorch_python.so)
frame #20: <unknown function> + 0x5c4057 (0x7bdf41bc4057 in /home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>

Aborted

(venv) ubuntu@129-146-162-177:~/cemil-test/ComfyUI$ python main.py --listen 0.0.0.0 --port 8334 --disable-all-custom-nodes
Total VRAM 40326 MB, total RAM 221449 MB
pytorch version: 2.4.1+cu124
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA A100-SXM4-40GB : cudaMallocAsync
Using pytorch cross attention
[Prompt Server] web root: /home/ubuntu/cemil-test/ComfyUI/web
Adding extra search path loras /home/ubuntu/cemil-test/ai-toolkit/output
/home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/kornia/feature/lightglue.py:44: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @torch.cuda.amp.custom_fwd(cast_inputs=torch.float32)
Skipping loading of custom nodes
Starting server

To see the GUI go to: http://0.0.0.0:8334
got prompt
Using pytorch attention in VAE
Using pytorch attention in VAE
/home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
clip missing: ['text_projection.weight']
Requested to load FluxClipModel_
Loading 1 new model
loaded completely 0.0 9319.23095703125 True
model weight dtype torch.bfloat16, manual cast: None
model_type FLUX
Requested to load Flux
Loading 1 new model
loaded completely 0.0 22700.097778320312 True
  0%|                                                                                                                                                                              | 0/4 [00:00<?, ?it/s]terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: an illegal instruction was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7084090eef86 in /home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x70840909dd10 in /home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7084091c9f08 in /home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: <unknown function> + 0x587d0 (0x7084091cf7d0 in /home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #4: <unknown function> + 0x5a4f4 (0x7084091d14f4 in /home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #5: <unknown function> + 0x5db920 (0x708406ddb920 in /home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/lib/libtorch_python.so)
frame #6: <unknown function> + 0x6abdf (0x7084090d2bdf in /home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #7: c10::TensorImpl::~TensorImpl() + 0x21b (0x7084090cbc3b in /home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #8: c10::TensorImpl::~TensorImpl() + 0x9 (0x7084090cbde9 in /home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #9: <unknown function> + 0x891b68 (0x708407091b68 in /home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/lib/libtorch_python.so)
frame #10: THPVariable_subclass_dealloc(_object*) + 0x2c6 (0x708407091eb6 in /home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>

Aborted

(venv) ubuntu@129-146-162-177:~/cemil-test/ComfyUI$ python main.py --listen 0.0.0.0 --port 8334 --disable-all-custom-nodes --disable-cuda-malloc
Total VRAM 40326 MB, total RAM 221449 MB
pytorch version: 2.4.1+cu124
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA A100-SXM4-40GB : native
Using pytorch cross attention
[Prompt Server] web root: /home/ubuntu/cemil-test/ComfyUI/web
Adding extra search path loras /home/ubuntu/cemil-test/ai-toolkit/output
/home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/kornia/feature/lightglue.py:44: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @torch.cuda.amp.custom_fwd(cast_inputs=torch.float32)
Skipping loading of custom nodes
Starting server

To see the GUI go to: http://0.0.0.0:8334
got prompt
Using pytorch attention in VAE
Using pytorch attention in VAE
/home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
clip missing: ['text_projection.weight']
Requested to load FluxClipModel_
Loading 1 new model
loaded completely 0.0 9319.23095703125 True
model weight dtype torch.bfloat16, manual cast: None
model_type FLUX
Requested to load Flux
Loading 1 new model
loaded completely 0.0 22700.097778320312 True
  0%|                                                                                                                                                                              | 0/4 [00:00<?, ?it/s]
!!! Exception during processing !!! CUDA error: misaligned address
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Traceback (most recent call last):
  File "/home/ubuntu/cemil-test/ComfyUI/execution.py", line 323, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "/home/ubuntu/cemil-test/ComfyUI/execution.py", line 198, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "/home/ubuntu/cemil-test/ComfyUI/execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "/home/ubuntu/cemil-test/ComfyUI/execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
  File "/home/ubuntu/cemil-test/ComfyUI/nodes.py", line 1454, in sample
    return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
  File "/home/ubuntu/cemil-test/ComfyUI/nodes.py", line 1421, in common_ksampler
    samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
  File "/home/ubuntu/cemil-test/ComfyUI/comfy/sample.py", line 43, in sample
    samples = sampler.sample(noise, positive, negative, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "/home/ubuntu/cemil-test/ComfyUI/comfy/samplers.py", line 855, in sample
    return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "/home/ubuntu/cemil-test/ComfyUI/comfy/samplers.py", line 753, in sample
    return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
  File "/home/ubuntu/cemil-test/ComfyUI/comfy/samplers.py", line 740, in sample
    output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
  File "/home/ubuntu/cemil-test/ComfyUI/comfy/samplers.py", line 719, in inner_sample
    samples = sampler.sample(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
  File "/home/ubuntu/cemil-test/ComfyUI/comfy/samplers.py", line 624, in sample
    samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
  File "/home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/cemil-test/ComfyUI/comfy/k_diffusion/sampling.py", line 155, in sample_euler
    denoised = model(x, sigma_hat * s_in, **extra_args)
  File "/home/ubuntu/cemil-test/ComfyUI/comfy/samplers.py", line 299, in __call__
    out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
  File "/home/ubuntu/cemil-test/ComfyUI/comfy/samplers.py", line 706, in __call__
    return self.predict_noise(*args, **kwargs)
  File "/home/ubuntu/cemil-test/ComfyUI/comfy/samplers.py", line 709, in predict_noise
    return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
  File "/home/ubuntu/cemil-test/ComfyUI/comfy/samplers.py", line 279, in sampling_function
    out = calc_cond_batch(model, conds, x, timestep, model_options)
  File "/home/ubuntu/cemil-test/ComfyUI/comfy/samplers.py", line 228, in calc_cond_batch
    output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
  File "/home/ubuntu/cemil-test/ComfyUI/comfy/model_base.py", line 144, in apply_model
    model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
  File "/home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/cemil-test/ComfyUI/comfy/ldm/flux/model.py", line 181, in forward
    out = self.forward_orig(img, img_ids, context, txt_ids, timestep, y, guidance, control, transformer_options)
  File "/home/ubuntu/cemil-test/ComfyUI/comfy/ldm/flux/model.py", line 131, in forward_orig
    img, txt = block(img=img, txt=txt, vec=vec, pe=pe)
  File "/home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/cemil-test/ComfyUI/comfy/ldm/flux/layers.py", line 176, in forward
    txt += txt_mod2.gate * self.txt_mlp((1 + txt_mod2.scale) * self.txt_norm2(txt) + txt_mod2.shift)
RuntimeError: CUDA error: misaligned address
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Prompt executed in 23.30 seconds
Exception in thread Thread-1 (prompt_worker):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/cemil-test/ComfyUI/main.py", line 159, in prompt_worker
    comfy.model_management.soft_empty_cache()
  File "/home/ubuntu/cemil-test/ComfyUI/comfy/model_management.py", line 1093, in soft_empty_cache
    torch.cuda.empty_cache()
  File "/home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/cuda/memory.py", line 170, in empty_cache
    torch._C._cuda_emptyCache()
RuntimeError: CUDA error: misaligned address
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Error handling request
Traceback (most recent call last):
  File "/home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/aiohttp/web_protocol.py", line 462, in _handle_request
    resp = await request_handler(request)
  File "/home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/aiohttp/web_app.py", line 537, in _handle
    resp = await handler(request)
  File "/home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/aiohttp/web_middlewares.py", line 114, in impl
    return await handler(request)
  File "/home/ubuntu/cemil-test/ComfyUI/server.py", line 63, in cache_control
    response: web.Response = await handler(request)
  File "/home/ubuntu/cemil-test/ComfyUI/server.py", line 141, in origin_only_middleware
    response = await handler(request)
  File "/home/ubuntu/cemil-test/ComfyUI/server.py", line 496, in system_stats
    vram_total, torch_vram_total = comfy.model_management.get_total_memory(device, torch_total_too=True)
  File "/home/ubuntu/cemil-test/ComfyUI/comfy/model_management.py", line 134, in get_total_memory
    _, mem_total_cuda = torch.cuda.mem_get_info(dev)
  File "/home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/torch/cuda/memory.py", line 685, in mem_get_info
    return torch.cuda.cudart().cudaMemGetInfo(device)
RuntimeError: CUDA error: misaligned address
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Other

No response
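
As the traceback text itself notes, CUDA errors can be reported asynchronously at a later API call, so the stacks above may not point at the real failing kernel. One way to get a synchronous, accurate trace (a sketch reusing the exact flags from the runs above) is to relaunch with blocking kernel launches:

CUDA_LAUNCH_BLOCKING=1 python main.py --listen 0.0.0.0 --port 8334 --disable-all-custom-nodes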

ltdrdata commented 5 days ago

First, try updating PyTorch to the latest version.
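
For example, from inside the venv, an upgrade along these lines (assuming the cu124 wheel index, to match the 2.4.1+cu124 build in the logs, and the usual companion packages):

pip install --upgrade torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124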

Lalimec commented 5 days ago

I actually did that after reporting; it's 2.5.something now. What changed is that instead of an error I get an infinite inference time. It just hangs after the loading part is done. The generation below was stuck at VAE decoding; another Flux inference wasn't even able to get past model loading and got stuck at the CLIP Text Encode node.
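
For reference, the exact build active in the venv can be confirmed with a quick one-liner like:

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"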

.
.
.
got prompt
Using pytorch attention in VAE
Using pytorch attention in VAE
/home/ubuntu/cemil-test/ComfyUI/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
clip missing: ['text_projection.weight']
Requested to load FluxClipModel_
Loading 1 new model
loaded completely 0.0 9319.23095703125 True
model weight dtype torch.bfloat16, manual cast: None
model_type FLUX
Requested to load Flux
Loading 1 new model
loaded completely 0.0 22700.097778320312 True
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:01<00:00,  2.00it/s]
Requested to load AutoencodingEngine
Loading 1 new model
loaded completely 0.0 159.87335777282715 True