Closed: GitwithDX closed this issue 1 year ago.
I also have this issue. I figured it might be a problem with my install or something, but it happened on a clean install on a new PC too. The same generation settings work fine without Agent Scheduler.
I run into this often and randomly. My task list won't run beyond a couple of hours without stopping due to this error.
It's unlikely the error is from the extension. Maybe related to this: https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/9954.
I get this same error. RTX 3080 Ti laptop with 16 GB VRAM (Alienware x17 R2). When the error happens, VRAM is NOT maxed out. I do NOT use xformers, as it didn't work with the laptop when I tried it and I had to clean-install A1111 to fix it; I use --opt-sdp-attention instead. I don't use MSI Afterburner, so the fix in #9954 didn't help. Any suggested fixes?
I have this problem too, but only with the scheduler.
Same issue here. I have never had this problem when not using this extension. Also, I don't use MSI Afterburner.
Could you please specify the types of tasks you were executing when the issue occurred? I'll attempt to recreate a similar queue to see if I can reproduce the problem.
It usually happens to me in the middle of a high batch count txt2img task (1x50~100). The issue generally appears after a few generations. I also used to queue more than one task like that at once.
It seems to be some sort of issue with clearing memory in between batched generations? idk
I'm using an RTX 3060 with 12GB VRAM
Here's an example of a task I usually run, and I'll try to reproduce the issue again to get the full error (and more reproducible info):
Batch count:50, Batch size: 1 Steps: 28, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 2087503294, Size: 512x768, Model hash: 1bab7a0895, Model: kizukiV3, VAE hash: df3c506e51, VAE: kizukiV3.vae.pt, Denoising strength: 0.45, Clip skip: 2, Hires upscale: 2.2, Hires steps: 26, Hires upscaler: 4x-UltraSharp, Lora hashes: "yamatowanpi3_64dim-5e-5: 4a5a68014e8b, shuicolor_v1: 2031bfec9abb", Version: v1.6.0-RC-12-g72ee347e
Got one, it happened around the 30th image on a 1x100 batch:
Exception in thread MemMon:███████████▋ | 1730/5600 [29:48<1:38:43, 1.53s/it]
Traceback (most recent call last):
File "C:\Users\paulo\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
self.run()
File "E:\Programming\stable-diffusion-webui\modules\memmon.py", line 53, in run
free, total = self.cuda_mem_get_info()
File "E:\Programming\stable-diffusion-webui\modules\memmon.py", line 34, in cuda_mem_get_info
return torch.cuda.mem_get_info(index)
File "E:\Programming\stable-diffusion-webui\venv\lib\site-packages\torch\cuda\memory.py", line 618, in mem_get_info
return torch.cuda.cudart().cudaMemGetInfo(device)
RuntimeError: CUDA error: misaligned address
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
75%|█████████████████████████████████████████████████████████ | 21/28 [00:34<00:11, 1.65s/it]
*** Error completing request
*** Arguments: ('task(7mve3l4s0k39mka)', ' <lora:Shidare_Hotaru:0.7> HtrShdr-KJ, black skirt, hair ornament, bow, shirt, suspender skirt, hair ribbon, high-waist skirt, hair flower, hairband, light smile, __pose__ , detailed __bg1__ background, __time__ , (best quality, absurdres, highly detailed, intricate detail, masterpiece:1.2), <lora:shuicolor_v1:0.2> (realistic:0.5) ', '(loli, chibi, young, futa, trans:1.4)
(multiple views, monochrome:1.4), (jpeg artifacts:1.4), (worst quality, low quality:1.4), (sketch, patreon logo, watermark, comic:1.2), bad-hands-5, simple background, nude, nsfw, cg, realistic, 3d', [], 28, 'DPM++ 2M Karras', 100, 1, 7, 768, 512, True, 0.5, 2.2, '4x-UltraSharp', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', '', '', ['VAE: kizukiV3.vae.pt', 'Clip skip: 2', 'Model hash: kizukiV3.safetensors [1bab7a0895]'], <agent_scheduler.task_runner.FakeRequest object at 0x000002C5E60E5660>, 0, False, '', 0.8, -1, False, -1, 0, 0, 0, 0, 4, 512, 512, True, 'None', 'None', 0, False, {'ad_model': 'face_yolov8n.pt', 'ad_prompt': '', 'ad_negative_prompt': '',
'ad_confidence': 0.3, 'ad_mask_k_largest': 0, 'ad_mask_min_ratio': 0, 'ad_mask_max_ratio': 1, 'ad_x_offset': 0, 'ad_y_offset': 0, 'ad_dilate_erode': 4, 'ad_mask_merge_invert': 'None', 'ad_mask_blur': 4, 'ad_denoising_strength':
0.4, 'ad_inpaint_only_masked': True, 'ad_inpaint_only_masked_padding': 32, 'ad_use_inpaint_width_height': False, 'ad_inpaint_width': 512, 'ad_inpaint_height': 512, 'ad_use_steps': False, 'ad_steps': 28, 'ad_use_cfg_scale': False, 'ad_cfg_scale': 7, 'ad_use_checkpoint': False, 'ad_checkpoint': 'Use same checkpoint', 'ad_use_vae': False, 'ad_vae': 'Use same VAE', 'ad_use_sampler': False, 'ad_sampler': 'Euler a', 'ad_use_noise_multiplier': False, 'ad_noise_multiplier': 1, 'ad_use_clip_skip': False, 'ad_clip_skip': 1, 'ad_restore_face': False, 'ad_controlnet_model': 'None', 'ad_controlnet_module': 'inpaint_global_harmonious', 'ad_controlnet_weight': 1, 'ad_controlnet_guidance_start': 0, 'ad_controlnet_guidance_end': 1, 'is_api': ()}, {'ad_model': 'None', 'ad_prompt': '', 'ad_negative_prompt': '', 'ad_confidence': 0.3, 'ad_mask_k_largest': 0, 'ad_mask_min_ratio': 0, 'ad_mask_max_ratio': 1, 'ad_x_offset': 0, 'ad_y_offset': 0, 'ad_dilate_erode': 4, 'ad_mask_merge_invert': 'None', 'ad_mask_blur': 4, 'ad_denoising_strength': 0.4, 'ad_inpaint_only_masked': True, 'ad_inpaint_only_masked_padding': 32, 'ad_use_inpaint_width_height': False, 'ad_inpaint_width': 512, 'ad_inpaint_height': 512, 'ad_use_steps': False, 'ad_steps': 28, 'ad_use_cfg_scale': False, 'ad_cfg_scale': 7, 'ad_use_checkpoint': False, 'ad_checkpoint': 'Use same checkpoint', 'ad_use_vae': False, 'ad_vae': 'Use same VAE', 'ad_use_sampler': False, 'ad_sampler': 'Euler a', 'ad_use_noise_multiplier': False, 'ad_noise_multiplier': 1, 'ad_use_clip_skip': False, 'ad_clip_skip': 1, 'ad_restore_face': False, 'ad_controlnet_model': 'None', 'ad_controlnet_module': 'inpaint_global_harmonious', 'ad_controlnet_weight': 1, 'ad_controlnet_guidance_start': 0, 'ad_controlnet_guidance_end': 1, 'is_api': ()}, True, False, 1, False, False, False, 1.1, 1.5, 100,
0.7, False, False, True, False, False, 0, 'Gustavosta/MagicPrompt-Stable-Diffusion', '', True, 'keyword prompt', 'keyword1, keyword2', 'None', 'textual inversion first', 'None', '0.7', 'None', <scripts.animatediff_ui.AnimateDiffProcess object at 0x000002C5E6153C10>, {'is_cnet': True, 'enabled': False, 'module': 'none', 'model': 'None', 'weight': 1, 'image': None, 'resize_mode': 'Crop and Resize', 'low_vram': False, 'processor_res': 512, 'threshold_a':
64, 'threshold_b': 64, 'guidance_start': 0, 'guidance_end': 1, 'pixel_perfect': False, 'control_mode': 'Balanced', 'is_ui': True, 'input_mode': 'simple', 'batch_images': '', 'output_dir': '', 'loopback': False}, {'is_cnet': True, 'enabled': False, 'module': 'none', 'model': 'None', 'weight': 1, 'image': None, 'resize_mode': 'Crop and Resize', 'low_vram': False, 'processor_res': 512, 'threshold_a': 64, 'threshold_b': 64, 'guidance_start': 0, 'guidance_end': 1, 'pixel_perfect': False, 'control_mode': 'Balanced', 'is_ui': True, 'input_mode': 'simple', 'batch_images': '', 'output_dir': '', 'loopback': False}, {'is_cnet': True, 'enabled': False, 'module': 'none', 'model': 'None',
'weight': 1, 'image': None, 'resize_mode': 'Crop and Resize', 'low_vram': False, 'processor_res': 512, 'threshold_a': 64, 'threshold_b': 64, 'guidance_start': 0, 'guidance_end': 1, 'pixel_perfect': False, 'control_mode': 'Balanced', 'is_ui': True, 'input_mode': 'simple', 'batch_images': '', 'output_dir': '', 'loopback': False}, False, False, 'Matrix', 'Columns', 'Mask', 'Prompt', '1,1', '0.2', False, False, False, 'Attention', False, '0', '0', '0.4', None, '0', '0', False, False, False, 0, None, [], 0, False, [], [], False, 0, 1, False, False, 0, None, [], -2, False, [], False, 0, None, None, False, False, 'positive', 'comma', 0, False, False, '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, 0, False, None, None, False, None, None, False, None, None, False, 50, [], 30, '', 4, [], 1, '', '', '', '') {}
Traceback (most recent call last):
File "E:\Programming\stable-diffusion-webui\modules\call_queue.py", line 57, in f
res = list(func(*args, **kwargs))
File "E:\Programming\stable-diffusion-webui\modules\txt2img.py", line 55, in txt2img
processed = processing.process_images(p)
File "E:\Programming\stable-diffusion-webui\modules\processing.py", line 732, in process_images
res = process_images_inner(p)
File "E:\Programming\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\batch_hijack.py", line 42, in processing_process_images_hijack
return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
File "E:\Programming\stable-diffusion-webui\modules\processing.py", line 867, in process_images_inner
samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
File "E:\Programming\stable-diffusion-webui\modules\processing.py", line 1156, in sample
return self.sample_hr_pass(samples, decoded_samples, seeds, subseeds, subseed_strength, prompts)
File "E:\Programming\stable-diffusion-webui\modules\processing.py", line 1242, in sample_hr_pass
samples = self.sampler.sample_img2img(self, samples, noise, self.hr_c, self.hr_uc, steps=self.hr_second_pass_steps or self.steps, image_conditioning=image_conditioning)
File "E:\Programming\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 188, in sample_img2img
samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
File "E:\Programming\stable-diffusion-webui\modules\sd_samplers_common.py", line 261, in launch_sampling
return func()
File "E:\Programming\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 188, in <lambda>
samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
File "E:\Programming\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "E:\Programming\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\sampling.py", line 605, in
sample_dpmpp_2m
x = (sigma_fn(t_next) / sigma_fn(t)) * x - (-h).expm1() * denoised_d
RuntimeError: CUDA error: misaligned address
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
---
Exception in thread Thread-33 (execute_task):
Traceback (most recent call last):
File "C:\Users\paulo\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
self.run()
File "C:\Users\paulo\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "E:\Programming\stable-diffusion-webui\extensions\sd-webui-agent-scheduler\agent_scheduler\task_runner.py", line 344, in execute_task
res = self.__execute_task(task_id, is_img2img, task_args)
File "E:\Programming\stable-diffusion-webui\extensions\sd-webui-agent-scheduler\agent_scheduler\task_runner.py", line 434, in __execute_task
return self.__execute_ui_task(task_id, is_img2img, *ui_args)
File "E:\Programming\stable-diffusion-webui\extensions\sd-webui-agent-scheduler\agent_scheduler\task_runner.py", line 468, in __execute_ui_task
shared.state.end()
File "E:\Programming\stable-diffusion-webui\modules\shared_state.py", line 128, in end
devices.torch_gc()
File "E:\Programming\stable-diffusion-webui\modules\devices.py", line 51, in torch_gc
torch.cuda.empty_cache()
File "E:\Programming\stable-diffusion-webui\venv\lib\site-packages\torch\cuda\memory.py", line 133, in empty_cache
torch._C._cuda_emptyCache()
RuntimeError: CUDA error: misaligned address
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
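Since the error text itself suggests it, here is a rough sketch of launching the webui with CUDA_LAUNCH_BLOCKING=1 so the stack trace points at the kernel that actually faulted rather than a later API call. This is only a sketch; on Windows the simpler route is adding `set CUDA_LAUNCH_BLOCKING=1` to webui-user.bat, and note it slows generation noticeably:

```python
# Sketch: start the webui with CUDA_LAUNCH_BLOCKING=1 so CUDA errors are
# reported synchronously at the failing kernel. Assumes it is run from the
# stable-diffusion-webui root directory; the command-line args are just examples.
import os
import subprocess
import sys

env = dict(os.environ)
env["CUDA_LAUNCH_BLOCKING"] = "1"  # synchronous kernel launches: slower, but accurate traces
subprocess.run([sys.executable, "launch.py", "--opt-sdp-attention"], env=env, check=True)
```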
I'm checking right now whether running many identical tasks with a batch count of 25 each might help work around the issue. If that's the case, it would be nice to have an option to easily enqueue multiple copies of the same task at once.
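In the meantime, a rough sketch of scripting that against the extension's HTTP queue API; the endpoint path and payload fields here are assumptions (check the API docs shipped with your version of Agent Scheduler), and the webui must be running with --api:

```python
# Sketch: enqueue several copies of the same txt2img task via the Agent
# Scheduler HTTP API. Endpoint path and payload fields are assumptions;
# adjust to your installed version. Requires the webui started with --api.
import requests

BASE_URL = "http://127.0.0.1:7860"                  # assumed local webui address
ENDPOINT = "/agent-scheduler/v1/queue/txt2img"      # assumed queue endpoint

payload = {
    "prompt": "detailed background, best quality",
    "negative_prompt": "worst quality, low quality",
    "steps": 28,
    "sampler_name": "DPM++ 2M Karras",
    "cfg_scale": 7,
    "width": 512,
    "height": 768,
    "batch_size": 1,
    "n_iter": 25,                                   # batch count per queued copy
}

for i in range(4):                                  # e.g. 4 x 25 instead of one 1x100 task
    resp = requests.post(BASE_URL + ENDPOINT, json=payload, timeout=30)
    resp.raise_for_status()
    print(f"queued copy {i + 1}: {resp.json()}")
```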
Last night, I queued a task with batch_count 100, 768 x 768, and hires fix 1.2 (my 2060 struggles with larger sizes), and it completed without any issues. I'll attempt a few more tasks to see if I can replicate the problem. Did reducing the batch_count to 25 provide any relief for you?
Unfortunately, it did not. It ran into the same misaligned address error after a couple of tasks. 😔
I saw you mentioned in another issue that there might be some problem with xformers (although I'm not sure they're related). Should I try SDP? (I admit I'd really hate it if that's the solution, because it increases my inference time by about 25% per image.)
I'll also try later with some extensions disabled (even the ones I'm not actively using) to see if they might interfere somehow. I'll let you know both results.
Generation stopped several times a day due to this problem, but since I updated CUDA from 11.8 to 12.2 using the instructions below, it has not occurred once in 3 days. I'll wait and see. (Cuda 12.2 New Libs)
> Generation stopped several times a day due to this problem, but since I updated CUDA from 11.8 to 12.2 using the instructions below, it has not occurred once in 3 days. I'll wait and see. (Cuda 12.2 New Libs)
I updated to 12.2 following the instructions above and ran 4 batches of 100 with no issue (I never got even close to that before). Still testing, but I think that fixes the issue.
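For anyone checking the same thing, here is a quick, purely informational sketch to confirm which CUDA runtime the webui's torch build actually uses. Run it with the venv's python; note that torch.version.cuda reports the runtime bundled with torch, which can differ from the driver version nvidia-smi shows:

```python
# Sketch: report the CUDA stack as seen from the webui's venv.
import torch

print("torch:", torch.__version__)
print("CUDA runtime bundled with torch:", torch.version.cuda)
print("device:", torch.cuda.get_device_name(0))
free, total = torch.cuda.mem_get_info(0)  # same call that MemMon uses
print(f"VRAM free/total: {free / 2**30:.1f} / {total / 2**30:.1f} GiB")
```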
Since it's resolved, I'll close the issue then.
I use the Agent Scheduler API to submit large batches of prompts. I will queue up several thousand prompts and let automatic1111 run for 8 to 17+ hours straight.
My experience on CUDA 11.3, with Agent Scheduler:
When the automatic1111 web UI is open in the browser, I would encounter a CUDA error regularly, at least once an hour or more. Once the error happened, Agent Scheduler would error through all the remaining prompts.
If I closed the web UI, I could generate successfully for 17+ hours without issue, or more.
My experience on CUDA 12.2, with Agent Scheduler:
When the web UI is open, I encounter lock-ups regularly. If I wait, it will spit out an error and continue to generate, but the errors come with increasing frequency until Python crashes. I still see CUDA errors too, just not as many, probably because automatic1111 crashes before it can get to a CUDA error.
If I close the web UI, I get a CUDA error regularly, at least once an hour or more. Once the error happens, Agent Scheduler errors through all the remaining prompts.
After experiencing more issues with CUDA 12.2, I've downgraded back to CUDA 11.3, and I'm again able to generate for 8+ hours without issue. This is with the web UI closed.
The process always gets interrupted with this error: "RuntimeError: CUDA error: misaligned address. CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions."
My device is an NVIDIA GeForce RTX 3080 Ti. While it's running, I don't really see anything wrong here:
ram: free:12.89 used:3.01 total:15.9
gpu: free:7.5 used:4.5 total:12.0
gpu-active: current:2.14 peak:3.36
gpu-allocated: current:2.14 peak:3.36
gpu-reserved: current:2.17 peak:5.08
gpu-inactive: current:0.03 peak:1.0
events: retries:0 oom:0
utilization: 0
I'm not sure what the issue is. I never have any other apps running in the background. Please let me know what I can do to stop this from happening. Thank you!