brkirch / stable-diffusion-webui

Stable Diffusion web UI

[Bug]: Crashed randomly with log below #12

Open kacherHuynh opened 1 year ago

kacherHuynh commented 1 year ago

Is there an existing issue for this?

What happened?

It crashed while generating an image

Steps to reproduce the problem

  1. Run web ui
  2. Generate the image
  3. Sometimes it crashes randomly, regardless of image size or model

What should have happened?

Images should generate without the app crashing

Commit where the problem happens

20230416_experimental

What platforms do you use to access the UI ?

MacOS

What browsers do you use to access the UI ?

Google Chrome

Command Line Arguments

--no-half --no-download-sd-model --precision full --no-half-vae --upcast-sampling --opt-sub-quad-attention --use-cpu interrogate

List of extensions

ControlNet, Ultrasharp

Console logs

stable-diffusion-webui-custom/python/3.10.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

Additional information

No response

brkirch commented 1 year ago
stable-diffusion-webui-custom/python/3.10.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

This is actually just a warning; any errors are displayed before it, so I'll need whatever output came before this point (and if there is a traceback, I'll definitely need that).

Also I can't guarantee ControlNet or Ultrasharp will work correctly yet. I plan to include the ControlNet extension in the future but until then there is no guarantee it will work fully.

paraversal commented 1 year ago

I'll chime in since I'm having the same problem, roughly every 10 image generations. For me, the error displayed before the crash is failed assertion _status < MTLCommandBufferStatusCommitted at line 316 in -[IOGPUMetalCommandBuffer setCurrentCommandEncoder:]

This is on M1 Pro, 16GB RAM, Ventura 13.2.1, WebUI release 20230416, running on Degoogled Chromium

marcomastri commented 1 year ago

I’m getting the same error on a very similar setup:

failed assertion _status < MTLCommandBufferStatusCommitted at line 316 in -[IOGPUMetalCommandBuffer setCurrentCommandEncoder:]
Abort trap: 6
logout

Macbook Pro M1, 16GB RAM, Ventura 13.3.1, release 20230416 on Firefox

x4080 commented 1 year ago

I got the same error randomly on an M2 Pro Mac mini with 16 GB, like @marcomastri

kacherHuynh commented 1 year ago

@brkirch sorry for my late reply, please check the detailed log in the attachment. Thank you so much~!

[screenshot of the console log attached]

x4080 commented 1 year ago

Using the latest release, it seems to rarely have errors, but I think it uses much more memory now

brkirch commented 1 year ago

@kacherHuynh Are you still using the experimental version? v1.1.1-RC should not have that issue as often.

Using the latest release, it seems to rarely have errors, but I think it uses much more memory now

This is correct; it turns out that torch.mps.empty_cache() was causing most of the crashing reported here. To prevent the issue I had to remove the usage of torch.mps.empty_cache() for now, which means memory isn't cleaned up as often or as thoroughly, resulting in overall higher memory usage.
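
For reference, a minimal sketch (assuming PyTorch 2.0+ with the MPS backend) of the kind of cleanup call that was removed; the wrapper generate_then_cleanup and its generate_fn argument are hypothetical names for illustration, not webui code:

import torch

def generate_then_cleanup(generate_fn):
    # Hypothetical wrapper: run one generation step, then release cached MPS memory.
    result = generate_fn()
    if torch.backends.mps.is_available():
        # torch.mps.empty_cache() returns cached blocks to the system, lowering
        # memory usage between generations; this is the call that was removed
        # from the webui to work around the crashes described above.
        torch.mps.empty_cache()
    return result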

x4080 commented 1 year ago

@brkirch What Mac do you use? I'm using an M2 Pro with 16 GB, and just 512x512 with ControlNet 1.1 tile (img2img) already uses about 600 MB of swap. Do you think we can improve memory usage?

I just tested the DrawThings app with the same config and it uses less memory. Is that because Python has a lot of overhead?
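
If it helps to compare memory usage between runs, here is a minimal sketch (assuming PyTorch 2.0+ with the MPS backend) that prints what the MPS allocator is holding; report_mps_memory and its label are hypothetical names for illustration:

import torch

def report_mps_memory(label):
    # Hypothetical helper: print MPS allocator statistics so runs can be compared.
    if not torch.backends.mps.is_available():
        return
    allocated_mb = torch.mps.current_allocated_memory() / 1024**2  # memory occupied by live tensors
    driver_mb = torch.mps.driver_allocated_memory() / 1024**2      # total memory held by the Metal driver
    print(f"[{label}] allocated: {allocated_mb:.1f} MB, driver: {driver_mb:.1f} MB")

report_mps_memory("after txt2img")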

Btw thanks for your big effort

kacherHuynh commented 1 year ago

@kacherHuynh Are you still using the experimental version? v1.1.1-RC should not have that issue as often.

Using the latest release, it seems to rarely have errors, but I think it uses much more memory now

This is correct; it turns out that torch.mps.empty_cache() was causing most of the crashing reported here. To prevent the issue I had to remove the usage of torch.mps.empty_cache() for now, which means memory isn't cleaned up as often or as thoroughly, resulting in overall higher memory usage.

I just updated and tried it today; crashes now come even more often, after generating only 1 or 2 images. Please have a look. Thank you so much!

Error completing request█▍                       | 4/25 [00:04<00:26,  1.29s/it]
Arguments: ('task(mn2g02iln8zey3a)', '8k portrait of beautiful cyborg with brown hair, intricate, elegant, highly detailed, majestic, digital photography, art by Artgerm and ruan jia and greg rutkowski surreal painting gold butterfly filigree, broken glass, (masterpiece, side lighting, finely detailed beautiful eyes: 1.2), hdr', 'canvas frame, cartoon, 3d, ((disfigured)), ((bad art)), ((deformed)),((extra limbs)),((close up)),((b&w)), weird colors, blurry, (((duplicate))), ((morbid)), ((mutilated)), [out of frame], extra fingers, mutated hands, ((poorly drawn hands)), ((poorly drawn face)), (((mutation))), (((deformed))), ((ugly)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), cloned face, (((disfigured))), out of frame, ugly, extra limbs, (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), mutated hands, (fused fingers), (too many fingers), (((long neck))), signature, video game, ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, mutation, mutated, extra limbs, extra legs, extra arms, disfigured, deformed, cross-eye, body out of frame, blurry, bad art, bad anatomy, 3d render', [], 25, 16, True, False, 1, 1, 7, 132340232.0, -1.0, 0, 0, 0, False, 896, 512, False, 0.7, 2, 'Latent', 0, 0, 0, [], 0, <controlnet.py.UiControlNetUnit object at 0x2b89a6dd0>, <controlnet.py.UiControlNetUnit object at 0x2b89a6e60>, False, False, 'positive', 'comma', 0, False, False, '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, 0, None, False, None, False, 50) {}
Traceback (most recent call last):
  File "/Users/kacher/stable-diffusion-webui-custom/modules/call_queue.py", line 57, in f
    res = list(func(*args, **kwargs))
  File "/Users/kacher/stable-diffusion-webui-custom/modules/call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "/Users/kacher/stable-diffusion-webui-custom/modules/txt2img.py", line 56, in txt2img
    processed = process_images(p)
  File "/Users/kacher/stable-diffusion-webui-custom/modules/processing.py", line 515, in process_images
    res = process_images_inner(p)
  File "/Users/kacher/stable-diffusion-webui-custom/extensions/sd-webui-controlnet/scripts/batch_hijack.py", line 42, in processing_process_images_hijack
    return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
  File "/Users/kacher/stable-diffusion-webui-custom/modules/processing.py", line 669, in process_images_inner
    samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
  File "/Users/kacher/stable-diffusion-webui-custom/modules/processing.py", line 887, in sample
    samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
  File "/Users/kacher/stable-diffusion-webui-custom/modules/sd_samplers_kdiffusion.py", line 377, in sample
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "/Users/kacher/stable-diffusion-webui-custom/modules/sd_samplers_kdiffusion.py", line 251, in launch_sampling
    return func()
  File "/Users/kacher/stable-diffusion-webui-custom/modules/sd_samplers_kdiffusion.py", line 377, in <lambda>
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "/Users/kacher/stable-diffusion-webui-custom/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/kacher/stable-diffusion-webui-custom/repositories/k-diffusion/k_diffusion/sampling.py", line 576, in sample_dpmpp_sde
    denoised_2 = model(x_2, sigma_fn(s) * s_in, **extra_args)
  File "/Users/kacher/stable-diffusion-webui-custom/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/kacher/stable-diffusion-webui-custom/modules/sd_samplers_kdiffusion.py", line 167, in forward
    devices.test_for_nans(x_out, "unet")
  File "/Users/kacher/stable-diffusion-webui-custom/modules/devices.py", line 157, in test_for_nans
    raise NansException(message)
modules.devices.NansException: A tensor with all NaNs was produced in Unet. This could be either because there's not enough precision to represent the picture, or because your video card does not support half type. Try setting the "Upcast cross attention layer to float32" option in Settings > Stable Diffusion or using the --no-half commandline argument to fix this. Use --disable-nan-check commandline argument to disable this check.
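
For context on the final error, here is a rough sketch of the kind of all-NaN check behind that NansException; it is a simplified illustration, not the webui's actual modules/devices.py code:

import torch

class NansException(Exception):
    pass

def test_for_nans(x, where, disable_nan_check=False):
    # Simplified sketch: raise if every element of the output tensor is NaN,
    # mirroring the behaviour described by the error message above.
    if disable_nan_check:  # corresponds to the --disable-nan-check commandline argument
        return
    if torch.isnan(x).all().item():
        raise NansException(f"A tensor with all NaNs was produced in {where}.")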