AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

Deepdanbooru breaks cuda on image generation after running. #2404

Closed SharkWipf closed 2 years ago

SharkWipf commented 2 years ago

Describe the bug: After running deepdanbooru on an image, new images can no longer be generated because of a CUDA error.

To Reproduce Steps to reproduce the behavior:

  1. Enable deepdanbooru
  2. Generate or upload an image to img2img
  3. Click "Interrogate DeepBooru"
  4. After it finishes, try to generate any image with either generator
  5. Generation halts and a CUDA error is shown

Error:

Error completing request
Arguments: (0, 'Test', 'Test', 'None', 'None', <PIL.Image.Image image mode=RGB size=512x512 at 0x7F483FF50400>, None, None, None, 0, 60, 0, 4, 1, False, False, 1, 1, 7, 0.75, -1.0, -1.0, 0, 0, 0, False, 512, 512, 0, False, 32, 0, '', '', 0, 4.0, 1, 1, 0, 0, 0.0, 4.0, 0.1, 0.1, 1, True, False, False, 0, False, '', 1, False, 0, 1, False, False, False, '', '', '', 1, 50, 0, False, 4, 1, 4, 0.09, True, 1, 0, 7, False, False, '<p style="margin-bottom:0.75em">Recommended settings: Sampling Steps: 80-100, Sampler: Euler a, Denoising strength: 0.8</p>', 128, 8, ['left', 'right', 'up', 'down'], 1, 0.05, 128, 4, 0, ['left', 'right', 'up', 'down'], False, False, None, '', '<p style="margin-bottom:0.75em">Will upscale the image to twice the dimensions; use width and height sliders to set tile size</p>', 64, 0, 1, '', 0, '', True, False) {}
Traceback (most recent call last):
  File "/home/me/src/stable-diffusion-webui/modules/ui.py", line 184, in f
    res = list(func(*args, **kwargs))
  File "/home/me/src/stable-diffusion-webui/webui.py", line 64, in f
    res = func(*args, **kwargs)
  File "/home/me/src/stable-diffusion-webui/modules/img2img.py", line 126, in img2img
    processed = process_images(p)
  File "/home/me/src/stable-diffusion-webui/modules/processing.py", line 371, in process_images
    p.init(all_prompts, all_seeds, all_subseeds)
  File "/home/me/src/stable-diffusion-webui/modules/processing.py", line 607, in init
    self.sampler = sd_samplers.create_sampler_with_index(sd_samplers.samplers_for_img2img, self.sampler_index, self.sd_model)
  File "/home/me/src/stable-diffusion-webui/modules/sd_samplers.py", line 50, in create_sampler_with_index
    sampler = config.constructor(model)
  File "/home/me/src/stable-diffusion-webui/modules/sd_samplers.py", line 33, in <lambda>
    SamplerData(label, lambda model, funcname=funcname: KDiffusionSampler(funcname, model), aliases, options)
  File "/home/me/src/stable-diffusion-webui/modules/sd_samplers.py", line 306, in __init__
    self.model_wrap = k_diffusion.external.CompVisDenoiser(sd_model, quantize=shared.opts.enable_quantization)
  File "/home/me/src/stable-diffusion-webui/repositories/k-diffusion/k_diffusion/external.py", line 135, in __init__
    super().__init__(model, model.alphas_cumprod, quantize=quantize)
  File "/home/me/src/stable-diffusion-webui/repositories/k-diffusion/k_diffusion/external.py", line 92, in __init__
    super().__init__(((1 - alphas_cumprod) / alphas_cumprod) ** 0.5, quantize)
  File "/home/me/src/stable-diffusion-webui/venv/lib64/python3.10/site-packages/torch/_tensor.py", line 32, in wrapped
    return f(*args, **kwargs)
  File "/home/me/src/stable-diffusion-webui/venv/lib64/python3.10/site-packages/torch/_tensor.py", line 639, in __rsub__
    return _C._VariableFunctions.rsub(self, other)
RuntimeError: CUDA error: unspecified launch failure
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Additional context: Launch flags: --listen --opt-split-attention --allow-code --deepdanbooru

Everything else seems to be working fine, no issues. Only the deepdanbooru interrogation seems to break it.
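The error text above suggests passing CUDA_LAUNCH_BLOCKING=1 so that the traceback points at the call that actually failed. A minimal sketch of building such an environment for a relaunch follows. The helper name blocking_cuda_env is hypothetical, and the commented launch command is only an assumption based on the webui.py path and the --deepdanbooru flag mentioned in this report.

```python
import os


def blocking_cuda_env(base_env=None):
    """Return a copy of the environment with synchronous CUDA launches enabled.

    Hypothetical helper for illustration; not part of the web UI.
    """
    env = dict(os.environ if base_env is None else base_env)
    # With CUDA_LAUNCH_BLOCKING=1, kernel launches run synchronously, so an
    # error is reported at the launching call instead of at some later call.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


if __name__ == "__main__":
    env = blocking_cuda_env()
    print("CUDA_LAUNCH_BLOCKING =", env["CUDA_LAUNCH_BLOCKING"])
    # Assumed launch command; adjust the script path and flags to your setup:
    # import subprocess, sys
    # subprocess.run([sys.executable, "webui.py", "--deepdanbooru"], env=env, check=True)
```

The variable has to be set before the process initializes CUDA, which is why the sketch prepares the environment for a fresh process rather than changing it midway through a run. Expect generation to be slower while it is set.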

toyxyz commented 2 years ago

Same problem here. I checked "Use deepbooru for caption" in the image pre-processing section of Train. The first run works normally, and the second one causes an error. If I delete the deepbooru folder from the models folder, it works again.

2022-10-13 17:39:09.397944: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-13 17:39:09.704583: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 21652 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3090 Ti, pci bus id: 0000:0e:00.0, compute capability: 8.6
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
2022-10-13 17:39:15.516201: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8500
Could not locate zlibwapi.dll. Please make sure it is in your library path!

OS: Windows 10, 3090 Ti, Edge browser. Commit revision: 04c0e643f2eec68d93a76db171b4d70595808702. Launch flags: --opt-split-attention --autolaunch --allow-code --deepdanbooru


I solved it by downloading zlibwapi.dll and putting it in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin.

https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#install-zlib-windows
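To check whether zlibwapi.dll is already reachable before copying it anywhere, one option is to scan the directories on PATH. This is a rough sketch: find_on_path is a hypothetical helper, and PATH is only one of the places Windows searches for DLLs, so an empty result is a hint rather than proof that the file is missing.

```python
import os
from pathlib import Path


def find_on_path(filename, search_path=None):
    """Return the directories from search_path that contain filename.

    Hypothetical helper; search_path defaults to the PATH environment variable.
    """
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    hits = []
    for entry in search_path.split(os.pathsep):
        # Skip empty entries and keep only directories that hold the file.
        if entry and (Path(entry) / filename).is_file():
            hits.append(entry)
    return hits


if __name__ == "__main__":
    hits = find_on_path("zlibwapi.dll")
    print(hits if hits else "zlibwapi.dll not found on PATH")
```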

HydrogenRb commented 2 years ago

I faced a similar problem in my Colab. I fixed it by changing the CUDA version in the Colab.

rabidcopy commented 2 years ago

Error completing request
Arguments: ('', '', 'None', 'None', 1, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 512, 512, False, False, 0.7, 0, False, '', 25, True, 5.0, False, None, '', 1, '', 0, '', True, False, False) {}
Traceback (most recent call last):
  File "/notebooks/SDW/modules/ui.py", line 181, in f
    res = list(func(*args, **kwargs))
  File "/notebooks/SDW/webui.py", line 64, in f
    res = func(*args, **kwargs)
  File "/notebooks/SDW/modules/txt2img.py", line 43, in txt2img
    processed = process_images(p)
  File "/notebooks/SDW/modules/processing.py", line 397, in process_images
    uc = prompt_parser.get_learned_conditioning(shared.sd_model, len(prompts) * [p.negative_prompt], p.steps)
  File "/notebooks/SDW/modules/prompt_parser.py", line 138, in get_learned_conditioning
    conds = model.get_learned_conditioning(texts)
  File "/notebooks/SDW/repositories/stable-diffusion/ldm/models/diffusion/ddpm.py", line 558, in get_learned_conditioning
    c = self.cond_stage_model(c)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/notebooks/SDW/modules/sd_hijack.py", line 334, in forward
    z1 = self.process_tokens(tokens, multipliers)
  File "/notebooks/SDW/modules/sd_hijack.py", line 349, in process_tokens
    tokens = torch.asarray(remade_batch_tokens).to(device)
RuntimeError: CUDA error: unspecified launch failure
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1. 

Not a huge issue, as I just restart after pre-processing images with deepdanbooru. It is probably some weird CUDA mismatch in my Paperspace notebook.

SharkWipf commented 2 years ago

FWIW, I am on cuda version 11.8, Nvidia driver version 520.61.05, RTX 3090. Judging by the comments so far it sounds like that's probably too new?

ChronoStriker1 commented 2 years ago

I am also having a similar issue. I'm using CUDA 11.7 and Nvidia driver 515.76 with a 3090 Ti.

1398listener commented 2 years ago

Same issue. I'm using CUDA 11.3, torch 1.12.1, and a single 2080 Ti on Ubuntu.

Xynonners commented 2 years ago

same here

SharkWipf commented 2 years ago

Update: I've tried several CUDA versions and (Linux) environments and can't get it working at all. I did notice, however, that some errors are logged when Deepdanbooru runs. The Deepdanbooru code still executes, but my guess is that it disturbs the already-registered CUDA state because it can't re-initialize cuBLAS, which causes the next img2img/txt2img run to fail. In the log below, "Error completing request" and everything after it is where image generation is triggered; everything above that is just deepdanbooru. (The input image was a blob of abstract purple, so it only detected 2 tags; that part is not a bug.)

2022-10-15 22:58:57.799100: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-15 22:58:57.931302: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-10-15 22:58:58.869585: E tensorflow/stream_executor/cuda/cuda_driver.cc:265] failed call to cuInit: CUDA_ERROR_NOT_INITIALIZED: initialization error
2022-10-15 22:58:58.869614: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: 7bc68f14784c
2022-10-15 22:58:58.869619: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: 7bc68f14784c
2022-10-15 22:58:58.869724: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: NOT_FOUND: was unable to find libcuda.so DSO loaded into this program
2022-10-15 22:58:58.869746: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 520.61.5
2022-10-15 22:58:58.869909: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
1/1 [==============================] - 1s 1s/step
0.7328718900680542 moon
0.6844184994697571 multiple_girls
Error completing request
Arguments: ('', '', 'Simple', 'None', 20, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 512, 512, False, 0.7, 0, 0, 0, False, '', False, False, None, '', 1, '', 0, '', True, True, False) {}
Traceback (most recent call last):
  File "/home/root/src/stable-diffusion-webui/modules/ui.py", line 212, in f
    res = list(func(*args, **kwargs))
  File "/home/root/src/stable-diffusion-webui/webui.py", line 64, in f
    res = func(*args, **kwargs)
  File "/home/root/src/stable-diffusion-webui/modules/txt2img.py", line 44, in txt2img
    processed = process_images(p)
  File "/home/root/src/stable-diffusion-webui/modules/processing.py", line 397, in process_images
    uc = prompt_parser.get_learned_conditioning(shared.sd_model, len(prompts) * [p.negative_prompt], p.steps)
  File "/home/root/src/stable-diffusion-webui/modules/prompt_parser.py", line 138, in get_learned_conditioning
    conds = model.get_learned_conditioning(texts)
  File "/home/root/src/stable-diffusion-webui/repositories/stable-diffusion/ldm/models/diffusion/ddpm.py", line 558, in get_learned_conditioning
    c = self.cond_stage_model(c)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/root/src/stable-diffusion-webui/modules/sd_hijack.py", line 334, in forward
    z1 = self.process_tokens(tokens, multipliers)
  File "/home/root/src/stable-diffusion-webui/modules/sd_hijack.py", line 349, in process_tokens
    tokens = torch.asarray(remade_batch_tokens).to(device)
RuntimeError: CUDA error: unspecified launch failure
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

judgeou commented 2 years ago

same problem

Ubuntu 20.04.4 LTS (GNU/Linux 5.4.0-126-generic x86_64), RTX 3090

nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0

error log:

2022-10-18 22:39:56.822689: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-18 22:39:57.007121: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-10-18 22:39:57.046040: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-10-18 22:39:57.823299: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /root/miniconda3/envs/automatic/lib/python3.10/site-packages/cv2/../../lib64:
2022-10-18 22:39:57.823384: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /root/miniconda3/envs/automatic/lib/python3.10/site-packages/cv2/../../lib64:
2022-10-18 22:39:57.823394: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2022-10-18 22:39:58.957262: E tensorflow/stream_executor/cuda/cuda_driver.cc:265] failed call to cuInit: CUDA_ERROR_NOT_INITIALIZED: initialization error
2022-10-18 22:39:58.957298: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] no NVIDIA GPU device is present: /dev/nvidia0 does not exist
2022-10-18 22:39:58.957521: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
1/1 [==============================] - 3s 3s/step
0.9999967217445374 flower
0.9976820945739746 maid_headdress
0.9965994358062744 maid
0.9940873384475708 tray
0.992828369140625 hydrangea
0.9785534739494324 apron
0.9732893109321594 maid_apron
0.9611615538597107 long_hair
0.9472941160202026 pink_flower
0.9466087818145752 waist_apron
0.9219543933868408 multiple_girls
0.9178032875061035 vase
0.9112962484359741 purple_flower
0.8946253061294556 white_flower
0.8906084299087524 holding_tray
0.8897321820259094 thighhighs
0.8700262308120728 navel
0.8618443012237549 lily_(flower)
0.8400490880012512 waitress
0.8256860375404358 enmaided
0.8048192262649536 rose
0.7983884811401367 wrist_cuffs
0.7971858978271484 breasts
0.7840455770492554 hair_flower
0.7349395751953125 white_apron
0.7275054454803467 looking_at_viewer
0.7117940783500671 white_legwear
0.7071746587753296 silver_hair
0.6788415908813477 blue_eyes
0.6713621020317078 teapot
0.6598066091537476 table
0.6584057211875916 blue_flower
0.6500906944274902 garter_straps
0.6371893882751465 day
0.6288817524909973 daisy
0.6258440017700195 cup
0.6212639212608337 hair_ornament
0.5960835218429565 bow
0.5882710218429565 white_hair
0.5618525743484497 2girls
0.5511027574539185 crop_top
0.5352045893669128 frills
0.5238999128341675 ribbon
0.5232399106025696 midriff
0.5033739805221558 outdoors
Error completing request
Arguments: (0, '2girls, apron, blue_eyes, blue_flower, bow, breasts, crop_top, cup, daisy, day, enmaided, flower, frills, garter_straps, hair_flower, hair_ornament, holding_tray, hydrangea, lily_\\(flower\\), long_hair, looking_at_viewer, maid, maid_apron, maid_headdress, midriff, multiple_girls, navel, outdoors, pink_flower, purple_flower, ribbon, rose, silver_hair, table, teapot, thighhighs, tray, vase, waist_apron, waitress, white_apron, white_flower, white_hair, white_legwear, wrist_cuffs', '', 'None', 'None', <PIL.Image.Image image mode=RGB size=1280x2002 at 0x7F6F9BC90D00>, None, None, None, 0, 20, 0, 4, 1, False, False, 1, 1, 7, 0.75, -1.0, -1.0, 0, 0, 0, False, 512, 512, 0, False, 32, 0, '', '', 0, '<ul>\n<li><code>CFG Scale</code> should be 2 or lower.</li>\n</ul>\n', True, True, '', '', True, 50, True, 1, 0, False, 4, 1, '<p style="margin-bottom:0.75em">Recommended settings: Sampling Steps: 80-100, Sampler: Euler a, Denoising strength: 0.8</p>', 128, 8, ['left', 'right', 'up', 'down'], 1, 0.05, 128, 4, 0, ['left', 'right', 'up', 'down'], False, False, None, '', '<p style="margin-bottom:0.75em">Will upscale the image to twice the dimensions; use width and height sliders to set tile size</p>', 64, 0, 1, '', 0, '', True, False, False) {}
Traceback (most recent call last):
  File "/root/autodl-tmp/stable-diffusion-webui/modules/ui.py", line 212, in f
    res = list(func(*args, **kwargs))
  File "/root/autodl-tmp/stable-diffusion-webui/webui.py", line 64, in f
    res = func(*args, **kwargs)
  File "/root/autodl-tmp/stable-diffusion-webui/modules/img2img.py", line 126, in img2img
    processed = process_images(p)
  File "/root/autodl-tmp/stable-diffusion-webui/modules/processing.py", line 370, in process_images
    p.init(all_prompts, all_seeds, all_subseeds)
  File "/root/autodl-tmp/stable-diffusion-webui/modules/processing.py", line 694, in init
    image = image.to(shared.device)
RuntimeError: CUDA error: unspecified launch failure
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

fs-sys commented 2 years ago

I have identified the commit that introduced the errors, at least for me.

If you want to fix the error temporarily, check out fec2221eeaafb50afd26ba3e109bf6f928011e69, the last commit before the errors began.

judgeou commented 2 years ago

> I have identified the commit that introduced the errors, at least for me.
>
> If you want to fix the error temporarily, check out fec2221, the last commit before the errors began.

Yes, it fixes "RuntimeError: CUDA error: unspecified launch failure", but I'd rather know how to fix the missing libnvinfer.so.7 error.

zhupeter010903 commented 2 years ago

> I faced a similar problem in my Colab. I fixed it by changing the CUDA version in the Colab.

May I ask which CUDA version you used in Colab?

happk commented 2 years ago

I solved it by installing tensorflow-cpu. Remember to install it in the venv.

GrennKren commented 2 years ago

> I solved it by installing tensorflow-cpu. Remember to install it in the venv.

You are my life saver!

SharkWipf commented 2 years ago

> I solved it by installing tensorflow-cpu. Remember to install it in the venv.

Worth noting, while this does work, it seems to work by disabling GPU support in Tensorflow entirely, thus working around the issue of the unclean CUDA state by disabling CUDA for deepbooru (and anything else using Tensorflow) entirely.

The bug where Deepbooru fails on CUDA and leaves the GPU in an unclean state still exists, but is just avoided by not using the GPU in the first place. But any other Tensorflow-based scripts will also be deferred to CPU-only. txt2img/img2img itself does not seem to use Tensorflow so it does not seem to affect this part.

Also, because tensorflow-cpu is essentially a CPU-only replacement package for tensorflow proper, the installation order seems to matter: if you install tensorflow-cpu before installing tensorflow proper, or update tensorflow proper after tensorflow-cpu, this workaround will not work, as it will continue using tensorflow proper.

In short: this workaround works, but does not solve the problem, and may cause other problems elsewhere.
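Since the workaround depends on which TensorFlow distribution ends up installed in the venv, it can help to list what is actually present. A small sketch using only the standard library follows; the function name installed_tensorflow_variants is hypothetical, and it reports installed distributions only, not which one Python actually imports.

```python
from importlib import metadata


def installed_tensorflow_variants(names=("tensorflow", "tensorflow-cpu")):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in installed_tensorflow_variants().items():
        print(f"{name}: {version or 'not installed'}")
```

Run it with the venv's own Python interpreter; otherwise it reports on a different environment than the one the web UI uses.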

happk commented 2 years ago

> > I solved it by installing tensorflow-cpu. Remember to install it in the venv.
>
> Worth noting, while this does work, it seems to work by disabling GPU support in Tensorflow entirely, thus working around the issue of the unclean CUDA state by disabling CUDA for deepbooru (and anything else using Tensorflow) entirely.
>
> The bug where Deepbooru fails on CUDA and leaves the GPU in an unclean state still exists, but is just avoided by not using the GPU in the first place. But any other Tensorflow-based scripts will also be deferred to CPU-only. txt2img/img2img itself does not seem to use Tensorflow so it does not seem to affect this part.
>
> Also, because tensorflow-cpu is essentially a CPU-only replacement package for tensorflow proper, the installation order seems to matter: if you install tensorflow-cpu before installing tensorflow proper, or update tensorflow proper after tensorflow-cpu, this workaround will not work, as it will continue using tensorflow proper.
>
> In short: this workaround works, but does not solve the problem, and may cause other problems elsewhere.

You are right. I tried another method: downloading "zlibwapi.dll" manually and putting it in "NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin", just as toyxyz mentioned, and it works. If someone has tried that method and the problem is still not resolved, installing tensorflow-cpu is a simple workaround that can be reverted at any time until the bug is fixed. ٩(´∀`*)

SharkWipf commented 2 years ago

> You are right. I tried another method: downloading "zlibwapi.dll" manually and putting it in "NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin", just as toyxyz mentioned, and it works. If someone has tried that method and the problem is still not resolved, installing tensorflow-cpu is a simple workaround that can be reverted at any time until the bug is fixed. ٩(´∀`*)

I'm unsure why this would solve the issue on Windows, especially when on Linux I already have all zlib dependencies listed on the cuDNN page. It might be a different bug that the Linux users like myself experience.

R-N commented 2 years ago

> I faced a similar problem in my Colab. I fixed it by changing the CUDA version in the Colab.

Which version did you change it to?

HookedBehemoth commented 2 years ago

This was already addressed in the original PR but that change was reverted. https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/1752/commits/5f12e7efd92ad802742f96788b4be3249ad02829

On *nix, multiprocessing prefers fork over spawn.
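For readers unfamiliar with the distinction: a forked child inherits the parent's memory, including any CUDA state PyTorch has already initialized, while a spawned child starts as a fresh interpreter. The sketch below shows how a spawn context can be requested explicitly with the standard library. It is an illustration only, not the code from the PR, and isolated_context and run_isolated are hypothetical names.

```python
import multiprocessing as mp


def isolated_context():
    """Return a multiprocessing context whose workers start as fresh interpreters."""
    # "spawn" launches a new Python process instead of fork()-copying the
    # parent, so the worker does not inherit the parent's in-memory state.
    return mp.get_context("spawn")


def run_isolated(func, *args):
    """Run func(*args) in a single spawned worker and return its result.

    func and its arguments must be picklable, for example a module-level
    function that is importable by name.
    """
    with isolated_context().Pool(processes=1) as pool:
        return pool.apply(func, args)


if __name__ == "__main__":
    print("available start methods:", mp.get_all_start_methods())
    print("context used here:", isolated_context().get_start_method())
```

Calling run_isolated(os.getpid) from inside a script's __main__ guard returns the worker's process ID, which differs from the parent's. The guard matters with spawn because the worker re-imports the main module.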

lnguyenfx commented 2 years ago

When running the Deepdanbooru model, TensorFlow tries to initialize the same primary GPU that the WebUI is using, which causes the crash. I was able to resolve this bug by telling the WebUI to use a secondary GPU. This is done by passing the "--device-id=1" argument when launching the WebUI.

If you do not have a secondary GPU, then try making Deepdanbooru use the CPU instead of the primary GPU. One method mentioned above is to install "tensorflow-cpu"; however, there are unknown implications (as also mentioned above).
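Another way to keep a TensorFlow process off the GPU without swapping packages is to hide the GPUs from CUDA through the CUDA_VISIBLE_DEVICES environment variable. The sketch below assumes the TensorFlow work runs in its own process, since the variable affects everything in the process it is set for, PyTorch included. cpu_only_env is a hypothetical helper, not something the WebUI provides.

```python
import os


def cpu_only_env(base_env=None):
    """Return a copy of the environment in which CUDA sees no GPUs.

    Hypothetical helper; intended for a child process that should run on CPU.
    """
    env = dict(os.environ if base_env is None else base_env)
    # "-1" is not a valid device index, so CUDA exposes no devices at all.
    env["CUDA_VISIBLE_DEVICES"] = "-1"
    return env


if __name__ == "__main__":
    env = cpu_only_env()
    print("CUDA_VISIBLE_DEVICES =", env["CUDA_VISIBLE_DEVICES"])
```

The variable must be in place before the process first touches CUDA; changing it afterwards has no effect on an already initialized runtime.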

HookedBehemoth commented 2 years ago

> When running the Deepdanbooru model, TensorFlow tries to initialize the same primary GPU that the WebUI is using, which causes the crash. I was able to resolve this bug by telling the WebUI to use a secondary GPU. This is done by passing the "--device-id=1" argument when launching the WebUI.
>
> If you do not have a secondary GPU, then try making Deepdanbooru use the CPU instead of the primary GPU. One method mentioned above is to install "tensorflow-cpu"; however, there are unknown implications (as also mentioned above).

This should be resolved with #3421

ddPn08 commented 2 years ago

> > When running the Deepdanbooru model, TensorFlow tries to initialize the same primary GPU that the WebUI is using, which causes the crash. I was able to resolve this bug by telling the WebUI to use a secondary GPU. This is done by passing the "--device-id=1" argument when launching the WebUI. If you do not have a secondary GPU, then try making Deepdanbooru use the CPU instead of the primary GPU. One method mentioned above is to install "tensorflow-cpu"; however, there are unknown implications (as also mentioned above).
>
> This should be resolved with #3421

I have been getting this error since this commit. Webui is running on Colab T4. notebook

2022-10-23 06:29:38.974494: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-23 06:29:40.327359: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-10-23 06:29:42.545022: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/envs/automatic/lib/python3.10/site-packages/cv2/../../lib64:/usr/lib64-nvidia
2022-10-23 06:29:42.545270: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/envs/automatic/lib/python3.10/site-packages/cv2/../../lib64:/usr/lib64-nvidia
2022-10-23 06:29:42.545293: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Process SpawnProcess-2:
Traceback (most recent call last):
  File "/usr/local/envs/automatic/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/local/envs/automatic/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/content/stable-diffusion-webui/modules/deepbooru.py", line 35, in deepbooru_process
    model, tags = get_deepbooru_tags_model()
  File "/content/stable-diffusion-webui/modules/deepbooru.py", line 96, in get_deepbooru_tags_model
    from basicsr.utils.download_util import load_file_from_url
  File "/usr/local/envs/automatic/lib/python3.10/site-packages/basicsr/__init__.py", line 3, in <module>
    from .archs import *
  File "/usr/local/envs/automatic/lib/python3.10/site-packages/basicsr/archs/__init__.py", line 5, in <module>
    from basicsr.utils import get_root_logger, scandir
  File "/usr/local/envs/automatic/lib/python3.10/site-packages/basicsr/utils/__init__.py", line 1, in <module>
    from .color_util import bgr2ycbcr, rgb2ycbcr, rgb2ycbcr_pt, ycbcr2bgr, ycbcr2rgb
  File "/usr/local/envs/automatic/lib/python3.10/site-packages/basicsr/utils/color_util.py", line 2, in <module>
    import torch
  File "/usr/local/envs/automatic/lib/python3.10/site-packages/torch/__init__.py", line 202, in <module>
    from torch._C import *  # noqa: F403
ImportError: /usr/local/envs/automatic/lib/python3.10/site-packages/torch/lib/libtorch_cuda_cpp.so: symbol cudaGraphRetainUserObject version libcudart.so.11.0 not defined in file libcudart.so.11.0 with link time reference
Interrupted with signal 2 in <frame at 0x7f4448d2adc0, file '/content/stable-diffusion-webui/webui.py', line 105, code wait_on_server>

lnguyenfx commented 2 years ago

> I have been getting this error since this commit. Webui is running on Colab T4. notebook

You need to install TensorRT. Assuming your Colab uses a Debian-based OS like Ubuntu, you can follow the instructions at https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html#installing-debian. After the installation, you need to add the directory containing libnvinfer.so, along with the CUDA library path, to the LD_LIBRARY_PATH environment variable.

Example (replace with your own paths and version):

  export LD_LIBRARY_PATH="/usr/local/cuda-11.6/lib64:/usr/lib/x86_64-linux-gnu"

To find out where your libnvinfer.so was installed, execute:

  dpkg -L libnvinfer8

In your error, the program was looking for libnvinfer.so.7, but the latest TensorRT package only provides libnvinfer.so.8. To fix this, create a symbolic link:

  ln -s /usr/lib/x86_64-linux-gnu/libnvinfer.so.8 /usr/lib/x86_64-linux-gnu/libnvinfer.so.7
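The same link can be created from Python with a guard against overwriting an existing file. This is a sketch only: link_soname is a hypothetical helper, the paths in the commented usage are the ones from the comment above, and aliasing one major version of a shared library to another offers no ABI guarantee, so treat it as a stopgap.

```python
from pathlib import Path


def link_soname(existing, alias):
    """Create alias as a symlink to existing unless alias is already present.

    Returns True if a link was created and False if alias already existed.
    Raises FileNotFoundError if the target library is missing.
    """
    existing, alias = Path(existing), Path(alias)
    if not existing.is_file():
        raise FileNotFoundError(existing)
    # is_symlink() also catches a dangling link, which exists() reports as absent.
    if alias.exists() or alias.is_symlink():
        return False
    alias.symlink_to(existing)
    return True


# Usage with the paths quoted above (needs write access to the directory):
# link_soname("/usr/lib/x86_64-linux-gnu/libnvinfer.so.8",
#             "/usr/lib/x86_64-linux-gnu/libnvinfer.so.7")
```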

SharkWipf commented 2 years ago

> This should be resolved with #3421

I can confirm #3421 fixed this issue for me (I nuked the venv and let it reinstall, so no more tensorflow-cpu). As long as you have all dependencies and the right versions installed, it seems to work fine now: I can see it using the GPU without problems, and generation afterwards works fine too.

#3494 may still be able to fix the nvinfer dependency complication, but since that is a separate issue, I will close this one now.

Thanks for the fix @Greendayle!

Kncuk commented 2 years ago

I'm having this issue on Windows when using img2img with more than 4 batches of single images. I have a 3090 with CUDA 11.6 and will update to a newer version to see if that improves things. I've re-installed the entire repo, but the issue persisted. I'm not sure what the fixes from the other issues mentioned above are.