SharkWipf closed this issue 2 years ago.
Same problem here. I checked "Use deepbooru for caption" when pre-processing images under the Train tab. The first run works normally, but the second run causes an error. If I delete the deepbooru folder from the models folder, it works again.
2022-10-13 17:39:09.397944: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-13 17:39:09.704583: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 21652 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3090 Ti, pci bus id: 0000:0e:00.0, compute capability: 8.6
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
2022-10-13 17:39:15.516201: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8500
Could not locate zlibwapi.dll. Please make sure it is in your library path!
OS: Windows 10, 3090 Ti, Edge browser. Commit revision 04c0e643f2eec68d93a76db171b4d70595808702. Launch flags: --opt-split-attention --autolaunch --allow-code --deepdanbooru
I solved it by downloading zlibwapi.dll and putting it in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin.
https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#install-zlib-windows
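If it's unclear whether the DLL is actually being picked up after copying it, a quick check from the webui's Python environment (just a sketch, Windows only; it only tests whether the DLL resolves from the current process) is:

    # Sketch: check whether zlibwapi.dll can be resolved by this process (Windows only).
    import ctypes

    try:
        ctypes.WinDLL("zlibwapi.dll")  # searches PATH and the standard DLL search locations
        print("zlibwapi.dll found")
    except OSError as err:
        print("zlibwapi.dll still not found:", err)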
I faced a problem like this in my Colab; I just changed the CUDA version in the Colab to fix it.
Error completing request
Arguments: ('', '', 'None', 'None', 1, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 512, 512, False, False, 0.7, 0, False, '', 25, True, 5.0, False, None, '', 1, '', 0, '', True, False, False) {}
Traceback (most recent call last):
File "/notebooks/SDW/modules/ui.py", line 181, in f
res = list(func(*args, **kwargs))
File "/notebooks/SDW/webui.py", line 64, in f
res = func(*args, **kwargs)
File "/notebooks/SDW/modules/txt2img.py", line 43, in txt2img
processed = process_images(p)
File "/notebooks/SDW/modules/processing.py", line 397, in process_images
uc = prompt_parser.get_learned_conditioning(shared.sd_model, len(prompts) * [p.negative_prompt], p.steps)
File "/notebooks/SDW/modules/prompt_parser.py", line 138, in get_learned_conditioning
conds = model.get_learned_conditioning(texts)
File "/notebooks/SDW/repositories/stable-diffusion/ldm/models/diffusion/ddpm.py", line 558, in get_learned_conditioning
c = self.cond_stage_model(c)
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/notebooks/SDW/modules/sd_hijack.py", line 334, in forward
z1 = self.process_tokens(tokens, multipliers)
File "/notebooks/SDW/modules/sd_hijack.py", line 349, in process_tokens
tokens = torch.asarray(remade_batch_tokens).to(device)
RuntimeError: CUDA error: unspecified launch failure
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Not a huge issue as I just restart after pre-processing images with deepdanbooru, probably some weird mismatch of Cuda in my Paperspace notebook.
FWIW, I am on CUDA 11.8, NVIDIA driver 520.61.05, RTX 3090. Judging by the comments so far, it sounds like that's probably too new?
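For anyone comparing setups, a small script along these lines (a sketch; run it inside the webui venv, and it assumes both torch and tensorflow import cleanly there) prints the versions in play:

    # Sketch: print the CUDA/cuDNN versions PyTorch and TensorFlow were built against.
    import torch
    import tensorflow as tf

    print("torch", torch.__version__,
          "CUDA", torch.version.cuda,
          "cuDNN", torch.backends.cudnn.version())

    build = tf.sysconfig.get_build_info()  # includes cuda_version/cudnn_version on GPU builds
    print("tensorflow", tf.__version__,
          "CUDA", build.get("cuda_version"),
          "cuDNN", build.get("cudnn_version"))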
I am also having a similar issue; I'm using CUDA 11.7, NVIDIA driver 515.76, and a 3090 Ti.
Same issue; I'm using CUDA 11.3, torch 1.12.1, a single 2080 Ti, on Ubuntu.
Not a huge issue as I just restart after pre-processing images with deepdanbooru, probably some weird mismatch of Cuda in my Paperspace notebook.
same here
Update: I've tried several CUDA versions and (Linux) environments and can't seem to get it working at all. I did notice there are some errors when Deepdanbooru gets executed, however. The Deepdanbooru code still runs, but I'm guessing it messes with the registered CUDA state because it can't re-init cuBLAS, causing the next img2img/txt2img to fail. "Error completing request" and everything below it is where image generation is triggered; everything above that is just Deepdanbooru (the input image was just a blob of abstract purple, so it only detected 2 tags; that part is not a bug).
2022-10-15 22:58:57.799100: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-15 22:58:57.931302: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-10-15 22:58:58.869585: E tensorflow/stream_executor/cuda/cuda_driver.cc:265] failed call to cuInit: CUDA_ERROR_NOT_INITIALIZED: initialization error
2022-10-15 22:58:58.869614: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: 7bc68f14784c
2022-10-15 22:58:58.869619: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: 7bc68f14784c
2022-10-15 22:58:58.869724: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: NOT_FOUND: was unable to find libcuda.so DSO loaded into this program
2022-10-15 22:58:58.869746: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 520.61.5
2022-10-15 22:58:58.869909: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
1/1 [==============================] - 1s 1s/step
0.7328718900680542 moon
0.6844184994697571 multiple_girls
Error completing request
Arguments: ('', '', 'Simple', 'None', 20, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 512, 512, False, 0.7, 0, 0, 0, False, '', False, False, None, '', 1, '', 0, '', True, True, False) {}
Traceback (most recent call last):
File "/home/root/src/stable-diffusion-webui/modules/ui.py", line 212, in f
res = list(func(*args, **kwargs))
File "/home/root/src/stable-diffusion-webui/webui.py", line 64, in f
res = func(*args, **kwargs)
File "/home/root/src/stable-diffusion-webui/modules/txt2img.py", line 44, in txt2img
processed = process_images(p)
File "/home/root/src/stable-diffusion-webui/modules/processing.py", line 397, in process_images
uc = prompt_parser.get_learned_conditioning(shared.sd_model, len(prompts) * [p.negative_prompt], p.steps)
File "/home/root/src/stable-diffusion-webui/modules/prompt_parser.py", line 138, in get_learned_conditioning
conds = model.get_learned_conditioning(texts)
File "/home/root/src/stable-diffusion-webui/repositories/stable-diffusion/ldm/models/diffusion/ddpm.py", line 558, in get_learned_conditioning
c = self.cond_stage_model(c)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/root/src/stable-diffusion-webui/modules/sd_hijack.py", line 334, in forward
z1 = self.process_tokens(tokens, multipliers)
File "/home/root/src/stable-diffusion-webui/modules/sd_hijack.py", line 349, in process_tokens
tokens = torch.asarray(remade_batch_tokens).to(device)
RuntimeError: CUDA error: unspecified launch failure
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
same problem
Ubuntu 20.04.4 LTS (GNU/Linux 5.4.0-126-generic x86_64), RTX 3090
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0
error log:
2022-10-18 22:39:56.822689: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-18 22:39:57.007121: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-10-18 22:39:57.046040: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-10-18 22:39:57.823299: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /root/miniconda3/envs/automatic/lib/python3.10/site-packages/cv2/../../lib64:
2022-10-18 22:39:57.823384: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /root/miniconda3/envs/automatic/lib/python3.10/site-packages/cv2/../../lib64:
2022-10-18 22:39:57.823394: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2022-10-18 22:39:58.957262: E tensorflow/stream_executor/cuda/cuda_driver.cc:265] failed call to cuInit: CUDA_ERROR_NOT_INITIALIZED: initialization error
2022-10-18 22:39:58.957298: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] no NVIDIA GPU device is present: /dev/nvidia0 does not exist
2022-10-18 22:39:58.957521: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
1/1 [==============================] - 3s 3s/step
0.9999967217445374 flower
0.9976820945739746 maid_headdress
0.9965994358062744 maid
0.9940873384475708 tray
0.992828369140625 hydrangea
0.9785534739494324 apron
0.9732893109321594 maid_apron
0.9611615538597107 long_hair
0.9472941160202026 pink_flower
0.9466087818145752 waist_apron
0.9219543933868408 multiple_girls
0.9178032875061035 vase
0.9112962484359741 purple_flower
0.8946253061294556 white_flower
0.8906084299087524 holding_tray
0.8897321820259094 thighhighs
0.8700262308120728 navel
0.8618443012237549 lily_(flower)
0.8400490880012512 waitress
0.8256860375404358 enmaided
0.8048192262649536 rose
0.7983884811401367 wrist_cuffs
0.7971858978271484 breasts
0.7840455770492554 hair_flower
0.7349395751953125 white_apron
0.7275054454803467 looking_at_viewer
0.7117940783500671 white_legwear
0.7071746587753296 silver_hair
0.6788415908813477 blue_eyes
0.6713621020317078 teapot
0.6598066091537476 table
0.6584057211875916 blue_flower
0.6500906944274902 garter_straps
0.6371893882751465 day
0.6288817524909973 daisy
0.6258440017700195 cup
0.6212639212608337 hair_ornament
0.5960835218429565 bow
0.5882710218429565 white_hair
0.5618525743484497 2girls
0.5511027574539185 crop_top
0.5352045893669128 frills
0.5238999128341675 ribbon
0.5232399106025696 midriff
0.5033739805221558 outdoors
Error completing request
Arguments: (0, '2girls, apron, blue_eyes, blue_flower, bow, breasts, crop_top, cup, daisy, day, enmaided, flower, frills, garter_straps, hair_flower, hair_ornament, holding_tray, hydrangea, lily_\\(flower\\), long_hair, looking_at_viewer, maid, maid_apron, maid_headdress, midriff, multiple_girls, navel, outdoors, pink_flower, purple_flower, ribbon, rose, silver_hair, table, teapot, thighhighs, tray, vase, waist_apron, waitress, white_apron, white_flower, white_hair, white_legwear, wrist_cuffs', '', 'None', 'None', <PIL.Image.Image image mode=RGB size=1280x2002 at 0x7F6F9BC90D00>, None, None, None, 0, 20, 0, 4, 1, False, False, 1, 1, 7, 0.75, -1.0, -1.0, 0, 0, 0, False, 512, 512, 0, False, 32, 0, '', '', 0, '<ul>\n<li><code>CFG Scale</code> should be 2 or lower.</li>\n</ul>\n', True, True, '', '', True, 50, True, 1, 0, False, 4, 1, '<p style="margin-bottom:0.75em">Recommended settings: Sampling Steps: 80-100, Sampler: Euler a, Denoising strength: 0.8</p>', 128, 8, ['left', 'right', 'up', 'down'], 1, 0.05, 128, 4, 0, ['left', 'right', 'up', 'down'], False, False, None, '', '<p style="margin-bottom:0.75em">Will upscale the image to twice the dimensions; use width and height sliders to set tile size</p>', 64, 0, 1, '', 0, '', True, False, False) {}
Traceback (most recent call last):
File "/root/autodl-tmp/stable-diffusion-webui/modules/ui.py", line 212, in f
res = list(func(*args, **kwargs))
File "/root/autodl-tmp/stable-diffusion-webui/webui.py", line 64, in f
res = func(*args, **kwargs)
File "/root/autodl-tmp/stable-diffusion-webui/modules/img2img.py", line 126, in img2img
processed = process_images(p)
File "/root/autodl-tmp/stable-diffusion-webui/modules/processing.py", line 370, in process_images
p.init(all_prompts, all_seeds, all_subseeds)
File "/root/autodl-tmp/stable-diffusion-webui/modules/processing.py", line 694, in init
image = image.to(shared.device)
RuntimeError: CUDA error: unspecified launch failure
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
I have identified the commit which began the errors, at least for me
If you want to fix the error temporarily, checkout fec2221eeaafb50afd26ba3e109bf6f928011e69, the last commit before the errors began
I have identified the commit which began the errors, at least for me
If you want to fix the error temporarily, checkout fec2221, the last commit before the errors began
Yes, it fixes RuntimeError: CUDA error: unspecified launch failure, but I'd rather know how to fix the missing libnvinfer.so.7 error.
I faced a problem like this in my Colab; I just changed the CUDA version in the Colab to fix it.
May I ask which CUDA version you used in Colab?
I solved it by installing tensorflow-cpu. Remember to install it in the venv.
I solved it by installing tensorflow-cpu. Remember to install it in the venv.
You are my life saver!
I solved it by installing tensorflow-cpu. Remember to install it in the venv.
Worth noting: while this does work, it works by disabling GPU support in TensorFlow entirely, thus working around the unclean CUDA state by disabling CUDA for deepbooru (and anything else using TensorFlow) altogether.
The bug where deepbooru fails on CUDA and leaves the GPU in an unclean state still exists; it is just avoided by not using the GPU in the first place. Any other TensorFlow-based scripts will also be forced onto the CPU. txt2img/img2img itself does not seem to use TensorFlow, so that part should be unaffected.
Also, because tensorflow-cpu is essentially a CPU-only replacement for the tensorflow package proper, installation order matters: if you install tensorflow-cpu before installing tensorflow proper, or update tensorflow proper after tensorflow-cpu, this workaround will not work, as the GPU build will keep being used.
In short: this workaround works, but it does not solve the problem, and it may cause other problems elsewhere.
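If it's unclear whether this workaround (or any of the others) has actually pushed TensorFlow onto the CPU, a quick check from the same venv (sketch) is:

    # Sketch: list the devices TensorFlow can see; an empty GPU list means it is CPU-only.
    import tensorflow as tf

    print(tf.config.list_physical_devices("GPU"))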
You are right. I tried another method: downloading "zlibwapi.dll" manually and putting it in "NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin", just as toyxyz mentioned, and it works. If someone tried that method and it still isn't resolved, installing tensorflow-cpu is a simple workaround that can be reverted at any time until the bug is fixed. ٩(´∀`*)
I'm unsure why this would solve the issue on Windows, especially since on Linux I already have all the zlib dependencies listed on the cuDNN page. It might be a different bug that Linux users like myself are experiencing.
I faced a problem like this in my Colab; I just changed the CUDA version in the Colab to fix it.
Which version did you change it to?
This was already addressed in the original PR but that change was reverted. https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/1752/commits/5f12e7efd92ad802742f96788b4be3249ad02829
On *nix, Python's multiprocessing defaults to fork rather than spawn.
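For context: a CUDA context initialized in the parent process cannot be re-initialized in a child created with fork, whereas a spawned child starts with a fresh interpreter and its own CUDA state. A minimal sketch of the difference (not the webui's actual code; the worker and file names are illustrative):

    # Sketch: run a TensorFlow worker in a spawned process so it initializes CUDA itself,
    # instead of inheriting the parent's already-initialized context via fork.
    import multiprocessing


    def tag_worker(image_path, queue):
        import tensorflow as tf  # import inside the child so TF sets up CUDA there
        queue.put(f"tagged {image_path} with TensorFlow {tf.__version__}")


    if __name__ == "__main__":
        ctx = multiprocessing.get_context("spawn")  # the default on Linux is "fork"
        q = ctx.Queue()
        p = ctx.Process(target=tag_worker, args=("example.png", q))
        p.start()
        print(q.get())
        p.join()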
When running the Deepdanbooru model, TensorFlow tries to initialize the same primary GPU that the WebUI is already using, which causes the crash. I was able to work around this by telling the WebUI to use a secondary GPU, by passing the "--device-id=1" argument when launching the WebUI.
If you do not have a secondary GPU, then try making Deepdanbooru use the CPU instead of the primary GPU. One method mentioned above is to install "tensorflow-cpu"; however, it has unknown implications (as also mentioned above).
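If you only have one GPU and would rather not swap packages, TensorFlow can also be kept off the GPU at runtime. This is not a webui option, just the underlying TensorFlow call, so treat it as a sketch:

    # Sketch: hide all GPUs from TensorFlow before any model is built,
    # so deepbooru-style tagging runs on the CPU while PyTorch keeps the GPU.
    import tensorflow as tf

    tf.config.set_visible_devices([], "GPU")     # must run before the first TF op touches the GPU
    print(tf.config.get_visible_devices("GPU"))  # -> []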
This should be resolved with #3421
I have been getting this error since this commit. Webui is running on Colab T4. notebook
2022-10-23 06:29:38.974494: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-23 06:29:40.327359: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-10-23 06:29:42.545022: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/envs/automatic/lib/python3.10/site-packages/cv2/../../lib64:/usr/lib64-nvidia
2022-10-23 06:29:42.545270: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/envs/automatic/lib/python3.10/site-packages/cv2/../../lib64:/usr/lib64-nvidia
2022-10-23 06:29:42.545293: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Process SpawnProcess-2:
Traceback (most recent call last):
File "/usr/local/envs/automatic/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/local/envs/automatic/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/content/stable-diffusion-webui/modules/deepbooru.py", line 35, in deepbooru_process
model, tags = get_deepbooru_tags_model()
File "/content/stable-diffusion-webui/modules/deepbooru.py", line 96, in get_deepbooru_tags_model
from basicsr.utils.download_util import load_file_from_url
File "/usr/local/envs/automatic/lib/python3.10/site-packages/basicsr/__init__.py", line 3, in <module>
from .archs import *
File "/usr/local/envs/automatic/lib/python3.10/site-packages/basicsr/archs/__init__.py", line 5, in <module>
from basicsr.utils import get_root_logger, scandir
File "/usr/local/envs/automatic/lib/python3.10/site-packages/basicsr/utils/__init__.py", line 1, in <module>
from .color_util import bgr2ycbcr, rgb2ycbcr, rgb2ycbcr_pt, ycbcr2bgr, ycbcr2rgb
File "/usr/local/envs/automatic/lib/python3.10/site-packages/basicsr/utils/color_util.py", line 2, in <module>
import torch
File "/usr/local/envs/automatic/lib/python3.10/site-packages/torch/__init__.py", line 202, in <module>
from torch._C import * # noqa: F403
ImportError: /usr/local/envs/automatic/lib/python3.10/site-packages/torch/lib/libtorch_cuda_cpp.so: symbol cudaGraphRetainUserObject version libcudart.so.11.0 not defined in file libcudart.so.11.0 with link time reference
Interrupted with signal 2 in <frame at 0x7f4448d2adc0, file '/content/stable-diffusion-webui/webui.py', line 105, code wait_on_server>
I have been getting this error since this commit. Webui is running on Colab T4. notebook
You need to install TensorRT. Assuming your Colab uses a Debian-based OS like Ubuntu, you can follow the instructions at https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html#installing-debian. After the installation, you need to add the path of libnvinfer.so, along with the CUDA path, to the LD_LIBRARY_PATH environment variable.
Example (replace with your own paths and version):
export LD_LIBRARY_PATH="/usr/local/cuda-11.6/lib64:/usr/lib/x86_64-linux-gnu"
To find out where your libnvinfer.so was installed, execute:
dpkg -L libnvinfer8
In your error, the program was looking for libnvinfer.so.7, but the latest TensorRT package only provides libnvinfer.so.8. To fix, create a symbolic link:
ln -s /usr/lib/x86_64-linux-gnu/libnvinfer.so.8 /usr/lib/x86_64-linux-gnu/libnvinfer.so.7
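To confirm the library is actually resolvable from a running process (and not just present on disk), a quick check from Python (sketch) is:

    # Sketch: try to dlopen the library the TF-TRT warning complains about.
    import ctypes

    ctypes.CDLL("libnvinfer.so.7")  # raises OSError if it still cannot be found
    print("libnvinfer.so.7 loaded")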
This should be resolved with #3421
I can confirm #3421 fixed this issue for me (I nuked the venv and let it reinstall, so no more tensorflow-cpu). As long as you have all the dependencies and the right versions installed, it seems to work fine now: I can see it using the GPU without problems, and generation afterwards works fine too.
Thanks for the fix @Greendayle!
Having this issue on Windows when using img2img with more than 4 batches of single images. 3090 with CUDA 11.6; I will update to a newer version to see if there's any improvement. I've reinstalled the entire repo but the issue persisted; not sure which of the fixes from the other issues mentioned above apply.
Describe the bug
After running deepdanbooru on an image, new images can't be generated anymore due to a CUDA error.
To Reproduce
Steps to reproduce the behavior:
Error:
Desktop (please complete the following information):
Additional context
Launch flags:
--listen --opt-split-attention --allow-code --deepdanbooru
Everything else seems to be working fine, no issues. Just the deepdanbooru generation seems to break it.