kpoeppel closed this issue 1 year ago
Hi @kpoeppel, which command/script are you running?
I ran the gradio_new.py web interface as suggested in README.md, as well as an adapted version that calls the same functionality from the CLI for debugging. I also removed the NSFW filter (for testing) to see whether that lowers the memory footprint enough to get it running; it didn't.
I just ran the same command and it takes around 18 GB for 4 samples. When does the OOM error happen? Is it before or after the local / public URL appears in the CLI? It would be helpful if you could include a complete screenshot or the command and error message. Also, which OS are you running on? We only tested on Linux; for Windows, you can refer to this issue: https://github.com/cvlab-columbia/zero123/issues/8#issuecomment-1479838642
Here are my configuration, logs and screenshots (all in the state after running "View from the Left" in the gradio web interface):
$ python gradio_new.py
sys.argv:
['gradio_new.py']
Instantiating LatentDiffusion...
Loading model from 105000.ckpt
Global Step: 105000
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.53 M params.
Keeping EMAs of 688.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Instantiating Carvekit HiInterface...
Instantiating StableDiffusionSafetyChecker...
Instantiating AutoFeatureExtractor...
/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/gradio/blocks.py:1381: DeprecationWarning: The `enable_queue` parameter has been deprecated. Please use the `.queue()` method instead.
warnings.warn(
Running on local URL: http://127.0.0.1:7860
Running on public URL: https://826cc8293abf6723c5.gradio.live
This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces
Traceback (most recent call last):
File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/gradio/routes.py", line 393, in run_predict
output = await app.get_blocks().process_api(
File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/gradio/blocks.py", line 1059, in process_api
result = await self.call_function(
File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/gradio/blocks.py", line 868, in call_function
prediction = await anyio.to_thread.run_sync(
File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/home/korbip/Programming/External/zero123/zero123/gradio_new.py", line 324, in main_run
(image, has_nsfw_concept) = models['nsfw'](
File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/diffusers/pipelines/stable_diffusion/safety_checker.py", line 52, in forward
pooled_output = self.vision_model(clip_input)[1] # pooled_output
File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/transformers/models/clip/modeling_clip.py", line 843, in forward
return self.vision_model(
File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/transformers/models/clip/modeling_clip.py", line 774, in forward
hidden_states = self.embeddings(pixel_values)
File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/transformers/models/clip/modeling_clip.py", line 133, in forward
patch_embeds = self.patch_embedding(pixel_values) # shape = [*, width, grid, grid]
File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0
$ nvidia-smi
Wed Mar 22 17:51:20 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:2B:00.0 Off | Off |
| 0% 28C P8 26W / 450W | 24254MiB / 24564MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1219 G /usr/lib/xorg/Xorg 16MiB |
| 0 N/A N/A 3188 C python 24233MiB |
+-----------------------------------------------------------------------------+
$ pip list
Package Version Editable project location
------------------------ ----------- -------------------------------------------------------------
absl-py 1.4.0
aiofiles 23.1.0
aiohttp 3.8.4
aiosignal 1.3.1
albumentations 0.4.3
altair 4.2.2
antlr4-python3-runtime 4.8
anyio 3.6.2
appdirs 1.4.4
asttokens 2.2.1
async-timeout 4.0.2
attrs 22.2.0
backcall 0.2.0
blinker 1.5
braceexpand 0.1.7
cachetools 5.3.0
carvekit-colab 4.1.0
certifi 2022.12.7
charset-normalizer 3.1.0
click 8.1.3
clip 1.0 /home/korbip/Programming/External/zero123/CLIP
cmake 3.26.0
contourpy 1.0.7
cuda-python 12.1.0
cycler 0.11.0
Cython 0.29.33
datasets 2.4.0
decorator 5.1.1
diffusers 0.12.1
dill 0.3.5.1
easydict 1.10
einops 0.3.0
entrypoints 0.4
exceptiongroup 1.1.1
executing 1.2.0
fastapi 0.95.0
fastcore 1.5.28
ffmpy 0.3.0
filelock 3.10.0
fire 0.4.0
fonttools 4.39.2
frozenlist 1.3.3
fsspec 2023.3.0
ftfy 6.1.1
future 0.18.3
gitdb 4.0.10
GitPython 3.1.31
google-auth 2.16.2
google-auth-oauthlib 0.4.6
gradio 3.21.0
grpcio 1.51.3
h11 0.14.0
httpcore 0.16.3
httpx 0.23.3
huggingface-hub 0.13.3
idna 3.4
imageio 2.9.0
imageio-ffmpeg 0.4.2
imgaug 0.2.6
importlib-metadata 6.1.0
importlib-resources 5.12.0
iniconfig 2.0.0
ipython 8.11.0
jedi 0.18.2
Jinja2 3.1.2
jsonschema 4.17.3
kiwisolver 1.4.4
kornia 0.6.0
lazy_loader 0.1
linkify-it-py 2.0.0
lit 16.0.0
llvmlite 0.39.1
loguru 0.6.0
lovely-numpy 0.2.8
lovely-tensors 0.1.14
Mako 1.2.4
Markdown 3.4.1
markdown-it-py 2.2.0
MarkupSafe 2.1.2
matplotlib 3.7.1
matplotlib-inline 0.1.6
mdit-py-plugins 0.3.3
mdurl 0.1.2
mpmath 1.3.0
multidict 6.0.4
multiprocess 0.70.13
networkx 3.0
numba 0.56.4
numpy 1.23.5
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
oauthlib 3.2.2
omegaconf 2.1.1
opencv-python 4.5.5.64
opencv-python-headless 4.7.0.72
orjson 3.8.8
packaging 23.0
pandas 1.5.3
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.4.0
pip 23.0.1
platformdirs 3.1.1
plotly 5.13.1
pluggy 1.0.0
point-cloud-utils 0.29.1
prompt-toolkit 3.0.38
protobuf 3.20.3
psutil 5.9.4
ptyprocess 0.7.0
pudb 2019.2
pure-eval 0.2.2
pyarrow 11.0.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pydantic 1.10.6
pydeck 0.8.0
pyDeprecate 0.3.1
pydub 0.25.1
Pygments 2.14.0
PyMCubes 0.1.4
Pympler 1.0.1
pyparsing 3.0.9
pyrsistent 0.19.3
pytest 7.2.2
python-dateutil 2.8.2
python-multipart 0.0.6
pytools 2022.1.14
pytorch-lightning 1.4.2
pytz 2022.7.1
pytz-deprecation-shim 0.1.0.post0
PyWavelets 1.4.1
PyYAML 6.0
regex 2022.10.31
requests 2.28.2
requests-oauthlib 1.3.1
responses 0.18.0
rfc3986 1.5.0
rich 13.3.2
rsa 4.9
scikit-image 0.20.0
scipy 1.9.1
semver 2.13.0
setuptools 65.6.3
six 1.16.0
smmap 5.0.0
sniffio 1.3.0
stack-data 0.6.2
starlette 0.26.1
streamlit 1.20.0
sympy 1.11.1
tabulate 0.9.0
taming-transformers 0.0.1 /home/korbip/Programming/External/zero123/taming-transformers
tenacity 8.2.2
tensorboard 2.12.0
tensorboard-data-server 0.7.0
tensorboard-plugin-wit 1.8.1
termcolor 2.2.0
test-tube 0.7.5
tifffile 2023.3.15
tokenizers 0.12.1
toml 0.10.2
tomli 2.0.1
toolz 0.12.0
torch 2.0.0
torch-fidelity 0.3.0
torchmetrics 0.6.0
torchvision 0.15.1
tornado 6.2
tqdm 4.65.0
traitlets 5.9.0
transformers 4.22.2
triton 2.0.0
typing_extensions 4.5.0
tzdata 2022.7
tzlocal 4.3
uc-micro-py 1.0.1
urllib3 1.26.15
urwid 2.1.2
uvicorn 0.21.1
validators 0.20.0
watchdog 3.0.0
wcwidth 0.2.6
webdataset 0.2.5
websockets 10.4
Werkzeug 2.2.3
wheel 0.38.4
xxhash 3.2.0
yarl 1.8.2
zipp 3.15.0
How big is the input image? Have you tried different images with a smaller size? I just pushed a small fix. Could you pull and try again?
The input image is 50 kB, 128x128 pixels; I think anything below that becomes nonsense. It works now, with ~16.5 GiB VRAM usage, though I do not understand why these small changes made the difference. Essentially you just added some thumbnails? Nevertheless, it works now, thanks a lot!
thumbnail is a Pillow function that resizes the image before passing it through the pipeline, but I don't quite understand how 128x128 pixels could crash it, as the thumbnail function upsamples it to 1536x1536. Anyway, glad it works!
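A side note on the resize step: as far as the Pillow documentation describes it, Image.thumbnail produces an image no larger than the requested size and leaves images that already fit unchanged, so a 128x128 input would probably stay 128x128 rather than being upsampled. Below is a simplified sketch of that size rule. The helper name `thumbnail_size` is hypothetical (it is not part of Pillow or zero123), and the sketch ignores Pillow's exact rounding.

```python
def thumbnail_size(image_size, max_size):
    """Approximate the output size of a Pillow-style thumbnail operation.

    Assumption: images that already fit inside max_size are left unchanged;
    larger images are scaled down with the aspect ratio preserved.
    """
    width, height = image_size
    max_w, max_h = max_size
    if width <= max_w and height <= max_h:
        return (width, height)  # already fits: no resizing, no upsampling
    scale = min(max_w / width, max_h / height)
    return (max(1, round(width * scale)), max(1, round(height * scale)))


# The 128x128 test image from this thread against the 1536x1536 bound.
print(thumbnail_size((128, 128), (1536, 1536)))
# A larger image is scaled down to fit inside the bound.
print(thumbnail_size((3072, 2048), (1536, 1536)))
```

If that reading is right, the resize would only reduce memory use for inputs larger than the bound, which may be worth checking against the actual image that triggered the error.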
I had some trouble debugging it, but it seems to be an out-of-memory error: nvidia-smi shows 24.2 GB in use, which leads to a failure with cuDNN error: CUDNN_STATUS_NOT_INITIALIZED or, with cuDNN disabled, cuBLAS error: CUBLAS_STATUS_NOT_INITIALIZED in the F.conv2d call inside .
If you got it running on an RTX 3090, could you share the configuration changes? I used a 128x128 RGB image as the smallest test.
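To tell a memory shortage apart from a genuine cuDNN/cuBLAS setup problem, it can help to log the remaining GPU memory shortly before the failing call. Here is a minimal sketch, assuming the CSV output of `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits`. The helper names `parse_gpu_memory`, `free_mib` and `query_gpu_memory` are hypothetical and not part of zero123.

```python
import subprocess

# nvidia-smi query that prints one "used, total" line (in MiB) per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'used, total' lines (MiB) into a list of (used, total) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def free_mib(rows):
    """Return the remaining memory in MiB for each GPU."""
    return [total - used for used, total in rows]


def query_gpu_memory():
    """Run nvidia-smi and parse its output (requires an NVIDIA driver)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_memory(out)


# Values taken from the nvidia-smi table in this thread (24254MiB / 24564MiB).
rows = parse_gpu_memory("24254, 24564\n")
print(free_mib(rows))  # [310]
```

With only about 310 MiB left, a library failing to initialize its workspace is a plausible symptom of memory exhaustion, although this sketch alone does not prove it.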