Out of memory on 24 GB VRAM GPU

kpoeppel commented 1 year ago

I had some trouble debugging it, but it seems to be an out-of-memory error: nvidia-smi shows 24.2 GB in use, which leads to the failure cuDNN error: CUDNN_STATUS_NOT_INITIALIZED (or, with cuDNN disabled, cuBLAS error: CUBLAS_STATUS_NOT_INITIALIZED) in the F.conv2d call inside .
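A minimal sketch of the reasoning above: treat cuDNN/cuBLAS "NOT_INITIALIZED" errors as possible out-of-memory symptoms, since they are often reported when the GPU has no memory left at the moment the library initializes. This is a heuristic only; the function name and the marker list are illustrative assumptions, not part of zero123, PyTorch, cuDNN, or cuBLAS.

```python
# Heuristic only: these markers are an assumption based on commonly reported
# symptoms, not an official list from PyTorch, cuDNN, or cuBLAS.
OOM_LIKE_MARKERS = (
    "out of memory",
    "CUDNN_STATUS_NOT_INITIALIZED",
    "CUBLAS_STATUS_NOT_INITIALIZED",
    "CUBLAS_STATUS_ALLOC_FAILED",
)


def looks_like_gpu_oom(message: str) -> bool:
    """Return True if an error message matches one of the OOM-like markers."""
    lowered = message.lower()
    return any(marker.lower() in lowered for marker in OOM_LIKE_MARKERS)


if __name__ == "__main__":
    print(looks_like_gpu_oom("RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED"))
    print(looks_like_gpu_oom("ValueError: image must be RGB"))
```

A match does not prove an OOM; it is just a hint to check free GPU memory before digging into driver or library versions.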

If you got it running on an RTX 3090, could you share the configuration changes? I used a 128x128 RGB image as the smallest test case.

ruoshiliu commented 1 year ago

Hi @kpoeppel , which command/script are you running?

kpoeppel commented 1 year ago

I ran the gradio_new.py web interface as suggested in README.md, as well as an adapted version that calls the same functionality from the CLI for debugging. For testing, I also removed the NSFW filter to see whether that would lower the memory footprint enough to get it running; it didn't.
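Removing the NSFW filter for a test, as described above, amounts to swapping in a stand-in with the same return shape as the call in the traceback, `(image, has_nsfw_concept) = models['nsfw'](...)`. This is a hypothetical sketch: the class name is made up, and because the traceback truncates the arguments passed at gradio_new.py line 324, the parameter names (`clip_input`, `images`) are assumed to follow diffusers' `StableDiffusionSafetyChecker.forward`.

```python
from typing import Any, List, Sequence, Tuple


class PassthroughSafetyChecker:
    """Hypothetical no-op replacement for models['nsfw'] (debugging only).

    Mirrors the (images, has_nsfw_concept) return shape. It loads no model,
    so it uses no GPU memory and never flags anything.
    """

    def __call__(
        self, clip_input: Any = None, images: Sequence[Any] = ()
    ) -> Tuple[Sequence[Any], List[bool]]:
        # Return the images untouched and report "no NSFW content" for each one.
        return images, [False] * len(images)


if __name__ == "__main__":
    checker = PassthroughSafetyChecker()
    imgs, flags = checker(clip_input=None, images=["img0", "img1"])
    print(flags)
```

In a local debugging copy, one could assign `models['nsfw'] = PassthroughSafetyChecker()` instead of instantiating the real checker; this is an assumption about how the script is wired, not a supported option.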

ruoshiliu commented 1 year ago

I just ran the same command and it takes around 18 GB for 4 samples. When does the OOM error happen? Is it before or after the local/public URL appears in the CLI? It would be helpful if you could include a complete screenshot, or the command and the error message. Also, which OS are you running? We only tested on Linux; for Windows, you can refer to this issue: https://github.com/cvlab-columbia/zero123/issues/8#issuecomment-1479838642

kpoeppel commented 1 year ago

Here are my configuration, logs, and screenshots (all captured after running "View from the Left" in the Gradio web interface):

$ python gradio_new.py
sys.argv:
['gradio_new.py']
Instantiating LatentDiffusion...
Loading model from 105000.ckpt
Global Step: 105000
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.53 M params.
Keeping EMAs of 688.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Instantiating Carvekit HiInterface...
Instantiating StableDiffusionSafetyChecker...
Instantiating AutoFeatureExtractor...
/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/gradio/blocks.py:1381: DeprecationWarning: The `enable_queue` parameter has been deprecated. Please use the `.queue()` method instead.
  warnings.warn(
Running on local URL:  http://127.0.0.1:7860
Running on public URL: https://826cc8293abf6723c5.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces
Traceback (most recent call last):
  File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/gradio/routes.py", line 393, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/gradio/blocks.py", line 1059, in process_api
    result = await self.call_function(
  File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/gradio/blocks.py", line 868, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/korbip/Programming/External/zero123/zero123/gradio_new.py", line 324, in main_run
    (image, has_nsfw_concept) = models['nsfw'](
  File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/diffusers/pipelines/stable_diffusion/safety_checker.py", line 52, in forward
    pooled_output = self.vision_model(clip_input)[1]  # pooled_output
  File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/transformers/models/clip/modeling_clip.py", line 843, in forward
    return self.vision_model(
  File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/transformers/models/clip/modeling_clip.py", line 774, in forward
    hidden_states = self.embeddings(pixel_values)
  File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/transformers/models/clip/modeling_clip.py", line 133, in forward
    patch_embeds = self.patch_embedding(pixel_values)  # shape = [*, width, grid, grid]
  File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/korbip/.miniconda3/envs/zero123/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
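For scale, the log above reports `DiffusionWrapper has 859.53 M params.` A rough sketch of what those weights alone occupy, assuming fp32 (4 bytes) or fp16 (2 bytes) per parameter and ignoring any EMA weights, the autoencoder, the CLIP image encoder, Carvekit, the safety checker, activations, and the CUDA context:

```python
def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Memory needed just to store the parameters, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


UNET_PARAMS = 859_530_000  # "DiffusionWrapper has 859.53 M params." from the log

if __name__ == "__main__":
    print(f"fp32: {weight_memory_gib(UNET_PARAMS, 4):.2f} GiB")  # fp32: 3.20 GiB
    print(f"fp16: {weight_memory_gib(UNET_PARAMS, 2):.2f} GiB")  # fp16: 1.60 GiB
```

Even at fp32 this is a small fraction of a 24 GB card, which suggests most of the usage reported by nvidia-smi comes from the other models and from activations during sampling.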

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0

$ nvidia-smi
Wed Mar 22 17:51:20 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:2B:00.0 Off |                  Off |
|  0%   28C    P8    26W / 450W |  24254MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1219      G   /usr/lib/xorg/Xorg                 16MiB |
|    0   N/A  N/A      3188      C   python                          24233MiB |
+-----------------------------------------------------------------------------+
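To track memory from a script rather than reading the table by eye, nvidia-smi can print just the memory columns with `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits`, which emits one `used, total` line per GPU in MiB. A small parser sketch; the helper names are illustrative, and the sample string reuses the numbers from the table above (24254 MiB of 24564 MiB) rather than live output:

```python
import subprocess
from typing import List, Tuple


def parse_gpu_memory(csv_text: str) -> List[Tuple[int, int]]:
    """Parse 'used, total' lines (MiB) from nvidia-smi's CSV query output."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory() -> List[Tuple[int, int]]:
    """Run nvidia-smi (must be on PATH) and return (used, total) MiB per GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_memory(out)


if __name__ == "__main__":
    sample = "24254, 24564\n"  # numbers taken from the nvidia-smi table in this issue
    for used, total in parse_gpu_memory(sample):
        print(f"{used}/{total} MiB used, {total - used} MiB free")
```

With the numbers from this issue, only 310 MiB were left free, which fits the out-of-memory explanation.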

$ pip list
Package                  Version     Editable project location
------------------------ ----------- -------------------------------------------------------------
absl-py                  1.4.0
aiofiles                 23.1.0
aiohttp                  3.8.4
aiosignal                1.3.1
albumentations           0.4.3
altair                   4.2.2
antlr4-python3-runtime   4.8
anyio                    3.6.2
appdirs                  1.4.4
asttokens                2.2.1
async-timeout            4.0.2
attrs                    22.2.0
backcall                 0.2.0
blinker                  1.5
braceexpand              0.1.7
cachetools               5.3.0
carvekit-colab           4.1.0
certifi                  2022.12.7
charset-normalizer       3.1.0
click                    8.1.3
clip                     1.0         /home/korbip/Programming/External/zero123/CLIP
cmake                    3.26.0
contourpy                1.0.7
cuda-python              12.1.0
cycler                   0.11.0
Cython                   0.29.33
datasets                 2.4.0
decorator                5.1.1
diffusers                0.12.1
dill                     0.3.5.1
easydict                 1.10
einops                   0.3.0
entrypoints              0.4
exceptiongroup           1.1.1
executing                1.2.0
fastapi                  0.95.0
fastcore                 1.5.28
ffmpy                    0.3.0
filelock                 3.10.0
fire                     0.4.0
fonttools                4.39.2
frozenlist               1.3.3
fsspec                   2023.3.0
ftfy                     6.1.1
future                   0.18.3
gitdb                    4.0.10
GitPython                3.1.31
google-auth              2.16.2
google-auth-oauthlib     0.4.6
gradio                   3.21.0
grpcio                   1.51.3
h11                      0.14.0
httpcore                 0.16.3
httpx                    0.23.3
huggingface-hub          0.13.3
idna                     3.4
imageio                  2.9.0
imageio-ffmpeg           0.4.2
imgaug                   0.2.6
importlib-metadata       6.1.0
importlib-resources      5.12.0
iniconfig                2.0.0
ipython                  8.11.0
jedi                     0.18.2
Jinja2                   3.1.2
jsonschema               4.17.3
kiwisolver               1.4.4
kornia                   0.6.0
lazy_loader              0.1
linkify-it-py            2.0.0
lit                      16.0.0
llvmlite                 0.39.1
loguru                   0.6.0
lovely-numpy             0.2.8
lovely-tensors           0.1.14
Mako                     1.2.4
Markdown                 3.4.1
markdown-it-py           2.2.0
MarkupSafe               2.1.2
matplotlib               3.7.1
matplotlib-inline        0.1.6
mdit-py-plugins          0.3.3
mdurl                    0.1.2
mpmath                   1.3.0
multidict                6.0.4
multiprocess             0.70.13
networkx                 3.0
numba                    0.56.4
numpy                    1.23.5
nvidia-cublas-cu11       11.10.3.66
nvidia-cuda-cupti-cu11   11.7.101
nvidia-cuda-nvrtc-cu11   11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11        8.5.0.96
nvidia-cufft-cu11        10.9.0.58
nvidia-curand-cu11       10.2.10.91
nvidia-cusolver-cu11     11.4.0.1
nvidia-cusparse-cu11     11.7.4.91
nvidia-nccl-cu11         2.14.3
nvidia-nvtx-cu11         11.7.91
oauthlib                 3.2.2
omegaconf                2.1.1
opencv-python            4.5.5.64
opencv-python-headless   4.7.0.72
orjson                   3.8.8
packaging                23.0
pandas                   1.5.3
parso                    0.8.3
pexpect                  4.8.0
pickleshare              0.7.5
Pillow                   9.4.0
pip                      23.0.1
platformdirs             3.1.1
plotly                   5.13.1
pluggy                   1.0.0
point-cloud-utils        0.29.1
prompt-toolkit           3.0.38
protobuf                 3.20.3
psutil                   5.9.4
ptyprocess               0.7.0
pudb                     2019.2
pure-eval                0.2.2
pyarrow                  11.0.0
pyasn1                   0.4.8
pyasn1-modules           0.2.8
pydantic                 1.10.6
pydeck                   0.8.0
pyDeprecate              0.3.1
pydub                    0.25.1
Pygments                 2.14.0
PyMCubes                 0.1.4
Pympler                  1.0.1
pyparsing                3.0.9
pyrsistent               0.19.3
pytest                   7.2.2
python-dateutil          2.8.2
python-multipart         0.0.6
pytools                  2022.1.14
pytorch-lightning        1.4.2
pytz                     2022.7.1
pytz-deprecation-shim    0.1.0.post0
PyWavelets               1.4.1
PyYAML                   6.0
regex                    2022.10.31
requests                 2.28.2
requests-oauthlib        1.3.1
responses                0.18.0
rfc3986                  1.5.0
rich                     13.3.2
rsa                      4.9
scikit-image             0.20.0
scipy                    1.9.1
semver                   2.13.0
setuptools               65.6.3
six                      1.16.0
smmap                    5.0.0
sniffio                  1.3.0
stack-data               0.6.2
starlette                0.26.1
streamlit                1.20.0
sympy                    1.11.1
tabulate                 0.9.0
taming-transformers      0.0.1       /home/korbip/Programming/External/zero123/taming-transformers
tenacity                 8.2.2
tensorboard              2.12.0
tensorboard-data-server  0.7.0
tensorboard-plugin-wit   1.8.1
termcolor                2.2.0
test-tube                0.7.5
tifffile                 2023.3.15
tokenizers               0.12.1
toml                     0.10.2
tomli                    2.0.1
toolz                    0.12.0
torch                    2.0.0
torch-fidelity           0.3.0
torchmetrics             0.6.0
torchvision              0.15.1
tornado                  6.2
tqdm                     4.65.0
traitlets                5.9.0
transformers             4.22.2
triton                   2.0.0
typing_extensions        4.5.0
tzdata                   2022.7
tzlocal                  4.3
uc-micro-py              1.0.1
urllib3                  1.26.15
urwid                    2.1.2
uvicorn                  0.21.1
validators               0.20.0
watchdog                 3.0.0
wcwidth                  0.2.6
webdataset               0.2.5
websockets               10.4
Werkzeug                 2.2.3
wheel                    0.38.4
xxhash                   3.2.0
yarl                     1.8.2
zipp                     3.15.0
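When comparing setups, for example against the RTX 3090 configuration asked about earlier, the GPU-relevant rows can be pulled out of a long `pip list` dump. A sketch; the package filter (torch, torchvision, triton, and anything starting with `nvidia-`) is an assumption about what matters here, and the sample is a few lines copied from the list above:

```python
from typing import Dict

# Assumed set of packages worth comparing for GPU issues; adjust as needed.
EXACT_NAMES = {"torch", "torchvision", "triton"}
PREFIXES = ("nvidia-",)


def gpu_related_versions(pip_list_text: str) -> Dict[str, str]:
    """Extract {package: version} for GPU-related rows of `pip list` output."""
    versions = {}
    for line in pip_list_text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        name, version = parts[0], parts[1]
        if name.lower() in EXACT_NAMES or name.lower().startswith(PREFIXES):
            versions[name] = version
    return versions


if __name__ == "__main__":
    sample = """torch                    2.0.0
torchvision              0.15.1
nvidia-cudnn-cu11        8.5.0.96
numpy                    1.23.5"""
    print(gpu_related_versions(sample))
```

Note that nvcc reports the system toolkit (11.7 here), while the PyPI build of torch 2.0.0 typically loads the CUDA libraries shipped in the nvidia-*-cu11 wheels, so those are the more relevant versions to compare.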

[Screenshot at 2023-03-23 00-37-44]

ruoshiliu commented 1 year ago

How big is the input image? Have you tried different, smaller images? I just pushed a small fix. Could you pull and try again?

kpoeppel commented 1 year ago

The input image is 50 kB and 128x128 pixels; I think anything smaller would be nonsensical. It works now, using ~16.5 GiB of VRAM, though I don't understand why these small changes made the difference. Essentially, you just added some thumbnail calls? Anyway, it works now. Thanks a lot!

ruoshiliu commented 1 year ago

thumbnail is a Pillow method that resizes the image before it goes through the pipeline, but I don't quite understand how a 128x128 image could crash it: thumbnail only ever shrinks an image to fit within the given bound (1536x1536 here), so a 128x128 input should pass through unchanged. Anyway, glad it works!
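For reference, a quick check of how Pillow's `Image.thumbnail` treats small and large inputs, assuming a 1536x1536 bound as mentioned in this thread (the exact call in the fix isn't shown here). Per Pillow's documentation, `thumbnail` resizes in place, preserves the aspect ratio, and never makes an image larger than the given size. The helper name is illustrative.

```python
from typing import Tuple

from PIL import Image


def size_after_thumbnail(width: int, height: int, bound: int = 1536) -> Tuple[int, int]:
    """Return the size a blank RGB image has after Image.thumbnail((bound, bound))."""
    img = Image.new("RGB", (width, height))
    img.thumbnail((bound, bound))  # modifies img in place, keeps aspect ratio, never enlarges
    return img.size


if __name__ == "__main__":
    print(size_after_thumbnail(128, 128))    # (128, 128): small inputs are not upsampled
    print(size_after_thumbnail(4000, 3000))  # (1536, 1152): large inputs shrink to fit
```

If a model expects a fixed input resolution, an explicit `Image.resize` would be needed to upsample; `thumbnail` alone will not do it.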

cvlab-columbia / zero123

Out of memory on 24 GB VRAM GPU #12