[SVD] CUDA out of memory on TESLA V100 32G ?

kaijingxjtu commented 10 months ago

I use the default "Decode 14 frames at a time", but it seems to run out of memory (OOM) at the last step of sampling. However, I've seen a few examples of successful sampling on GPUs with 24 GB of memory or less. Why is that?

`Collecting usage statistics. To deactivate, set browser.gatherUsageStats to False.

2023-11-24 13:24:33.363 Did not auto detect external IP. Please go to https://docs.streamlit.io/ for debugging hints.

You can now view your Streamlit app in your browser.

Network URL: http://xxx.xxx.xxx.xxx:80

No SDP backend available, likely because you are running in pytorch versions < 2.0. In fact, you are using PyTorch 1.13.1+cu117. You might want to consider upgrading.
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
Initialized embedder #0: FrozenOpenCLIPImagePredictionEmbedder with 683800065 params. Trainable: False
Initialized embedder #1: ConcatTimestepEmbedderND with 0 params. Trainable: False
Initialized embedder #2: ConcatTimestepEmbedderND with 0 params. Trainable: False
Initialized embedder #3: VideoPredictionEmbedderWithEncoder with 83653863 params. Trainable: False
Initialized embedder #4: ConcatTimestepEmbedderND with 0 params. Trainable: False
Loading model from /root/makaijing/generative-models-main/stable-video-diffusion-img2vid/svd.safetensors
2023-11-24 13:25:24.965 Uncaught app exception
Traceback (most recent call last):
  File "/root/miniconda3/envs/vid_gen/lib/python3.8/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 534, in _run_script
    exec(code, module.__dict__)
  File "/root/makaijing/generative-models-main/scripts/demo/video_sampling.py", line 142, in <module>
    value_dict["cond_frames"] = img + cond_aug * torch.randn_like(img)
TypeError: randn_like(): argument 'input' (position 1) must be Tensor, not NoneType
Global seed set to 23
Global seed set to 23
Global seed set to 23
############################## Sampling setting ##############################
Sampler: EulerEDMSampler
Discretization: EDMDiscretization
Guider: LinearPredictionGuider
Sampling with EulerEDMSampler for 26 steps: 0%| | 0/26 [00:00<?, ?it/s]/root/miniconda3/envs/vid_gen/lib/python3.8/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
Sampling with EulerEDMSampler for 26 steps: 96%|████████████████████████████████████████████████████▉ | 25/26 [01:02<00:02, 2.48s/it]
2023-11-24 13:27:05.104 Uncaught app exception
Traceback (most recent call last):
  File "/root/miniconda3/envs/vid_gen/lib/python3.8/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 534, in _run_script
    exec(code, module.__dict__)
  File "/root/makaijing/generative-models-main/scripts/demo/video_sampling.py", line 174, in <module>
    out = do_sample(
  File "/root/makaijing/generative-models-main/./scripts/demo/streamlit_helpers.py", line 616, in do_sample
    samples_x = model.decode_first_stage(samples_z)
  File "/root/miniconda3/envs/vid_gen/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/root/makaijing/generative-models-main/./sgm/models/diffusion.py", line 130, in decode_first_stage
    out = self.first_stage_model.decode(
  File "/root/makaijing/generative-models-main/./sgm/models/autoencoder.py", line 211, in decode
    x = self.decoder(z, **kwargs)
  File "/root/miniconda3/envs/vid_gen/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/makaijing/generative-models-main/./sgm/modules/diffusionmodules/model.py", line 733, in forward
    h = self.up[i_level].block[i_block](h, temb, **kwargs)
  File "/root/miniconda3/envs/vid_gen/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/makaijing/generative-models-main/./sgm/modules/autoencoding/temporal_ae.py", line 70, in forward
    x = super().forward(x, temb)
  File "/root/makaijing/generative-models-main/./sgm/modules/diffusionmodules/model.py", line 134, in forward
    h = nonlinearity(h)
  File "/root/makaijing/generative-models-main/./sgm/modules/diffusionmodules/model.py", line 49, in nonlinearity
    return x * torch.sigmoid(x)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.94 GiB (GPU 0; 31.75 GiB total capacity; 23.74 GiB already allocated; 3.19 GiB free; 27.38 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF`
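The last line of the error points at the allocator option `max_split_size_mb`. A minimal sketch of setting it from Python, assuming the code runs before PyTorch makes its first CUDA allocation (simplest: before `import torch`). The helper name `with_max_split_size` and the value 512 are illustrative assumptions, not values suggested anywhere in this thread:

```python
import os


def with_max_split_size(existing: str, size_mb: int) -> str:
    """Return an allocator config string with max_split_size_mb set.

    Any other comma-separated options already present are kept;
    a previous max_split_size_mb entry is replaced.
    """
    parts = [
        p for p in existing.split(",")
        if p and not p.startswith("max_split_size_mb:")
    ]
    parts.append(f"max_split_size_mb:{size_mb}")
    return ",".join(parts)


# 512 is an arbitrary illustrative value (assumption), not a tuned setting.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = with_max_split_size(
    os.environ.get("PYTORCH_CUDA_ALLOC_CONF", ""), 512
)
```

In this log, reserved memory (27.38 GiB) exceeds allocated memory (23.74 GiB) by only about 3.6 GiB, so fragmentation may be just part of the problem and this setting may not be enough on its own.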

pip list

`Package Version
absl-py 1.1.0
addict 2.4.0
aiohttp 3.9.0
aiosignal 1.3.1
altair 5.1.2
antlr4-python3-runtime 4.9.3
anyio 3.6.1
appdirs 1.4.4
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
astor 0.8.1
asttokens 2.0.5
astunparse 1.6.3
async-timeout 4.0.3
atomicwrites 1.4.0
attrs 23.1.0
backcall 0.2.0
backports.zoneinfo 0.2.1
beautifulsoup4 4.11.1
black 23.7.0
bleach 5.0.0
blinker 1.7.0
braceexpand 0.1.7
cachetools 5.3.2
certifi 2023.11.17
cffi 1.15.0
chardet 5.2.0
charset-normalizer 3.3.2
click 8.1.7
clip 1.0
cloudpickle 2.1.0
cmake 3.27.7
colorama 0.4.4
commonmark 0.9.1
contextlib2 21.6.0
contourpy 1.1.1
cryptography 37.0.2
cycler 0.12.1
debugpy 1.6.0
decorator 5.1.1
defusedxml 0.7.1
distlib 0.3.4
docker-pycreds 0.4.0
easydict 1.9
editables 0.3
einops 0.7.0
entrypoints 0.4
executing 0.8.3
fairscale 0.4.13
fastjsonschema 2.15.3
filelock 3.13.1
fire 0.5.0
flatbuffers 1.12
fonttools 4.45.0
frozenlist 1.4.0
fsspec 2023.10.0
ftfy 6.1.3
future 0.18.2
gast 0.4.0
gitdb 4.0.11
GitPython 3.1.40
google-auth 2.8.0
google-auth-oauthlib 0.4.6
google-pasta 0.2.0
grpcio 1.46.3
h11 0.12.0
h5py 3.4.0
hatch 1.2.1
hatchling 1.3.1
httpcore 0.15.0
httpx 0.23.0
huggingface-hub 0.19.4
hyperopt 0.1.2
idna 3.4
importlib-metadata 6.8.0
importlib-resources 6.1.1
intel-openmp 2022.1.0
invisible-watermark 0.2.0
ipdb 0.13.9
ipykernel 6.15.0
ipython 8.4.0
ipython-genutils 0.2.0
ipywidgets 7.7.0
jedi 0.19.1
jeepney 0.8.0
Jinja2 3.1.2
joblib 1.1.0
json-tricks 3.15.5
jsonschema 4.20.0
jsonschema-specifications 2023.11.1
jupyter-client 7.3.4
jupyter-core 4.10.0
jupyterlab-pygments 0.2.2
jupyterlab-widgets 1.1.0
keras 2.9.0
Keras-Preprocessing 1.1.2
keyring 23.6.0
kiwisolver 1.4.5
kornia 0.6.9
libarchive-c 2.8
libclang 14.0.1
lightning-utilities 0.10.0
lit 17.0.5
lmdb 1.3.0
Markdown 3.3.7
markdown-it-py 3.0.0
MarkupSafe 2.1.3
matplotlib 3.7.4
matplotlib-inline 0.1.3
mdurl 0.1.2
mistune 0.8.4
mkl 2022.1.0
multidict 6.0.4
mypy-extensions 1.0.0
natsort 8.4.0
nbclient 0.6.4
nbconvert 6.5.0
nbformat 5.4.0
nest-asyncio 1.5.5
networkx 2.8.4
nltk 3.7
nni 2.7
notebook 6.4.12
numpy 1.24.4
oauthlib 3.2.0
omegaconf 2.3.0
onnx 1.12.0
onnxruntime 1.11.1
open-clip-torch 2.23.0
opencv-python 4.6.0.66
opt-einsum 3.3.0
packaging 23.2
pandas 2.0.3
pandocfilters 1.5.0
parso 0.8.3
pathspec 0.11.2
pathtools 0.1.2
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.1.1
pip 22.1.2
pkgutil_resolve_name 1.3.10
platformdirs 2.5.2
pluggy 1.0.0
portalocker 2.8.2
prettytable 3.3.0
prometheus-client 0.14.1
promise 2.3
prompt-toolkit 3.0.29
protobuf 3.20.1
psutil 5.9.6
ptyprocess 0.7.0
pudb 2023.1
pure-eval 0.2.2
pyarrow 8.0.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycocoevalcap 1.2
pycocotools 2.0.4
pycparser 2.21
pydeck 0.8.1b0
pyDeprecate 0.3.2
Pygments 2.17.2
pymongo 4.1.1
Pympler 1.0.1
pyOpenSSL 19.0.0
pyparsing 3.1.1
pyperclip 1.8.2
pyre-extensions 0.0.23
pyrsistent 0.18.1
PySocks 1.7.1
python-dateutil 2.8.2
PythonWebHDFS 0.2.3
pytorch-lightning 1.8.5
pytz 2023.3.post1
pytz-deprecation-shim 0.1.0.post0
PyWavelets 1.4.1
PyYAML 6.0.1
pyzmq 23.1.0
rater 0.1.1
referencing 0.31.0
regex 2023.10.3
requests 2.31.0
requests-oauthlib 1.3.1
responses 0.21.0
rfc3986 1.5.0
rich 13.7.0
rpds-py 0.13.1
rsa 4.8
sacremoses 0.0.53
safetensors 0.4.0
schema 0.7.5
scikit-learn 0.24.2
scikit-video 1.1.11
scipy 1.10.1
SecretStorage 3.3.2
semver 2.13.0
Send2Trash 1.8.0
sentencepiece 0.1.99
sentry-sdk 1.36.0
setproctitle 1.3.3
setuptools 69.0.2
shortuuid 1.0.9
simplejson 3.17.6
six 1.16.0
smmap 5.0.1
sniffio 1.2.0
soupsieve 2.3.2.post1
stack-data 0.3.0
streamlit 1.28.2
tabulate 0.8.9
tbb 2021.6.0
tenacity 8.2.3
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorboardX 2.5.1
termcolor 2.3.0
terminado 0.15.0
terminaltables 3.1.10
threadpoolctl 3.1.0
timm 0.9.11
tinycss2 1.1.1
tokenizers 0.12.1
toml 0.10.2
tomli 2.0.1
tomli_w 1.0.0
tomlkit 0.11.0
toolz 0.12.0
torch 1.13.1+cu117
torch-tb-profiler 0.4.0
torchaudio 0.13.1+cu117
torchdata 0.5.1
torchmetrics 1.2.0
torchtext 0.11.0
torchvision 0.14.1+cu117
tornado 6.3.3
tqdm 4.66.1
traitlets 5.3.0
transformers 4.19.1
triton 2.0.0.post1
typeguard 2.13.3
typing_extensions 4.8.0
typing-inspect 0.9.0
tzdata 2023.3
tzlocal 5.2
urllib3 1.26.18
urwid 2.2.3
urwid-readline 0.13
userpath 1.8.0
validators 0.22.0
virtualenv 20.14.1
wandb 0.16.0
watchdog 3.0.0
wcwidth 0.2.12
webdataset 0.2.77
webencodings 0.5.1
websockets 10.3
Werkzeug 2.1.2
wheel 0.41.3
widgetsnbextension 3.6.0
wrapt 1.14.1
xformers 0.0.16
yapf 0.32.0
yarl 1.9.3
zipp 3.17.0`

ffmpeg -version

ffmpeg version 3.4.8-0ubuntu0.2 Copyright (c) 2000-2020 the FFmpeg developers
built with gcc 7 (Ubuntu 7.5.0-3ubuntu1~18.04)
configuration: --prefix=/usr --extra-version=0ubuntu0.2 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-omx --enable-openal --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libopencv --enable-libx264 --enable-shared
libavutil 55. 78.100 / 55. 78.100
libavcodec 57.107.100 / 57.107.100
libavformat 57. 83.100 / 57. 83.100
libavdevice 57. 10.100 / 57. 10.100
libavfilter 6.107.100 / 6.107.100
libavresample 3. 7. 0 / 3. 7. 0
libswscale 4. 8.100 / 4. 8.100
libswresample 2. 9.100 / 2. 9.100
libpostproc 54. 7.100 / 54. 7.100

zhanghongyong123456 commented 10 months ago

you can set :

badayvedat commented 10 months ago

> But, I've seen few examples of successful sampling using Gpus with 24GB or smaller memory. Why is that?

They decode fewer frames at a time. I would suggest you start with 14 and decrease it incrementally until you can generate a video.
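The "start at 14 and decrease" advice can be automated with a retry loop. The sketch below is generic and makes assumptions: `decode_fn` is a hypothetical callable standing in for whatever decodes the latents with a given number of frames at a time, and `oom_errors` is the exception type to treat as out-of-memory (with PyTorch this would be the `torch.cuda.OutOfMemoryError` shown in the traceback above).

```python
def decode_with_backoff(decode_fn, latents, start=14, oom_errors=(MemoryError,)):
    """Call decode_fn(latents, frames_at_a_time), lowering frames_at_a_time
    by one after each out-of-memory error.

    Returns (decoded_result, frames_at_a_time_that_worked).
    decode_fn is a hypothetical stand-in, not a function from the repository.
    """
    frames = start
    while frames >= 1:
        try:
            return decode_fn(latents, frames), frames
        except oom_errors:
            # Not enough memory at this setting: try one frame fewer.
            frames -= 1
    raise MemoryError("decoding failed even with one frame at a time")
```

With PyTorch, it would likely also be worth calling `torch.cuda.empty_cache()` in the `except` branch before retrying, so cached blocks from the failed attempt are released.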

SunzeY commented 10 months ago

@zhanghongyong123456 this doesn't work for me... still OOM. Can you share more about your GPU?

Laidawang commented 10 months ago

Reduce decoding_t to 1. When you decode more than a dozen frames at the same time, memory usage explodes.
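To illustrate why a smaller `decoding_t` lowers peak memory: only `decoding_t` frames go through the decoder at once, at the cost of more decoder calls. A minimal sketch, where `decode_fn` is a hypothetical stand-in for the decoder and the function is not the repository's actual implementation:

```python
def decode_in_chunks(latents, decode_fn, decoding_t=1):
    """Decode a sequence of per-frame latents, decoding_t frames per call.

    decode_fn takes a list of latents and returns a list of decoded frames
    of the same length (hypothetical interface, assumed for illustration).
    """
    if decoding_t < 1:
        raise ValueError("decoding_t must be at least 1")
    frames = []
    for start in range(0, len(latents), decoding_t):
        # Only this slice is held by the decoder at any one time.
        chunk = latents[start:start + decoding_t]
        frames.extend(decode_fn(chunk))
    return frames
```

For a 14-frame clip, `decoding_t=14` means one decoder call over all frames, while `decoding_t=1` means 14 calls over one frame each.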

SunzeY commented 10 months ago

I believe ComfyUI is a better choice for GPUs with less than 80 GB of memory, as it uses xformers to reduce memory cost.

12441409 commented 10 months ago

me too

zhanghongyong123456 commented 10 months ago

> @zhanghongyong123456 this doesn't work for me... still OOM. Can you share more about your GPU?

I tested on an RTX 3090 (24 GB) and an RTX 8000 (48 GB).

DaBaiTuu commented 8 months ago

I saw that someone uploaded an svd-f16.safetensors version. It is not official, but maybe you can try it.
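As a rough guide to what half-precision weights save, weight memory is parameter count times bytes per parameter. The sketch below uses the two embedder parameter counts printed in the log at the top of this thread; the helper name `weight_gib` is an assumption for illustration, and the numbers cover weights only, not activations.

```python
def weight_gib(num_params: int, bytes_per_param: int) -> float:
    """Memory taken by the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


# Parameter counts reported in the log above (embedders #0 and #3).
embedders = {
    "FrozenOpenCLIPImagePredictionEmbedder": 683_800_065,
    "VideoPredictionEmbedderWithEncoder": 83_653_863,
}

for name, n in embedders.items():
    fp32 = weight_gib(n, 4)  # 4 bytes per float32 parameter
    fp16 = weight_gib(n, 2)  # 2 bytes per float16 parameter
    print(f"{name}: fp32 {fp32:.2f} GiB, fp16 {fp16:.2f} GiB")
```

Note that the traceback above fails inside the decoder's activation function during `decode_first_stage`, so smaller weights may not be enough on their own; whether they help depends on whether the model also runs in half precision after loading.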

DaBaiTuu commented 8 months ago

Also, setting lowvram = True really helps, but only for scripts that import streamlit_helpers.py; unfortunately, the SVD-series scripts are not included.

RuojiWang commented 6 months ago

out of memory with 40G A100 GPU

Stability-AI / generative-models

[SVD] CUDA out of memory on TESLA V100 32G ? #180