Picsart-AI-Research / StreamingT2V

StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
https://streamingt2v.github.io/

"CUDA out of memory" when running inference.py on nvidia-4090 #4

Closed minounou closed 6 months ago

minounou commented 6 months ago

Hi,

when I run this command:
python inference.py --prompt="A cat running on the street"

I got the following "CUDA out of memory" error message. I have two Nvidia 4090s, but it seems I can only use one of them. Could you let me know how to change config.yaml so I can run inference.py on a 4090 (24 GB)? Thanks!

...
Traceback (most recent call last):
  File "/home/bizon/boxu/models/8-DT/22007_streamingT2V_202404/StreamingT2V/t2v_enhanced/inference.py", line 66, in <module>
    stream_cli, stream_model = init_streamingt2v_model(ckpt_file_streaming_t2v, result_fol)
  File "/home/bizon/boxu/models/8-DT/22007_streamingT2V_202404/StreamingT2V/t2v_enhanced/model_init.py", line 105, in init_streamingt2v_model
    model.load_state_dict(torch.load(
  File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/serialization.py", line 809, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/serialization.py", line 1172, in _load
    result = unpickler.load()
  File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/serialization.py", line 1142, in persistent_load
    typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/serialization.py", line 1116, in load_tensor
    wrap_storage=restore_location(storage, location),
  File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/serialization.py", line 217, in default_restore_location
    result = fn(storage, location)
  File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/serialization.py", line 187, in _cuda_deserialize
    return obj.cuda(device)
  File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/_utils.py", line 81, in _cuda
    untyped_storage = torch.UntypedStorage(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 30.00 MiB (GPU 0; 23.65 GiB total capacity; 22.20 GiB already allocated; 22.06 MiB free; 22.67 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
...
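For context, the stack trace shows the OOM happening inside torch.load, which restores the checkpoint tensors directly onto GPU 0. A minimal sketch of a generic workaround, with a placeholder checkpoint path and a stand-in model (this is not the repo's actual fix), is to deserialize to CPU first and move the model over afterwards:

import torch
import torch.nn as nn

model = nn.Linear(4, 4)                                         # stand-in for the real model
state_dict = torch.load("checkpoint.ckpt", map_location="cpu")  # deserialize in host RAM, not on GPU 0
model.load_state_dict(state_dict, strict=False)                 # weights stay on the CPU here
model.half().to("cuda")                                         # optionally cast to fp16, then move to the GPU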

opplus commented 6 months ago

+1

minounou commented 6 months ago

I also duplicated the following Hugging Face Space on Nvidia 4xA10G large (96 GB VRAM), but still got a "CUDA out of memory" error message (from the message it seems I still only have about 24 GB of VRAM). Can I use the 96 GB of VRAM as one virtual device? Thanks! (I also applied to restart the Space on an A100 (40 GB) or H100 (80 GB), but haven't received the confirmation email from Hugging Face yet.) https://huggingface.co/spaces/PAIR/StreamingT2V
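Note that CUDA VRAM is per device: a 4xA10G Space exposes four separate ~24 GB pools, not one 96 GB pool, so any single allocation (and in this code path, the whole checkpoint) still has to fit on one GPU. A quick sketch to confirm what each visible device actually offers:

import torch

# VRAM is not pooled across GPUs: each device reports its own capacity.
for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)
    print(f"cuda:{i}: {free / 2**30:.1f} GiB free of {total / 2**30:.1f} GiB")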

oldnaari commented 6 months ago

Thank you for reporting the issue. Indeed there was a problem in the code that caused excessive memory usage; it should be fixed with https://github.com/Picsart-AI-Research/StreamingT2V/pull/16. Please let us know if this doesn't fix the memory issue for you.

minounou commented 6 months ago

Hi, I pulled the new code and ran the following; it seems that with one Nvidia 4090 I still get the "CUDA out of memory" error message. The log is below, could you take a look? Thanks!

python inference.py --prompt="A cat running on the street"

...
CUSTOM XFORMERS ATTENTION USED.
Trainer already configured with model summary callbacks: [<class 'pytorch_lightning.callbacks.rich_model_summary.RichModelSummary'>]. Skipping setting a default `ModelSummary` callback.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Traceback (most recent call last):
  File "/home/bizon/boxu/models/8-DT/22007_streamingT2V_202404/StreamingT2V/t2v_enhanced/inference.py", line 68, in <module>
    stream_cli, stream_model = init_streamingt2v_model(ckpt_file_streaming_t2v, result_fol)
  File "/home/bizon/boxu/models/8-DT/22007_streamingT2V_202404/StreamingT2V/t2v_enhanced/model_init.py", line 105, in init_streamingt2v_model
    model.load_state_dict(torch.load(
  File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/serialization.py", line 809, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/serialization.py", line 1172, in _load
    result = unpickler.load()
  File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/serialization.py", line 1142, in persistent_load
    typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/serialization.py", line 1116, in load_tensor
    wrap_storage=restore_location(storage, location),
  File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/serialization.py", line 217, in default_restore_location
    result = fn(storage, location)
  File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/serialization.py", line 187, in _cuda_deserialize
    return obj.cuda(device)
  File "/home/bizon/anaconda3/envs/st2v/lib/python3.10/site-packages/torch/_utils.py", line 81, in _cuda
    untyped_storage = torch.UntypedStorage(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 22.00 MiB (GPU 0; 23.65 GiB total capacity; 22.29 GiB already allocated; 27.56 MiB free; 22.77 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
...
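As a side note, the allocator hint at the end of the error message can be tried, though it only mitigates fragmentation and will not help if the model genuinely needs more than 24 GiB. Assuming the same command as above:

PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python inference.py --prompt="A cat running on the street"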
minounou commented 6 months ago

(Also, my request for an A100 (40 GB VRAM) is approved now; I am duplicating the Space and got the following error message): https://huggingface.co/spaces/PAIR/StreamingT2V

...
Runtime error
It seems like you have activated model offloading by calling `enable_model_cpu_offload`, but are now manually moving the pipeline to GPU. It is strongly recommended against doing so as memory gains from offloading are likely to be lost. Offloading automatically takes care of moving the individual components vae, text_encoder, tokenizer, unet, scheduler to GPU when needed. To make sure offloading works as expected, you should consider moving the pipeline back to CPU: `pipeline.to('cpu')` or removing the move altogether if you use offloading.
Traceback (most recent call last):
  File "/home/user/app/app.py", line 46, in <module>
    ms_model = init_modelscope(devices[1])
  File "/home/user/app/t2v_enhanced/model_init.py", line 32, in init_modelscope
    return pipe.to(device)
  File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/diffusers/pipelines/pipeline_utils.py", line 418, in to
    module.to(device, dtype)
  File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1145, in to
    return self._apply(convert)
  File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
  File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Container logs:

===== Application Startup at 2024-04-08 18:30:25 =====

/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/diffusers/models/transformer_temporal.py:24: FutureWarning: `TransformerTemporalModelOutput` is deprecated and will be removed in version 0.29. Importing `TransformerTemporalModelOutput` from `diffusers.models.transformer_temporal` is deprecated and this will be removed in a future version. Please use `from diffusers.models.transformers.tranformer_temporal import TransformerTemporalModelOutput`, instead.
  deprecate("TransformerTemporalModelOutput", "0.29", deprecation_message)
/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/diffusers/models/transformer_temporal.py:29: FutureWarning: `TransformerTemporalModel` is deprecated and will be removed in version 0.29. Importing `TransformerTemporalModel` from `diffusers.models.transformer_temporal` is deprecated and this will be removed in a future version. Please use `from diffusers.models.transformers.tranformer_temporal import TransformerTemporalModel`, instead.
  deprecate("TransformerTemporalModel", "0.29", deprecation_message)
/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/diffusers/models/transformer_temporal.py:34: FutureWarning: `TransformerTemporalModelOutput` is deprecated and will be removed in version 0.29. Importing `TransformerSpatioTemporalModel` from `diffusers.models.transformer_temporal` is deprecated and this will be removed in a future version. Please use `from diffusers.models.transformers.tranformer_temporal import TransformerSpatioTemporalModel`, instead.
  deprecate("TransformerTemporalModelOutput", "0.29", deprecation_message)
2024-04-08 20:50:30,774 - modelscope - INFO - PyTorch version 2.0.0 Found.
2024-04-08 20:50:30,775 - modelscope - INFO - Loading ast index from /home/user/.cache/modelscope/ast_indexer
2024-04-08 20:50:30,776 - modelscope - INFO - No valid ast index found from /home/user/.cache/modelscope/ast_indexer, generating ast index from prebuilt!
2024-04-08 20:50:30,841 - modelscope - INFO - Loading done! Current index file version is 1.13.3, with md5 1d90ae0ad8c8101082d2f62a2d940153 and a total number of 972 components indexed
...
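Two separate problems are visible in this log: app.py requests a second GPU (devices[1]) that the Space does not expose, which produces the "invalid device ordinal" error, and the pipeline is moved to the GPU manually after enable_model_cpu_offload, which defeats the offloading. A minimal defensive sketch for the first problem (illustrative only, not the app's actual code):

import torch

# Fall back gracefully when cuda:1 does not exist, instead of
# indexing a second device unconditionally.
n_gpus = torch.cuda.device_count()
device = torch.device(f"cuda:{min(1, n_gpus - 1)}") if n_gpus > 0 else torch.device("cpu")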
chaojie commented 6 months ago

Try my ComfyUI version: https://github.com/chaojie/ComfyUI_StreamingT2V

hben35096 commented 6 months ago

python gradio_demo.py

Even with a V100 (32 GB), I still get:

You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
PIPE LOADING DONE
CUSTOM XFORMERS ATTENTION USED.
Trainer already configured with model summary callbacks: [<class 'pytorch_lightning.callbacks.rich_model_summary.RichModelSummary'>]. Skipping setting a default `ModelSummary` callback.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Traceback (most recent call last):
  File "/root/autodl-tmp/StreamingT2V/t2v_enhanced/gradio_demo.py", line 48, in <module>
    stream_cli, stream_model = init_streamingt2v_model(ckpt_file_streaming_t2v, result_fol)
  File "/root/autodl-tmp/StreamingT2V/t2v_enhanced/model_init.py", line 105, in init_streamingt2v_model
    model.load_state_dict(torch.load(
  File "/root/miniconda3/lib/python3.10/site-packages/torch/serialization.py", line 809, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/serialization.py", line 1172, in _load
    result = unpickler.load()
  File "/root/miniconda3/lib/python3.10/site-packages/torch/serialization.py", line 1142, in persistent_load
    typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "/root/miniconda3/lib/python3.10/site-packages/torch/serialization.py", line 1116, in load_tensor
    wrap_storage=restore_location(storage, location),
  File "/root/miniconda3/lib/python3.10/site-packages/torch/serialization.py", line 217, in default_restore_location
    result = fn(storage, location)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/serialization.py", line 187, in _cuda_deserialize
    return obj.cuda(device)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/_utils.py", line 81, in _cuda
    untyped_storage = torch.UntypedStorage(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 114.00 MiB (GPU 0; 31.74 GiB total capacity; 30.73 GiB already allocated; 60.88 MiB free; 31.31 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
hpoghos commented 6 months ago

We are currently working on this issue.

minounou commented 6 months ago

@chaojie Thank you! (Right, the smaller model you mentioned works on the 4090 now. Many thanks!)

hpoghos commented 6 months ago

With the latest changes you should be able to run StreamingT2V with 24 frames on a 4090. For ModelscopeT2V and AnimateDiff it should work by default; for SVD, just add --offload_models.
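For reference, a 24-frame run using only the flags mentioned in this thread looks like the following (the prompt is just an example; selecting the SVD enhancer is done through the repo's other options, which are not shown here):

python inference.py --prompt="A cat running on the street" --num_frames 24
python inference.py --prompt="A cat running on the street" --num_frames 24 --offload_models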

minounou commented 6 months ago

@hpoghos Thank you! I just pulled the newest code and generated a 1m14s video successfully on one Nvidia 4090. Many thanks!

python inference.py --prompt="Beside the ancient amphitheater of Taormina, agroup of friends enjoyed a leisurely picnic." --num_frames 600