PKU-YuanGroup / Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model), and we hope the open-source community will contribute to it.
MIT License

What are the HW requirements to run this model? #38

Open marvin-0042 opened 8 months ago

marvin-0042 commented 8 months ago

I tried an A100 (40GB SXM4) with 30 vCPUs, 200 GiB RAM, and a 512 GiB SSD, but immediately hit CUDA out of memory.

Which card/config should I use? 8x A100 80GB? 1x H100 80GB? 8x H100 80GB?

```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 538.00 MiB (GPU 0; 39.39 GiB total capacity; 37.39 GiB already allocated; 233.94 MiB free; 38.27 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation
```
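As the allocator message itself suggests, when reserved memory far exceeds allocated memory, capping the allocator's split size via `PYTORCH_CUDA_ALLOC_CONF` can reduce fragmentation. A minimal thing to try before renting a bigger card (the value 128 is an arbitrary starting point, not a project recommendation):

```shell
# Caps the size of cached allocator blocks to fight fragmentation.
# Only helps when reserved >> allocated; it cannot shrink the working set itself.
PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python ./src/sora/modules/ae/vqvae/videogpt/rec_video.py \
    --video-path "assets/origin_video_0.mp4" --rec-path "rec_video_0.mp4" \
    --num-frames 500 --sample-rate 1
```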

```
(opensora) ubuntu@129-146-126-183:~/opensora-arizona/Open-Sora-Plan$ python ./src/sora/modules/ae/vqvae/videogpt/rec_video.py --video-path "assets/origin_video_0.mp4" --rec-path "rec_video_0.mp4" --num-frames 500 --sample-rate 1
/home/ubuntu/opensora-arizona/miniconda3/envs/opensora/lib/python3.8/site-packages/torchvision/transforms/_functional_video.py:6: UserWarning: The 'torchvision.transforms._functional_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms.functional' module instead.
  warnings.warn(
/home/ubuntu/opensora-arizona/miniconda3/envs/opensora/lib/python3.8/site-packages/torchvision/transforms/_transforms_video.py:22: UserWarning: The 'torchvision.transforms._transforms_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms' module instead.
  warnings.warn(
Downloading...
From (original): https://drive.google.com/uc?id=1uuB_8WzHP_bbBmfuaIV7PK_Itl3DyHY5
From (redirected): https://drive.google.com/uc?id=1uuB_8WzHP_bbBmfuaIV7PK_Itl3DyHY5&confirm=t&uuid=edea95d1-1e18-41c1-8b57-966749fb41ad
To: /home/ubuntu/opensora-arizona/Open-Sora-Plan/ucf101_stride4x4x4
100%|██████████| 258M/258M [00:05<00:00, 45.4MB/s]
sample_frames_len 500, only can sample 300
assets/origin_video_0.mp4 300
Traceback (most recent call last):
  File "./src/sora/modules/ae/vqvae/videogpt/rec_video.py", line 110, in <module>
    main(args)
  File "./src/sora/modules/ae/vqvae/videogpt/rec_video.py", line 92, in main
    encodings, embeddings = vqvae.encode(x_vae, include_embeddings=True)
  File "/home/ubuntu/opensora-arizona/Open-Sora-Plan/src/sora/modules/ae/vqvae/videogpt/videogpt/vqvae.py", line 38, in encode
    h = self.pre_vq_conv(self.encoder(x))
  File "/home/ubuntu/opensora-arizona/miniconda3/envs/opensora/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/opensora-arizona/Open-Sora-Plan/src/sora/modules/ae/vqvae/videogpt/videogpt/vqvae.py", line 241, in forward
    h = self.res_stack(h)
  File "/home/ubuntu/opensora-arizona/miniconda3/envs/opensora/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/opensora-arizona/miniconda3/envs/opensora/lib/python3.8/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
  File "/home/ubuntu/opensora-arizona/miniconda3/envs/opensora/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/opensora-arizona/Open-Sora-Plan/src/sora/modules/ae/vqvae/videogpt/videogpt/vqvae.py", line 125, in forward
    return x + self.block(x)
  File "/home/ubuntu/opensora-arizona/miniconda3/envs/opensora/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/opensora-arizona/miniconda3/envs/opensora/lib/python3.8/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
  File "/home/ubuntu/opensora-arizona/miniconda3/envs/opensora/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/opensora-arizona/Open-Sora-Plan/src/sora/modules/ae/vqvae/videogpt/videogpt/vqvae.py", line 104, in forward
    x = self.attn_w(x, x, x) + self.attn_h(x, x, x) + self.attn_t(x, x, x)
  File "/home/ubuntu/opensora-arizona/miniconda3/envs/opensora/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/opensora-arizona/Open-Sora-Plan/src/sora/modules/ae/vqvae/videogpt/videogpt/attention.py", line 193, in forward
    a = self.attn(q, k, v, decode_step, decode_idx)
  File "/home/ubuntu/opensora-arizona/miniconda3/envs/opensora/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/opensora-arizona/Open-Sora-Plan/src/sora/modules/ae/vqvae/videogpt/videogpt/attention.py", line 244, in forward
    out = scaled_dot_product_attention(q, k, v, training=self.training)
  File "/home/ubuntu/opensora-arizona/Open-Sora-Plan/src/sora/modules/ae/vqvae/videogpt/videogpt/attention.py", line 500, in scaled_dot_product_attention
    attn = torch.matmul(q, k.transpose(-1, -2))
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 538.00 MiB (GPU 0; 39.39 GiB total capacity; 37.39 GiB already allocated; 233.94 MiB free; 38.27 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
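The traceback pinpoints the failure: the VideoGPT VQVAE encoder runs axial attention along width, height, and time (`attn_w` / `attn_h` / `attn_t`), and `torch.matmul(q, k.transpose(-1, -2))` materializes a full seq_len x seq_len attention matrix for every position along the other two axes. Temporal attention memory therefore grows quadratically with `--num-frames`. A rough back-of-envelope sketch (every dimension below is an illustrative assumption, not the model's actual configuration):

```python
# Back-of-envelope for the q @ k^T tensor that fails to allocate above.
# All numbers are illustrative assumptions, not Open-Sora-Plan's real config.
def temporal_attn_matrix_gib(frames, latent_hw=64, heads=2,
                             temporal_stride=4, dtype_bytes=4):
    t = frames // temporal_stride       # latent frames (4x stride assumed, per the "stride4x4x4" ckpt name)
    positions = latent_hw * latent_hw   # one T x T matrix per spatial position
    return positions * heads * t * t * dtype_bytes / 2**30

for frames in (100, 300, 500):
    print(f"--num-frames {frames}: ~{temporal_attn_matrix_gib(frames):.2f} GiB "
          "for a single temporal attention matrix")
```

Halving the frame count roughly quarters that one tensor, and many such tensors (plus activations) are alive at once, which is why frame count and resolution are the most effective levers, as the reply below suggests.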

luo3300612 commented 8 months ago

Try reconstructing fewer frames, or use a smaller resolution, e.g. --num-frames 100, --resolution 196.
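Concretely, that would be the original command with the reduced settings (the `--resolution` flag is taken from the reply above; the script's full argument list hasn't been verified here):

```shell
# Fewer frames and a smaller resolution shrink the attention matrices quadratically.
python ./src/sora/modules/ae/vqvae/videogpt/rec_video.py \
    --video-path "assets/origin_video_0.mp4" --rec-path "rec_video_0.mp4" \
    --num-frames 100 --resolution 196 --sample-rate 1
```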