PKU-YuanGroup / Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model), and we hope the open-source community will contribute to it.
MIT License

can I duplicate the huggingface-space and run on paid-GPU there? #193

Open minounou opened 6 months ago

minounou commented 6 months ago

Hi,

Can I duplicate the Hugging Face Space and run it on a paid GPU there? (I tried on my local NVIDIA 4090 (24 GB) and got a CUDA out-of-memory error.) Thanks! https://huggingface.co/spaces/LanguageBind/Open-Sora-Plan-v1.0.0

LinB203 commented 6 months ago

It takes about 40 GB of VRAM.

minounou commented 6 months ago

OK, thanks! Also, is "Nvidia 4xA10G large, 96 GB VRAM" OK? I tried that before but still got a "CUDA out of memory" error message; it seems it needs a single card with more than 40 GB? (The A100-40G on Hugging Face is not available this morning so I cannot start the Space on it, and the H100-80G is also not available yet):

  File "/home/user/app/opensora/models/ae/videobase/modules/resnet_block.py", line 75, in forward
    h = nonlinearity(h)
  File "/home/user/app/opensora/models/ae/videobase/modules/ops.py", line 15, in nonlinearity
    return x * torch.sigmoid(x)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.03 GiB. GPU 0 has a total capacty of 22.19 GiB of which 197.50 MiB is free. Process 256775 has 21.99 GiB memory in use. Of the allocated memory 19.30 GiB is allocated by PyTorch, and 2.38 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
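As the last line of the traceback itself suggests, allocator fragmentation can sometimes be mitigated by setting `PYTORCH_CUDA_ALLOC_CONF` before PyTorch initializes CUDA. A minimal sketch (the value 128 is an illustrative choice, not a recommendation from this thread, and this only helps with fragmentation, not with a model that is simply larger than the card):

```python
import os

# Must be set before torch is imported / CUDA is initialized.
# max_split_size_mb caps how large blocks the caching allocator will
# split, which limits fragmentation of reserved-but-unallocated memory.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```

The same thing can be done from the shell with `export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128` before launching the script.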
LinB203 commented 6 months ago

Support for splitting a model across multiple GPUs in diffusers is a work in progress: https://github.com/huggingface/diffusers/pull/6396/ So it can't shard models with device_map='auto' the way transformers can.
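For intuition, `device_map='auto'` in transformers amounts to greedily placing submodules onto GPUs by memory footprint. A toy sketch of that idea follows (the function name and layer sizes are hypothetical illustrations, not the actual transformers/accelerate code):

```python
def auto_device_map(layer_sizes_gb, gpu_capacities_gb):
    """Greedy placement: put each layer on the first GPU with enough
    free memory, spilling to the next GPU when one fills up."""
    placement = {}
    free = list(gpu_capacities_gb)
    for name, size in layer_sizes_gb.items():
        for gpu, avail in enumerate(free):
            if size <= avail:
                placement[name] = f"cuda:{gpu}"
                free[gpu] -= size
                break
        else:
            raise MemoryError(f"layer {name} ({size} GB) fits on no single GPU")
    return placement
```

With this kind of sharding, a ~40 GB model could be spread over four 24 GB A10Gs; without it, the whole model must fit on one card, which is why the 4xA10G Space still hits CUDA OOM.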

ustcxmwu commented 5 months ago

> Support for splitting a model across multiple GPUs in diffusers is a work in progress: huggingface/diffusers#6396 So it can't shard models with device_map='auto' the way transformers can.

Could you provide more detailed information about multi-GPU training or inference?