Vchitect / Vchitect-2.0

Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models
https://vchitect.intern-ai.org.cn/
Apache License 2.0

System Requirement? #4

Open cocktailpeanut opened 5 days ago

cocktailpeanut commented 5 days ago

How much minimum VRAM is required to run this?

Also, is this for CUDA only? Can it run on MPS?

kyleeasterly commented 5 days ago

I am running the provided inference script on a 48GB A6000; it uses about 44GB with the default settings and takes around 3 minutes and 45 seconds to generate an 8fps 768x432 video. You have to modify line 198 to device = "cuda", as noted in another issue here. I have only tried it on this hardware, so I'm not sure about CUDA vs. MPS.
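
For reference, a minimal sketch of the kind of device selection that change implies (the variable name `device` comes from the comment above; the MPS branch is an untested assumption, not something anyone in this thread has confirmed):

```python
import torch

# Explicit device selection; the provided inference script reportedly needs
# device hard-set to "cuda" (around line 198). The MPS fallback below is
# speculative and has not been verified with this repo.
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"  # Apple Silicon; untested here
else:
    device = "cpu"

print(f"Using device: {device}")
```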

cocktailpeanut commented 5 days ago

44GB

So... I guess I should give up if the goal is to run it on a PC? I mean, even the best available consumer GPU (a 4090) only has 24GB of VRAM. Would appreciate it if someone could confirm.

C00reNUT commented 5 days ago

I was also hoping we could get this running on local 24GB cards; maybe at least a smaller resolution would be possible, given how quickly memory scales with resolution and frame count...
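
If anyone wants to test that, one quick way to see how close a smaller run gets to 24 GB is to read the peak allocation PyTorch reports (a sketch only; where exactly you hook this into the inference script is up to you):

```python
import torch

# Reset the counter, run a small generation, then read back the peak.
torch.cuda.reset_peak_memory_stats()

# ... run the inference script / pipeline here, e.g. 20 frames @ 432x240 ...

peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak CUDA memory: {peak_gib:.1f} GiB")
```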

kyleeasterly commented 5 days ago

Here's some results from my testing today:

NVIDIA RTX A6000, Driver 550.107.02, CUDA 12.4

20 frames @ 432x240 = 39094MiB, 50it in 0:49

40 frames @ 432x240 = 40502MiB, 50it in 1:35
40 frames @ 480x288 = 41102MiB, 50it in 1:54
40 frames @ 624x352 = 42480MiB, 50it in 2:41
40 frames @ 768x432 = 44420MiB, 50it in 3:45

60 frames @ 768x432 = 47776MiB, 50it in 5:35

80 frames went OOM

It looks like those with < 48GB are out of luck, even at lower resolutions and frame counts.
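
A rough back-of-the-envelope reading of those numbers (a naive linear extrapolation, purely illustrative; the fixed baseline includes model weights and activations that don't shrink with frame count):

```python
# Linear fit of VRAM vs. frame count at 768x432, from the two data points above:
# 40 frames -> 44420 MiB, 60 frames -> 47776 MiB
per_frame_mib = (47776 - 44420) / (60 - 40)  # ~168 MiB per extra frame
base_mib = 44420 - 40 * per_frame_mib        # ~37700 MiB before any frames
print(f"~{per_frame_mib:.0f} MiB/frame, ~{base_mib / 1024:.1f} GiB fixed cost")
# The fixed cost alone is well above 24 GiB, which matches the conclusion above.
```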

C00reNUT commented 5 days ago

Hopefully the new CogVideoX image-to-video model will fit into 24 GB (https://github.com/huggingface/diffusers/releases/tag/v0.30.3); their 5B img2vid model does...
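
For context, a rough sketch of how that CogVideoX-5b image-to-video pipeline is typically loaded from diffusers with offloading to keep VRAM down (based on the v0.30.3 release linked above; the model ID and call arguments are assumptions to double-check there, not something tested in this thread):

```python
import torch
from diffusers import CogVideoXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

# Model ID as listed in the diffusers v0.30.3 release notes (double-check there).
pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX-5b-I2V", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # trade speed for VRAM so it can fit on 24 GB cards
pipe.vae.enable_tiling()         # tile the VAE decode to cut peak memory further

image = load_image("input.jpg")  # placeholder input frame
video = pipe(image=image, prompt="a short description of the motion", num_frames=49).frames[0]
export_to_video(video, "output.mp4", fps=8)
```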