cocktailpeanut opened 5 days ago
I am running the provided inference script on a 48GB A6000. It uses about 44GB with the default settings and takes around 3 minutes and 45 seconds to generate an 8fps 768x432 video. You have to modify line 198 to device = "cuda"
as noted in another issue here. I have only tried it on this hardware, so I can't speak to CUDA vs. MPS.
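Rather than hard-coding the device on line 198, the assignment could auto-detect the available backend. A minimal sketch (the helper name and the surrounding script are assumptions; the availability checks are standard PyTorch calls):

```python
# Hypothetical replacement for the hard-coded `device = "cuda"` on
# line 198: pick the best available backend, falling back to CPU.
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

# With PyTorch installed, this would be driven by the real checks:
#   import torch
#   device = pick_device(torch.cuda.is_available(),
#                        torch.backends.mps.is_available())
```

Whether the model actually runs on MPS is a separate question from device selection; nobody in this thread has tested it.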
44GB
So... I guess I should give up if the goal is to run it on a PC? Even the best available consumer GPU has 24GB of VRAM (the 4090). I would appreciate it if someone could confirm.
I was also hoping we could get this running on local 24GB cards; maybe at least smaller resolutions would be possible, given how steeply memory scales with resolution...
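For intuition on why smaller resolutions might help, here is a back-of-envelope sketch of how self-attention cost scales with the spatio-temporal token count. This is a rough illustration, not the model's actual memory profile, and the quadratic-in-tokens assumption is mine:

```python
def relative_attention_cost(frames: int, width: int, height: int,
                            base=(40, 768, 432)) -> float:
    """Full self-attention over all spatio-temporal tokens scales
    roughly with the square of the token count, and the token count
    is proportional to frames * width * height."""
    tokens = frames * width * height
    base_tokens = base[0] * base[1] * base[2]
    return (tokens / base_tokens) ** 2

# 20 frames at 432x240 vs. the 40-frame 768x432 baseline:
print(f"{relative_attention_cost(20, 432, 240):.3f}")  # → 0.024
```

Note that the benchmarks below only drop from ~44GB to ~39GB across that same range, which suggests a large fixed cost (model weights and activations outside attention) dominates, so resolution alone may not rescue 24GB cards.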
Here's some results from my testing today:
NVIDIA RTX A6000, Driver 550.107.02, CUDA 12.4
20 frames @ 432x240 = 39094MiB, 50it in 0:49
40 frames @ 432x240 = 40502MiB, 50it in 1:35
40 frames @ 480x288 = 41102MiB, 50it in 1:54
40 frames @ 624x352 = 42480MiB, 50it in 2:41
40 frames @ 768x432 = 44420MiB, 50it in 3:45
60 frames @ 768x432 = 47776MiB, 50it in 5:35
80 frames went OOM
It looks like those with < 48GB are out of luck even at lower resolution and number of frames.
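A quick linear extrapolation from the two 768x432 data points above is consistent with the 80-frame OOM (a rough estimate that ignores any non-linear terms):

```python
# Measured at 768x432: 40 frames -> 44420 MiB, 60 frames -> 47776 MiB.
def estimate_vram_mib(frames: int,
                      p1=(40, 44420), p2=(60, 47776)) -> float:
    """Linear fit through the two measured (frames, MiB) points."""
    slope = (p2[1] - p1[1]) / (p2[0] - p1[0])  # ~168 MiB per frame
    return p1[1] + slope * (frames - p1[0])

estimate = estimate_vram_mib(80)
print(f"80 frames ~ {estimate:.0f} MiB vs 49152 MiB on a 48GB card")
# → 80 frames ~ 51132 MiB vs 49152 MiB on a 48GB card
```

About 51GB estimated for 80 frames, comfortably over the 48GB ceiling, which matches the observed OOM.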
Hopefully the new CogVideoX Image-to-Video will fit into 24 GB (https://github.com/huggingface/diffusers/releases/tag/v0.30.3); their 5B img2vid model does...
What is the minimum VRAM required to run this?
Also, is this CUDA-only, or can it run on MPS?