THUDM / CogVideo

Text-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Apache License 2.0

Running CogVideoX-5B on T4/V100 Free Colab Space #204

Open ProKSMT opened 2 weeks ago

ProKSMT commented 2 weeks ago

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 56.50 GiB.

V100 32G

5B model, with the enable_model_cpu_offload() option and the pipe.vae.enable_tiling() optimization enabled.

Using diffusers (cli_demo.py).
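
For reference, the setup is roughly equivalent to the following minimal sketch (my paraphrase of what cli_demo.py does with those options enabled; the prompt and step count are placeholders, and BF16 is assumed as the default dtype):

import torch
from diffusers import CogVideoXPipeline

# Roughly the configuration that triggers the OOM.
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()

video = pipe(prompt="a placeholder prompt", num_inference_steps=50).frames[0]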

zRzRzRzRzRzRzR commented 2 weeks ago

update diffusers to 0.30.1

ProKSMT commented 2 weeks ago

I am using diffusers 0.30.1

zRzRzRzRzRzRzR commented 2 weeks ago

Can you try the code in the cogvideox-dev branch with its requirements and run cli_demo.py again? Also, use breakpoint() to locate the line that triggers the OOM. Thanks.

ProKSMT commented 2 weeks ago

I don't know how to use that to locate the OOM line. Maybe this log will be helpful (screenshot of the log attached).

And I will test the cogvideox-dev branch as soon as possible.

GuanleiGao commented 2 weeks ago

I had the same problem with a V100, and it was solved by switching to an A10. It seems to be a GPU-specific problem.

ProKSMT commented 2 weeks ago

I think so too. I found that the V100 does not support BF16. I switched the dtype to FP16 and it worked (main branch), so it is probably not necessary to test the dev branch. However, I don't know exactly how the V100 leads to OOM just because it doesn't support BF16; my guess is that the automatic type conversion multiplies the VRAM consumption.
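
A minimal sketch for picking the dtype at runtime (assuming a CUDA build of PyTorch; torch.cuda.is_bf16_supported() is available in recent releases):

import torch

# Volta GPUs such as the V100 (compute capability 7.0) have no native BF16;
# Ampere and newer (compute capability >= 8.0) do.
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
print(torch.cuda.get_device_capability(), dtype)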

Exploder98 commented 2 weeks ago

I'm seeing this on my AMD RX 6900 XT. Changing the dtype does not have any effect, though. Could this have something to do with Flash Attention or Memory efficient attention support? I know that on my GPU neither of those work.
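
A quick sketch for checking which scaled-dot-product-attention backends the local PyTorch build enables (assuming PyTorch >= 2.0; on ROCm builds the torch.backends.cuda flags are still the ones to query):

import torch

# Flash and memory-efficient attention are not available on every GPU/ROCm combination;
# when both are off, attention falls back to the slower, more memory-hungry math path.
print("flash SDP:          ", torch.backends.cuda.flash_sdp_enabled())
print("mem-efficient SDP:  ", torch.backends.cuda.mem_efficient_sdp_enabled())
print("math (fallback) SDP:", torch.backends.cuda.math_sdp_enabled())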

zRzRzRzRzRzRzR commented 2 weeks ago

We need to look into this issue. The desktop 3060 has only 12 GB yet runs the 5B model normally, but developers report that the V100 32 GB has problems running the 5B model while the 2B model runs fine. I will check whether it is a precision issue.

ProKSMT commented 2 weeks ago

OK, I just tested the dev branch and the same issue occurred. It also tries to allocate 56.50 GiB (screenshot attached).

zRzRzRzRzRzRzR commented 2 weeks ago

Check whether these key settings are in place:

  1. Do not enable online quantization; it may cause errors on GPUs of this architecture.

  2. Check how the pipeline is constructed:

    pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.float16)
    # or, when passing components explicitly:
    pipe = CogVideoXPipeline.from_pretrained(
        "THUDM/CogVideoX-5b",
        text_encoder=text_encoder,
        transformer=transformer,
        vae=vae,
        torch_dtype=torch.float16,
    )

You must use FP16 on the T4; BF16 requires a GPU with Ampere or newer architecture. Additionally, do not call .to(device): with offloading enabled, parts of the model stay in CPU memory and are moved to the GPU only when needed, instead of transferring the complete model to the GPU at once.

  3. Finally, check whether these four memory-saving options are enabled:
    pipe.enable_model_cpu_offload()
    pipe.enable_sequential_cpu_offload()
    pipe.vae.enable_slicing()
    pipe.vae.enable_tiling()

    The model runs normally for me on a T4 in Colab (screenshot attached).

Please check whether this helps you.
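
Putting the checklist above together, a minimal T4/V100-oriented sketch (the prompt and generation parameters are placeholders, not fixed recommendations):

import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Load in FP16; pre-Ampere GPUs such as the T4 and V100 have no native BF16.
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.float16)

# Memory-saving options. Do NOT call pipe.to("cuda") when offloading is enabled.
pipe.enable_model_cpu_offload()  # or pipe.enable_sequential_cpu_offload() for even lower VRAM
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

video = pipe(
    prompt="a placeholder prompt",
    num_inference_steps=50,
    guidance_scale=6.0,
    num_frames=49,
).frames[0]
export_to_video(video, "output.mp4", fps=8)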

ProKSMT commented 2 weeks ago

So it seems that the V100 cannot run in BF16 mode. But it looks like FP16 output is not as good as BF16 (comparison screenshot attached).

Will you release a dedicated FP16 version of the 5B model?

zRzRzRzRzRzRzR commented 2 weeks ago

> So it seems that the V100 cannot run in BF16 mode. But it looks like FP16 output is not as good as BF16.
>
> Will you release a dedicated FP16 version of the 5B model?

We tried, but the results weren’t ideal. The 5B model is currently recommended to run at BF16 precision, which is also the precision we used for training. Converting to FP16 leads to suboptimal performance. However, the 2B model has lower compatibility requirements and can run effectively in FP16.

camenduru commented 2 weeks ago

free colab: https://github.com/camenduru/CogVideoX-5B-jupyter

lonngxiang commented 1 week ago

Using FP16 on a T4 still errors: https://colab.research.google.com/drive/14TTaDTM3_lk69qKb5u4-1_gm_YK6lM3m?usp=sharing

# Create pipeline and run inference
pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b",
    text_encoder=text_encoder,
    transformer=transformer,
    vae=vae,
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()
# pipe.enable_sequential_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

(screenshot of the error attached)

lonngxiang commented 1 week ago

pipe.enable_sequential_cpu_offload() does not work either.

zRzRzRzRzRzRzR commented 1 week ago

Why does it not work? It should be used with diffusers >= 0.30.1 and the FP16 model, not INT8.
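
For clarity, a minimal sketch of the intended usage (assumptions: diffusers >= 0.30.1, the FP16 weights, no INT8 quantization):

import torch
from diffusers import CogVideoXPipeline

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.float16)

# Sequential offload streams submodules to the GPU one at a time; do not combine it with
# enable_model_cpu_offload() or a manual pipe.to("cuda").
pipe.enable_sequential_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()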

lonngxiang commented 1 week ago

> Why does it not work? It should be used with diffusers >= 0.30.1 and the FP16 model, not INT8.

I use diffusers 0.30.2 and no INT8; you can see my code at this link: https://colab.research.google.com/drive/14TTaDTM3_lk69qKb5u4-1_gm_YK6lM3m?usp=sharing

zRzRzRzRzRzRzR commented 1 week ago
(screenshot attached)

This is not right, and we have added a Colab-friendly example link in our README.

lonngxiang commented 1 week ago

Where is the Colab link? I can't see it; please send it. Can a T4 run this?

zRzRzRzRzRzRzR commented 1 week ago

https://github.com/camenduru/CogVideoX-5B-jupyter

lonngxiang commented 1 week ago

Does generation take more than an hour? (screenshot attached)

zRzRzRzRzRzRzR commented 1 week ago

It should not need that long, but it does take a long time: with similar code on my T4 Colab it takes about 20 minutes. This is due to the limited compute of this generation of GPUs, and because the memory-saving options trade time for space, inference is very slow. Additionally, the T4 cannot run BF16 models and the quality of FP16 inference cannot be guaranteed, so we recommend using newer GPUs for inference and fine-tuning.

lonngxiang commented 1 week ago

It takes one hour (screenshot attached).