Alpha-VLLM / Lumina-T2X

Lumina-T2X is a unified framework for Text to Any Modality Generation
MIT License

t2v timing #52

Open BurhanUlTayyab opened 3 weeks ago

BurhanUlTayyab commented 3 weeks ago

Hi

I've implemented the Lumina-T2V model and am training it on the Panda dataset. The paper mentions that initial training takes 8 GPUs. I assume these are 8×A100 80 GB, which is what I'm using. May I know how long training takes, in terms of GPU hours?

gaopengpjlab commented 3 weeks ago

The paper states that 128 GPUs are required for T2V training.

BurhanUlTayyab commented 3 weeks ago

[Attached screenshot IMG_6929: the paper's training-stage table]

The first stage here refers to 8 GPUs. I assume they are A100s; if not, please correct me. Could you also tell me how many GPU hours were spent on training for stages 1 and 2?

leonardodora commented 2 weeks ago


Hi, do you have any plan to release the T2V code?

BurhanUlTayyab commented 1 week ago

> Hi, do you have any plan to release the t2v codes?

Here's the code for the text-to-video model (https://drive.google.com/file/d/1jAtojjVmpzKuafUaFZjUT_HueAaAdve3/view?usp=sharing). But be warned: pretraining on 8 GPUs does not give good results at all. I asked the authors, and as stated in the paper, they pretrain on 128 GPUs, so I don't see how pretraining could work on only 8. This is also why they are not releasing the T2V model.

leonardodora commented 5 days ago

> Here's the code for the text-to-video model […] 8 GPU pretraining doesn't give good results at all.

Thanks for your code! Maybe you could use the T2I model as a pretrained initialization; training T2V from scratch on 8 GPUs is very challenging!
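A minimal sketch of that suggestion, using PyTorch's `load_state_dict(strict=False)` to copy over every weight whose name and shape match a T2I checkpoint while leaving video-only layers (e.g. temporal attention) randomly initialized. The model classes and layer names below are placeholders for illustration, not the actual Lumina-T2X API:

```python
# Hypothetical sketch: warm-starting a T2V model from a T2I checkpoint.
# T2IModel / T2VModel are stand-ins, not real Lumina-T2X classes.
import torch
import torch.nn as nn


class T2IModel(nn.Module):
    """Stand-in for a pretrained text-to-image backbone."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(16, 16)


class T2VModel(nn.Module):
    """Stand-in T2V model: shared backbone plus a new temporal layer."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(16, 16)
        self.temporal = nn.Linear(16, 16)  # video-only, trained from scratch


def init_from_t2i(t2v: nn.Module, t2i_state: dict) -> list:
    """Copy all matching weights from the T2I state dict; return the
    parameter names that were NOT found (these stay randomly initialized)."""
    result = t2v.load_state_dict(t2i_state, strict=False)
    return list(result.missing_keys)


t2i = T2IModel()
t2v = T2VModel()
missing = init_from_t2i(t2v, t2i.state_dict())
# The shared backbone is copied; only the temporal layer remains to train.
assert torch.equal(t2v.backbone.weight, t2i.backbone.weight)
```

With a real checkpoint you would load `t2i_state` via `torch.load(path)["model"]` (or whatever key the checkpoint uses) and then fine-tune with the image weights frozen or at a lower learning rate.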