Fine-tuning with a custom dataset of multiple text-video pairs.

ali-vilab / VGen

Official repo for VGen: a holistic video generation ecosystem for video generation building on diffusion models

https://i2vgen-xl.github.io


Fine-tuning with a custom dataset of multiple text-video pairs. #112

Status: Open. Snarky36 opened this issue 2 months ago.

Snarky36 commented 2 months ago

Hello, I would like to fine-tune a T2V (text-to-video) model on a custom dataset of prompts and their corresponding videos. Could you give me some advice on which fine-tuning code and which model I should use? I have approximately 1,300 text-video pairs of sign language. I would really appreciate some help, because I don't know where to start or how exactly to proceed. Thank you for your time!

Steven-SWZhang commented 2 months ago

Hi, you can train your model with python train.py --cfg configs/t2v_train.yaml, but you first need to adapt your own dataset to the expected format.
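Before adapting anything to the training format, it can help to check that every video actually has a usable prompt. The sketch below assumes a hypothetical folder layout in which each clip_XXXX.mp4 has a sibling clip_XXXX.txt holding its caption. That layout and the collect_pairs helper are illustrations only, not part of VGen.

```python
from pathlib import Path


# HYPOTHETICAL layout (an assumption, not something VGen requires):
#   my_dataset/
#     clip_0001.mp4
#     clip_0001.txt   <- prompt/caption for clip_0001.mp4
def collect_pairs(root):
    """Return (pairs, missing).

    pairs:   sorted list of (video file name, caption) tuples
    missing: video file names with no caption file or an empty caption
    """
    root = Path(root)
    pairs, missing = [], []
    for video in sorted(root.glob("*.mp4")):
        caption_file = video.with_suffix(".txt")
        if not caption_file.exists():
            missing.append(video.name)
            continue
        caption = caption_file.read_text(encoding="utf-8").strip()
        if not caption:
            # A whitespace-only caption is treated the same as a missing one.
            missing.append(video.name)
            continue
        pairs.append((video.name, caption))
    return pairs, missing
```

Running this over the roughly 1,300 sign-language clips surfaces missing or empty captions before any training run starts. File names are kept relative to the folder, on the assumption that the training config points to the video directory separately.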

Snarky36 commented 2 months ago

Thank you very much. Where can I find out what the dataset format should look like, so that I can adapt mine?

Steven-SWZhang commented 2 months ago

Please refer to the toy dataset and the example config.
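The thread does not spell out the format, so the sketch below only assumes one: a plain-text list file with one sample per line, written as <video file>|||<caption>. The "|||" separator, the helper names (format_line, parse_line, write_list) and the output file name are all hypothetical. Compare them against the toy dataset and the example config, and adjust before training.

```python
from pathlib import Path

# ASSUMPTION: one sample per line as "<video file>|||<caption>".
# Check this against the toy dataset's list file and change SEP if it differs.
SEP = "|||"


def format_line(video_name, caption):
    """Build one list-file line; newlines and runs of whitespace are collapsed."""
    clean = " ".join(caption.split())
    if SEP in clean or SEP in video_name:
        raise ValueError(f"separator {SEP!r} must not appear inside a field")
    return f"{video_name}{SEP}{clean}"


def parse_line(line):
    """Split a list-file line back into (video file, caption)."""
    video_name, caption = line.rstrip("\n").split(SEP, 1)
    return video_name, caption


def write_list(pairs, out_path):
    """Write all (video file, caption) pairs to out_path; return the line count."""
    lines = [format_line(v, c) for v, c in pairs]
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```

Collapsing whitespace keeps multi-line prompts from silently splitting into several broken samples, and parse_line gives a quick round-trip check that the written file reads back as intended.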