tips on dataset volume for fine-tuning

PKU-YuanGroup / Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model). We hope the open-source community will contribute to this project.

MIT License

tips on dataset volume for fine-tuning #500

Open junsukha opened 1 day ago

junsukha commented 1 day ago

Hi!

Do you have any suggestions on the dataset volume needed for fine-tuning? My goal is to generate videos of dynamic car movement specifically, such as a car driving on the road.

How many hours of car videos do you suggest to fine-tune with?

I appreciate you sharing this work.

LinB203 commented 1 day ago

Videos with high aesthetic quality, no watermark, high resolution, and more motion are preferred. Qwen2-VL is a suitable captioner. I'd conservatively estimate that a few thousand video clips would be enough to fine-tune a video generation model.

junsukha commented 21 hours ago

@LinB203 thx! what length are you assuming for videos? 93 frames as you trained at the last training phase in v1.3?

LinB203 commented 17 hours ago

> @LinB203 thx! what length are you assuming for videos? 93 frames as you trained at the last training phase in v1.3?

Yes.
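
To relate this back to the original question about hours of footage, here is a rough back-of-the-envelope sketch. It assumes each clip is 93 frames, as confirmed above. The frame rate of 24 fps and the clip counts of 1000, 3000, and 5000 are assumptions picked only for illustration, not values stated in this thread, and `footage_hours` is a hypothetical helper name.

```python
def footage_hours(num_clips: int, frames_per_clip: int = 93, fps: float = 24.0) -> float:
    """Return total footage in hours for num_clips clips.

    frames_per_clip=93 comes from the thread; fps=24.0 is an assumed
    value and should be replaced with the frame rate of your own data.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    total_seconds = num_clips * frames_per_clip / fps
    return total_seconds / 3600.0


if __name__ == "__main__":
    # Hypothetical clip counts standing in for "thousands of clips".
    for n in (1000, 3000, 5000):
        print(f"{n} clips -> {footage_hours(n):.2f} hours")
```

Under these assumptions, a few thousand 93-frame clips come to only a few hours of trimmed footage, so the raw car video collected before filtering for quality and motion would likely need to be considerably longer.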