junsukha opened 1 day ago
High-aesthetic, watermark-free, high-resolution videos with more motion are preferred. Qwen2-VL is a suitable captioner. I'd conservatively estimate that on the order of thousands of video clips would be enough to fine-tune a video generation model.
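As a rough illustration of the selection criteria above, here is a minimal Python sketch that filters clips on precomputed quality scores. It assumes each clip has already been scored by some aesthetic predictor, watermark detector, and motion estimator. The `ClipStats` fields, score scales, and thresholds are all hypothetical placeholders, not values used by the project.

```python
from dataclasses import dataclass


@dataclass
class ClipStats:
    """Per-clip quality metadata (hypothetical fields; fill them from your own scorers)."""
    path: str
    aesthetic: float       # aesthetic score, assumed 0-10 scale
    watermark_prob: float  # probability the clip contains a watermark, 0-1
    height: int            # frame height in pixels
    width: int             # frame width in pixels
    motion: float          # motion score, e.g. mean optical-flow magnitude (assumed units)


# Hypothetical thresholds; tune them for your own data.
MIN_AESTHETIC = 5.0
MAX_WATERMARK_PROB = 0.5
MIN_SHORT_SIDE = 720
MIN_MOTION = 1.0


def keep(clip: ClipStats) -> bool:
    """Return True only if the clip passes every quality filter."""
    return (
        clip.aesthetic >= MIN_AESTHETIC
        and clip.watermark_prob <= MAX_WATERMARK_PROB
        and min(clip.height, clip.width) >= MIN_SHORT_SIDE
        and clip.motion >= MIN_MOTION
    )


def filter_clips(clips):
    """Keep passing clips, ordered by motion score (most motion first)."""
    return sorted((c for c in clips if keep(c)), key=lambda c: c.motion, reverse=True)


if __name__ == "__main__":
    clips = [
        ClipStats("a.mp4", 6.2, 0.05, 1080, 1920, 3.4),  # passes
        ClipStats("b.mp4", 4.1, 0.02, 1080, 1920, 5.0),  # low aesthetic score
        ClipStats("c.mp4", 6.8, 0.90, 1080, 1920, 2.0),  # likely watermarked
        ClipStats("d.mp4", 7.0, 0.01, 480, 854, 4.0),    # low resolution
        ClipStats("e.mp4", 6.5, 0.10, 720, 1280, 0.3),   # nearly static
        ClipStats("f.mp4", 5.5, 0.20, 720, 1280, 1.5),   # passes
    ]
    print([c.path for c in filter_clips(clips)])  # ['a.mp4', 'f.mp4']
```

Sorting by motion is one simple way to favor "more motion" when you have more passing clips than you need; a hard threshold alone would treat all passing clips equally.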
@LinB203 thx! What length are you assuming for the videos? 93 frames, as you used in the last training phase of v1.3?
Yes.
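To relate a clip count to hours of footage, the arithmetic for fixed-length 93-frame clips is simple. The sketch below assumes non-overlapping clips and a constant frame rate; the 24 fps value used in the example is a hypothetical choice, not a frame rate confirmed in this thread.

```python
CLIP_FRAMES = 93  # clip length confirmed above


def footage_seconds(num_clips: int, fps: float, clip_frames: int = CLIP_FRAMES) -> float:
    """Seconds of footage covered by num_clips non-overlapping clips."""
    return num_clips * clip_frames / fps


def clips_from_hours(hours: float, fps: float, clip_frames: int = CLIP_FRAMES) -> int:
    """Upper bound on non-overlapping clip_frames-long clips cut from `hours` of footage."""
    total_frames = round(hours * 3600 * fps)  # round() guards against float error
    return total_frames // clip_frames


if __name__ == "__main__":
    fps = 24  # hypothetical frame rate
    print(f"{footage_seconds(1000, fps) / 3600:.2f} h")  # 1.08 h
    print(clips_from_hours(1, fps))                       # 929
```

At the assumed 24 fps, 1,000 clips cover about 1.08 hours, and one hour of footage yields at most 929 clips; real yields will be lower once shot cuts and quality filtering remove material.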
Hi!
Do you have any suggestions on the dataset size needed for fine-tuning? My goal is to generate videos specifically of dynamic car movements, such as cars driving on the road.
How many hours of car video would you suggest fine-tuning with?
Thanks for sharing the work.