PKU-YuanGroup / Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model), and we hope the open-source community will contribute to it.
MIT License

Using Multiple Arbitrary Frames as Conditions in Open-sora Plan v1.3? #531

Open shim0114 opened 2 weeks ago

shim0114 commented 2 weeks ago

Thank you for your fantastic work on Open-Sora-Plan v1.3!

I have a question regarding conditional inputs. Specifically, is it possible to use multiple arbitrary frames as conditions to guide the generation process? While exploring the codebase, I was able to find implementations related to t2v, i2v, and inpaint here, but I couldn’t locate anything that explicitly supports conditioning on multiple arbitrary frames.

Thanks a lot in advance for your help!

LinB203 commented 2 weeks ago

They are not used as conditions. When you want to generate a video, you first create a fixed-resolution noise tensor so the model knows what shape to generate: https://github.com/PKU-YuanGroup/Open-Sora-Plan/blob/main/scripts/text_condition/gpu/sample_t2v_v1_3.sh#L6-L8 In training, we use a custom dataloader sampler to implement this: https://github.com/PKU-YuanGroup/Open-Sora-Plan/blob/main/opensora/utils/dataset_utils.py#L327
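To illustrate the point about fixed-shape noise, here is a minimal sketch. The channel count and VAE compression strides below are assumptions for illustration, not values taken from the v1.3 code; the sampling script sets `num_frames`, `height`, and `width` via CLI flags.

```python
import numpy as np

# Assumed values for illustration; the real ones come from the model/VAE config.
num_frames, height, width = 93, 352, 640
latent_c = 8                # latent channels (assumed)
t_stride, s_stride = 4, 8   # assumed VAE temporal / spatial compression

# Latent grid implied by the requested output size.
latent_t = (num_frames - 1) // t_stride + 1
latent_h, latent_w = height // s_stride, width // s_stride

# The model denoises a fixed-shape latent, so the output shape is decided
# up front by this noise tensor, not by any conditioning frames.
noise = np.random.randn(1, latent_c, latent_t, latent_h, latent_w)
print(noise.shape)
```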

shim0114 commented 2 weeks ago

Thank you for your response. I realize that my previous question might not have been entirely clear. According to the documentation (link), it seems that during training, the model learns using various frame conditionings. Given this, I was wondering if it's possible to use multiple arbitrary frames as conditioning inputs during the generation process—not just the first frame (as in i2v) or the last frame (as in interpolation).

Could you please advise if this functionality is available or how it might be implemented?

I appreciate your help and look forward to your guidance.
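One plausible way to get arbitrary-frame conditioning is to generalize the inpainting-style mask from first-frame (i2v) or first/last-frame (interpolation) to any set of frame indices. The sketch below is an assumption about how such conditioning could be wired up, not the repository's actual implementation; all names are illustrative.

```python
import numpy as np

def build_frame_condition(video_latents, cond_frame_idxs):
    """Inpainting-style conditioning on arbitrary frames (illustrative sketch).

    video_latents: (C, T, H, W) latent video; the condition frames hold real
    content, the remaining frames are masked out.
    cond_frame_idxs: indices of the frames to keep as conditions.
    Returns (masked_latents, mask), which could be concatenated with the
    noisy input along the channel axis, as i2v/interpolation do for the
    first/last frame.
    """
    c, t, h, w = video_latents.shape
    mask = np.zeros((1, t, h, w), dtype=video_latents.dtype)
    mask[:, cond_frame_idxs] = 1.0          # 1 = given frame, 0 = to generate
    masked_latents = video_latents * mask   # zero out frames to be generated
    return masked_latents, mask

# Condition on frames 0, 10, and 23 of a 24-frame latent video.
latents = np.random.randn(8, 24, 44, 80).astype(np.float32)
masked, mask = build_frame_condition(latents, [0, 10, 23])
print(mask[0, :, 0, 0].sum())  # 3.0 -> three condition frames
```

Whether this works in practice depends on the frame maskings seen during training; if the model only ever saw first/last-frame masks, arbitrary masks would likely require fine-tuning.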