shim0114 opened this issue 2 weeks ago
Not as a condition. When you want to generate a video, you first create fixed-resolution noise so the model knows what shape to generate: https://github.com/PKU-YuanGroup/Open-Sora-Plan/blob/main/scripts/text_condition/gpu/sample_t2v_v1_3.sh#L6-L8 In training, we use a custom dataloader sampler to implement this: https://github.com/PKU-YuanGroup/Open-Sora-Plan/blob/main/opensora/utils/dataset_utils.py#L327
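A minimal sketch of the idea, assuming illustrative resolution values and a causal-VAE-style compression (the actual values and factors are set in `sample_t2v_v1_3.sh` and the VAE config, not copied here):

```python
import torch

# Illustrative values only; the real num_frames/height/width are set in
# sample_t2v_v1_3.sh, and the compression factors depend on the VAE.
num_frames, height, width = 29, 480, 640
vae_t, vae_s = 4, 8            # assumed temporal / spatial compression
latent_channels = 4            # assumed latent channel count

# The initial noise is built to a fixed latent shape before sampling, so the
# model knows what resolution to denoise toward. The text prompt decides *what*
# is generated; this noise shape decides *what size* is generated.
latents = torch.randn(
    1,                               # batch
    latent_channels,
    (num_frames - 1) // vae_t + 1,   # assumed causal-VAE temporal length
    height // vae_s,
    width // vae_s,
)
```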
Thank you for your response. I realize my previous question may not have been entirely clear. According to the documentation (link), the model is trained with various frame-conditioning schemes. Given this, I was wondering whether it is possible to use multiple arbitrary frames as conditioning inputs during generation, not just the first frame (as in i2v) or the last frame (as in interpolation).
Could you please advise if this functionality is available or how it might be implemented?
I appreciate your help and look forward to your guidance.
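To make the question concrete, here is a rough sketch of the interface I am imagining. The mask semantics and the helper below are my own assumption for illustration, not something I found in the inpaint code:

```python
import torch

def build_frame_condition(video, cond_indices):
    """Hypothetical helper: keep selected frames as conditions, zero the rest.

    video: (C, T, H, W) tensor; cond_indices: list of frame indices to condition on.
    """
    c, t, h, w = video.shape
    mask = torch.zeros(1, t, 1, 1)
    mask[:, cond_indices] = 1.0      # 1 = known/conditioning frame, 0 = to be generated
    cond = video * mask
    return cond, mask

# e.g. condition on frames 0, 24, and 48 instead of only the first or last frame
video = torch.randn(3, 49, 256, 256)
cond, mask = build_frame_condition(video, [0, 24, 48])
```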
Thank you for your fantastic work on Open-Sora v1.3!
I have a question regarding conditional inputs. Specifically, is it possible to use multiple arbitrary frames as conditions to guide the generation process? While exploring the codebase, I was able to find implementations related to t2v, i2v, and inpaint here, but I couldn’t locate anything that explicitly supports conditioning on multiple arbitrary frames.
Thanks a lot in advance for your help!