Dumeowmeow opened this issue 7 months ago
Hi @jufeif, thanks for your attention. We use SVD in our work, and other image-to-video models (e.g., AnimateDiff) are also compatible with UniEdit.
As introduced in Section 4.4 of the main text, to achieve text-image-to-video (TI2V) generation, we:
- Convert the image into a video with (1) coherent data augmentations or (2) image-to-video (I2V) models.
- Obtain the output video by performing text-guided editing with UniEdit on the vanilla video.
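Step (1) above, coherent data augmentation, can be sketched as follows: a still image is replicated into a short clip with a smooth camera-like motion (here a gentle center zoom), producing the "vanilla video" that UniEdit then edits. This is a minimal, dependency-free illustration, not the authors' exact implementation; the function name and parameters are hypothetical.

```python
import numpy as np

def image_to_vanilla_video(image: np.ndarray, num_frames: int = 16,
                           max_zoom: float = 1.1) -> np.ndarray:
    """Turn one image into a clip via a gentle center zoom (hypothetical
    helper sketching 'coherent data augmentation')."""
    h, w = image.shape[:2]
    frames = []
    for i in range(num_frames):
        # Zoom factor grows smoothly from 1.0 to max_zoom across the clip.
        z = 1.0 + (max_zoom - 1.0) * i / max(num_frames - 1, 1)
        ch, cw = int(round(h / z)), int(round(w / z))
        top, left = (h - ch) // 2, (w - cw) // 2
        crop = image[top:top + ch, left:left + cw]
        # Nearest-neighbour resize back to (h, w) via index maps; a real
        # pipeline would use proper interpolation.
        ys = (np.arange(h) * ch / h).astype(int)
        xs = (np.arange(w) * cw / w).astype(int)
        frames.append(crop[ys][:, xs])
    return np.stack(frames)  # shape: (num_frames, h, w, channels)

img = np.random.rand(64, 64, 3)
video = image_to_vanilla_video(img)
```

The first frame is the untouched image (zoom factor 1.0), so the edited clip stays anchored to the input.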
Thank you for your reply, but as far as I know, the scheduler used in SVD is continuous, while DDIM inversion is discrete. Directly applying DDIM inversion to the video does not reconstruct it well. May I ask how you handle this?
Hi @jufeif, note that we perform DDIM inversion on the SVD-synthesized video with the T2V model LaVie, and you can change the beta schedule of DDIM for more accurate reconstruction.
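To make the role of the beta schedule concrete, here is a minimal numpy sketch of deterministic DDIM inversion and sampling over a configurable schedule (the `"scaled_linear"` / `"linear"` options mirror the `beta_schedule` choices exposed by diffusers' `DDIMScheduler`). With a fixed noise predictor the forward and backward passes are exact inverses; with a real denoiser the inversion is only approximate, which is why the schedule choice affects reconstruction quality. Function names and the toy predictor are assumptions for illustration.

```python
import numpy as np

def make_alphas_cumprod(T=50, beta_start=1e-4, beta_end=2e-2,
                        schedule="scaled_linear"):
    # "scaled_linear" interpolates sqrt(beta); "linear" interpolates beta.
    if schedule == "scaled_linear":
        betas = np.linspace(beta_start ** 0.5, beta_end ** 0.5, T) ** 2
    else:
        betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def ddim_invert(x0, eps_fn, ac):
    """Run the deterministic DDIM ODE forwards (x_t -> x_{t+1})."""
    x, prev_a = x0, 1.0
    for a in ac:
        eps = eps_fn(x)
        pred_x0 = (x - np.sqrt(1 - prev_a) * eps) / np.sqrt(prev_a)
        x = np.sqrt(a) * pred_x0 + np.sqrt(1 - a) * eps
        prev_a = a
    return x

def ddim_sample(xT, eps_fn, ac):
    """Run the same ODE backwards (x_t -> x_{t-1}) to reconstruct."""
    x, alphas = xT, list(ac)
    for a, prev_a in zip(reversed(alphas), reversed([1.0] + alphas[:-1])):
        eps = eps_fn(x)
        pred_x0 = (x - np.sqrt(1 - a) * eps) / np.sqrt(a)
        x = np.sqrt(prev_a) * pred_x0 + np.sqrt(1 - prev_a) * eps
    return x

ac = make_alphas_cumprod(schedule="scaled_linear")
x0 = np.random.rand(4)
const_eps = lambda x: 0.3 * np.ones_like(x)  # stand-in for a real noise predictor
xT = ddim_invert(x0, const_eps, ac)
recon = ddim_sample(xT, const_eps, ac)
```

Swapping the schedule changes the `alphas_cumprod` trajectory the inversion walks along, which is the knob referenced above.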
Thank you for your answer. Sorry, I have another question: I could not find an explanation of the two grey arrows in the paper. What do they mean?
Hi @jufeif, they are used for spatial structure control. Please refer to the paragraph 'Spatial Structure Control on SA-S Modules' in Section 4.2 in the main text.
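The injection such arrows denote can be sketched as follows: features (here queries/keys) from the source/reconstruction branch replace those of the editing branch inside a spatial self-attention layer, so the edited frames keep the source's spatial layout while the editing branch's values carry the new content. This is a hedged numpy illustration of the general mechanism, not UniEdit's exact module; all names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Standard scaled dot-product attention over spatial tokens.
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def sa_s_with_injection(q_edit, k_edit, v_edit, q_src, k_src, inject=True):
    """Spatial self-attention step; injecting the source branch's q/k
    constrains WHERE attention looks (spatial structure), while the
    editing branch's values still supply the edited content."""
    if inject:
        return attention(q_src, k_src, v_edit)
    return attention(q_edit, k_edit, v_edit)

rng = np.random.default_rng(0)
q_e, k_e, v_e, q_s, k_s = (rng.standard_normal((8, 4)) for _ in range(5))
out_injected = sa_s_with_injection(q_e, k_e, v_e, q_s, k_s, inject=True)
out_plain = sa_s_with_injection(q_e, k_e, v_e, q_s, k_s, inject=False)
```

Toggling `inject` switches between structure-preserving and unconstrained attention for that layer.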