JianhongBai / UniEdit

UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing
https://jianhongbai.github.io/UniEdit/

What is text-to-image-to-video based on, SVD or AnimateDiff #3

Open Dumeowmeow opened 7 months ago

JianhongBai commented 7 months ago

Hi @jufeif, thanks for your attention. We use SVD in our work, and other image-to-video models (e.g., AnimateDiff) are also compatible with UniEdit.

As described in Section 4.4 of the main text, we achieve text-image-to-video (TI2V) generation in two steps:

  1. Convert the image into a video using either (1) coherent data augmentations or (2) an image-to-video (I2V) model.
  2. Obtain the output video by performing text-guided editing with UniEdit on that vanilla video.
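
For readers who want a concrete picture of option (1) in step 1, here is a minimal sketch of one possible "coherent augmentation": sliding a fixed-size crop window smoothly across the image so that consecutive frames differ only slightly. The function name `image_to_video_by_panning` and its parameters are hypothetical, and the thread does not say which augmentations were actually used, so treat this as an illustration under that assumption rather than the method from the paper.

```python
import numpy as np


def image_to_video_by_panning(image, num_frames=16, crop_size=(192, 192)):
    """Turn one image (H, W, C) into a short clip (T, h, w, C) by panning a crop.

    Hypothetical helper: a smooth left-to-right pan is only one example of an
    augmentation that stays coherent across frames.
    """
    height, width = image.shape[:2]
    crop_h, crop_w = crop_size
    if crop_h > height or crop_w > width:
        raise ValueError("crop_size must fit inside the image")

    # Horizontal offsets move linearly from the left edge to the right edge.
    xs = np.linspace(0, width - crop_w, num_frames).round().astype(int)
    # The crop stays vertically centered for every frame.
    y = (height - crop_h) // 2

    return np.stack([image[y:y + crop_h, x:x + crop_w] for x in xs])
```

The resulting clip would then play the role of the vanilla video that is edited in step 2.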

Dumeowmeow commented 7 months ago

Thank you for your reply. As far as I know, though, the scheduler used in SVD is continuous, while DDIM inversion is discrete. Applying DDIM inversion directly to the video does not give good results. May I ask how you do it?

JianhongBai commented 7 months ago

Hi @jufeif, note that we perform the DDIM inversion on the SVD-synthesized video with the T2V model LaVie. You can also change the beta schedule of DDIM for more accurate reconstruction.
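
To make the point about the beta schedule concrete, here is a minimal NumPy sketch of deterministic DDIM inversion and sampling with a selectable schedule. It is not the UniEdit or LaVie code: the helper names (`make_alphas_cumprod`, `ddim_step`, `ddim_invert`, `ddim_sample`), the default `beta_start`/`beta_end` values, and the stand-in noise predictor are all assumptions made for illustration. The thread does not say which schedule gives the best reconstruction.

```python
import numpy as np


def make_alphas_cumprod(schedule="scaled_linear", beta_start=0.00085,
                        beta_end=0.012, num_train_steps=1000):
    """Cumulative product of (1 - beta_t) for the chosen beta schedule."""
    if schedule == "linear":
        betas = np.linspace(beta_start, beta_end, num_train_steps)
    elif schedule == "scaled_linear":
        # Linear in sqrt(beta), then squared.
        betas = np.linspace(beta_start ** 0.5, beta_end ** 0.5, num_train_steps) ** 2
    else:
        raise ValueError(f"unknown schedule: {schedule}")
    return np.cumprod(1.0 - betas)


def ddim_step(x, eps, a_from, a_to):
    """Deterministic DDIM update from noise level a_from to a_to (eta = 0)."""
    x0_pred = (x - np.sqrt(1.0 - a_from) * eps) / np.sqrt(a_from)
    return np.sqrt(a_to) * x0_pred + np.sqrt(1.0 - a_to) * eps


def ddim_invert(x0, eps_fn, alphas_cumprod, timesteps):
    """Walk from low noise to high noise along ascending timesteps."""
    x = x0
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        x = ddim_step(x, eps_fn(x, t), alphas_cumprod[t], alphas_cumprod[t_next])
    return x


def ddim_sample(x_t, eps_fn, alphas_cumprod, timesteps):
    """Walk back from high noise to low noise along the same timesteps."""
    x = x_t
    for t, t_prev in zip(timesteps[:0:-1], timesteps[-2::-1]):
        x = ddim_step(x, eps_fn(x, t), alphas_cumprod[t], alphas_cumprod[t_prev])
    return x


# Toy round trip on a fake latent video of shape (frames, height, width, channels).
rng = np.random.default_rng(0)
video = rng.standard_normal((8, 4, 4, 3))
fixed_eps = rng.standard_normal(video.shape)


def eps_fn(x, t):
    # Stand-in for the denoising network; a real model depends on x and t.
    return fixed_eps


alphas = make_alphas_cumprod("scaled_linear")
timesteps = np.arange(0, 1000, 20)  # 50 evenly spaced steps
latent = ddim_invert(video, eps_fn, alphas, timesteps)
recon = ddim_sample(latent, eps_fn, alphas, timesteps)
```

With this constant stand-in predictor the round trip reproduces the input up to floating-point error. With a real network the prediction changes between steps, so inversion is only approximate, and the choice of schedule and step count affects how closely the video is reconstructed.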

Dumeowmeow commented 6 months ago

Thank you for your answer. Sorry, I have another question: I could not find an explanation of these two grey arrows in the paper. What do they mean?

[Attached image: QQ图片20240311151823]

JianhongBai commented 6 months ago

Hi @jufeif, they are used for spatial structure control. Please refer to the paragraph 'Spatial Structure Control on SA-S Modules' in Section 4.2 of the main text.