360CVGroup / FancyVideo

This is the official reproduction of FancyVideo.
https://360cvgroup.github.io/FancyVideo/

Question about inputting image + text -> video #28

Open gray311 opened 2 weeks ago

gray311 commented 2 weeks ago

Hi, I am very interested in your work!

I'd like to know whether FancyVideo receives the initial frame + text prompt (at the same time) to generate the corresponding videos.

For example:

video = pipe(
    prompt=prompt,  # text prompt
    image=image,    # start frame
    num_videos_per_prompt=1,
    num_inference_steps=50,
    num_frames=49,
    guidance_scale=6,
    generator=torch.Generator(device="cuda").manual_seed(42),
).frames[0]
MaAo commented 1 week ago

Yes, when using the i2v model, you are essentially generating a video based on the first frame and the accompanying text.
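To make the call pattern from the question easy to reuse, here is a minimal sketch that collects the i2v arguments in one place. The helper name `build_i2v_kwargs` is hypothetical (not part of the FancyVideo codebase); the keyword names simply mirror the snippet above, and the actual FancyVideo pipeline signature may differ.

```python
def build_i2v_kwargs(prompt, image, num_frames=49, steps=50, guidance=6.0, seed=42):
    """Collect arguments for a diffusers-style image+text -> video call.

    `image` is the start frame (e.g. a PIL.Image); the model conditions
    frame 0 on it while `prompt` describes the content/motion. Keeping the
    arguments in one dict makes it easy to log runs or sweep settings.
    This helper is illustrative only, not FancyVideo's actual API.
    """
    return {
        "prompt": prompt,            # text prompt
        "image": image,              # start frame
        "num_videos_per_prompt": 1,
        "num_inference_steps": steps,
        "num_frames": num_frames,
        "guidance_scale": guidance,
        "seed": seed,                # feed into torch.Generator(...).manual_seed(seed)
    }


kwargs = build_i2v_kwargs("a cat surfing a wave", image=None)
print(sorted(kwargs))
```

With a loaded pipeline, the `seed` entry would be turned into a `torch.Generator(device="cuda").manual_seed(seed)` and the rest passed through, as in the snippet in the question.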

Vincento-Wang commented 1 week ago

Looking forward to the i2v training code for research.