TIGER-AI-Lab / AnyV2V

Code and data for "AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks" (TMLR 2024)
https://tiger-ai-lab.github.io/AnyV2V/
MIT License

Add Replicate demo and API #1

Closed · chenxwh closed this 7 months ago

chenxwh commented 7 months ago

Hi @vinesmsuic @lim142857 @wren93 ,

Very cool project on AnyV2V!

This pull request makes it possible to run AnyV2V on Replicate (https://replicate.com/cjwbw/AnyV2V) and via the API (https://replicate.com/cjwbw/AnyV2V/api). Currently, the demo covers prompt-based video editing. We'd also like to transfer the demo page / set up a redirect to TIGER-AI-Lab so you can make modifications easily, and we're happy to help maintain/integrate upcoming changes :)
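For reference, here is a rough sketch of what an API call could look like with Replicate's Python client. The model slug, version, and input field names below are assumptions for illustration; the actual schema is documented on the API page linked above.

```python
# Hedged sketch of calling the AnyV2V demo via the Replicate Python client.
# Requires the REPLICATE_API_TOKEN environment variable to be set.
# The model slug and input field names are illustrative; check
# https://replicate.com/cjwbw/AnyV2V/api for the real schema (you may also
# need to append a specific version hash to the slug).
import replicate

output = replicate.run(
    "cjwbw/anyv2v",  # assumed slug; the page above shows the exact identifier
    input={
        "video": open("source.mp4", "rb"),               # source video to edit
        "prompt": "turn the car into a red sports car",  # editing instruction (prompt-based editing)
    },
)
print(output)  # typically a URL (or list of URLs) pointing to the edited video
```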

vinesmsuic commented 7 months ago

Thanks @chenxwh, will look into it.

Max

lim142857 commented 7 months ago

@chenxwh Thank you for your hard work! I was wondering if we could add "ddim_init_latents_t_idx": 0 (default), "pnp_f_t": 1.0 (default), "pnp_spatial_attn_t": 1.0 (default), and "pnp_temp_attn_t": 1.0 (default) to the tweakable configs on the Replicate page.

chenxwh commented 7 months ago

> @chenxwh Thank you for your hard work! I was wondering if we could add "ddim_init_latents_t_idx": 0 (default), "pnp_f_t": 1.0 (default), "pnp_spatial_attn_t": 1.0 (default), and "pnp_temp_attn_t": 1.0 (default) to the tweakable configs on the Replicate page.

Sure, happy to! Could you maybe provide short descriptions of those variables so I can add them to the demo too? I think it'll help people understand better how to set them. Thank you!

vinesmsuic commented 7 months ago

> @chenxwh Thank you for your hard work! I was wondering if we could add "ddim_init_latents_t_idx": 0 (default), "pnp_f_t": 1.0 (default), "pnp_spatial_attn_t": 1.0 (default), and "pnp_temp_attn_t": 1.0 (default) to the tweakable configs on the Replicate page.
>
> Sure, happy to! Could you maybe provide short descriptions of those variables so I can add them to the demo too? I think it'll help people understand better how to set them. Thank you!

chenxwh commented 7 months ago

thanks @vinesmsuic, I have added those to the demo now!

lim142857 commented 7 months ago

@chenxwh Thanks! Please check out these updated config descriptions:

vinesmsuic commented 7 months ago

@chenxwh Right, we found that 1.0 for the three PnP injection values works best for prompt-based editing on I2VGen-XL, so it might be better to use 1.0 in the demo. Sorry for the confusion. Could you push another commit with that change?
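For context, here is a rough config sketch of what these settings control. The one-line descriptions are a best-effort reading of the PnP-injection setup, not official documentation; please check them against the repo's own configs.

```python
# Illustrative defaults for the tweakable AnyV2V settings discussed above.
# The inline descriptions are best-effort summaries, not official documentation.
pnp_config = {
    "ddim_init_latents_t_idx": 0,  # index of the DDIM-inverted latent used to initialize sampling
    "pnp_f_t": 1.0,                # fraction of denoising steps with convolution-feature injection from the source video
    "pnp_spatial_attn_t": 1.0,     # fraction of steps with spatial self-attention injection
    "pnp_temp_attn_t": 1.0,        # fraction of steps with temporal attention injection
}
```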

chenxwh commented 7 months ago

Sure! The latest changes reflect the updated default values, with detailed descriptions for each of them. An updated example has also been added to the demo.

chenxwh commented 7 months ago

Thanks for the merge! I have redirected the page to https://replicate.com/tiger-ai-lab/anyv2v and added you to the tiger-ai-lab org (https://replicate.com/tiger-ai-lab), so you have permission to make any changes to the page! Always happy to help push updates :)

lim142857 commented 7 months ago

@chenxwh Thanks a lot for the contribution! Could you also add me to the tiger-ai-lab org (https://replicate.com/tiger-ai-lab)? :)

chenxwh commented 7 months ago

Sure thing @lim142857! Just added you as well :D

vinesmsuic commented 7 months ago

Hi @chenxwh, I wonder if we can modify the demo to allow users to input their own edited_1st_frame, which would override the instruction prompt if provided? It seems a lot of users want to try it out with their own edited first frame instead of the InstructPix2Pix output.

chenxwh commented 7 months ago

Sure, I will make the changes later today :)

wenhuchen commented 7 months ago

I think letting people upload an image probably causes too much overhead; people might need to visit another website to do it. It's a bit complex.

@chenxwh I'm wondering whether it's possible to break the demo into two steps, because the first-step result from InstructPix2Pix is not very stable. We can sweep several hyperparameters (random seed, CFG params) to let InstructPix2Pix generate a few different candidate images (users can even re-run this until they are happy with the result). This should be quite cheap. Then a user can click on the image they are most satisfied with to continue with video generation. This should dramatically increase the success rate.
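As a rough illustration of the two-step idea (not the demo's actual code), here is a hedged sketch that sweeps seeds and guidance scales with the public diffusers InstructPix2Pix pipeline to produce several candidate first frames; paths, prompt, and parameter values are placeholders.

```python
# Hedged sketch of the two-step idea: sweep seeds and CFG parameters with
# InstructPix2Pix to produce several candidate edited first frames, then let the
# user pick one before running video generation. Uses the public diffusers
# pipeline as a stand-in; paths, prompt, and parameter values are illustrative.
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

first_frame = Image.open("first_frame.png").convert("RGB")
prompt = "make it look like a watercolor painting"

candidates = []
for seed in (0, 1, 2):
    for guidance_scale in (5.0, 7.5):
        generator = torch.Generator(device="cuda").manual_seed(seed)
        image = pipe(
            prompt,
            image=first_frame,
            guidance_scale=guidance_scale,   # text guidance strength
            image_guidance_scale=1.5,        # how closely to follow the source frame
            num_inference_steps=20,
            generator=generator,
        ).images[0]
        candidates.append(((seed, guidance_scale), image))

# The selected candidate would then be passed to AnyV2V as the edited first frame.
```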

chenxwh commented 7 months ago

> I think letting people upload an image probably causes too much overhead; people might need to visit another website to do it. It's a bit complex.
>
> @chenxwh I'm wondering whether it's possible to break the demo into two steps, because the first-step result from InstructPix2Pix is not very stable. We can sweep several hyperparameters (random seed, CFG params) to let InstructPix2Pix generate a few different candidate images (users can even re-run this until they are happy with the result). This should be quite cheap. Then a user can click on the image they are most satisfied with to continue with video generation. This should dramatically increase the success rate.

The demo on the website only supports end-to-end inference, so I think the best way is to offer an option to either use the default full pipeline or accept a user-provided first frame obtained from an existing InstructPix2Pix model.

vinesmsuic commented 7 months ago

Hi @chenxwh, just discussed with @wenhuchen and we would love to stick to the original plan (modify the demo to allow users to input their own edited_1st_frame, which overrides the instruction prompt if provided). Really appreciate your help :)
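A minimal sketch of what that branching might look like in the predictor, assuming hypothetical helpers (extract_first_frame, run_instructpix2pix, run_anyv2v) and an optional edited_1st_frame input; the names here are illustrative, not the actual predict.py.

```python
# Illustrative predictor logic only: if the user supplies an edited first frame,
# it overrides the prompt-based InstructPix2Pix step; otherwise the default full
# pipeline runs. extract_first_frame / run_instructpix2pix / run_anyv2v are
# hypothetical helpers standing in for the real implementation.
from typing import Optional

def predict(video_path: str, prompt: str, edited_1st_frame: Optional[str] = None) -> str:
    if edited_1st_frame is None:
        # Default full pipeline: derive the edited first frame from the prompt.
        source_frame = extract_first_frame(video_path)
        edited_1st_frame = run_instructpix2pix(source_frame, prompt)
    # Propagate the edited first frame through the rest of the video with AnyV2V.
    return run_anyv2v(video_path, edited_1st_frame)
```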

chenxwh commented 7 months ago

A new version is pushed to Replicate now :) and I have opened another PR for the change.