TIGER-AI-Lab / AnyV2V

Code and data for "AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks" (TMLR 2024)
https://tiger-ai-lab.github.io/AnyV2V/
MIT License
508 stars · 38 forks

Can a 25 frame/s video with a duration greater than 2 seconds be reconstructed? #3

Closed zerzhier closed 7 months ago

zerzhier commented 7 months ago

Can a 25 frame/s video with a duration greater than 2 seconds be reconstructed by this framework?

matthewlesiuk commented 7 months ago

following

vinesmsuic commented 7 months ago

Hi,

We haven't actually tried it, but here are some thoughts we would leave to the community's exploration:

To achieve high-fps video editing, one possible direction is to first apply AnyV2V and then apply a video interpolation model (e.g., LaVie's interpolation model turns 16 frames into 61 frames at 30 fps).
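For illustration, here is a minimal sketch of that direction, with simple linear blending standing in for a learned interpolator such as LaVie's. With `factor=4` it maps 16 frames to 61, matching the 16→61 figure above; the `interpolate_frames` helper is hypothetical, not part of AnyV2V:

```python
import numpy as np

def interpolate_frames(frames, factor=2):
    """Naive temporal upsampling: insert (factor - 1) linearly blended
    frames between each consecutive pair. A learned interpolation model
    (e.g. LaVie's interpolator or RIFE) would replace the blend step.
    Output length is n + (n - 1) * (factor - 1) for n input frames."""
    frames = np.asarray(frames, dtype=np.float32)
    out = []
    for a, b in zip(frames[:-1], frames[1:]):
        out.append(a)
        for k in range(1, factor):
            t = k / factor
            out.append((1 - t) * a + t * b)  # cross-fade between the pair
    out.append(frames[-1])
    return np.stack(out)

# 16 frames -> 31 frames with factor=2; factor=4 gives 16 -> 61
video = np.random.rand(16, 64, 64, 3)
print(interpolate_frames(video).shape)  # (31, 64, 64, 3)
```

A real interpolation model would hallucinate motion rather than cross-fade, but the frame-count bookkeeping is the same.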

To generate longer videos, we suggest exploring autoregressive long video generation built on our AnyV2V method.
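The autoregressive idea could be sketched as follows. This is a rough outline, not the released implementation: `edit_chunk` is a hypothetical stand-in for one AnyV2V pass (first-frame edit + DDIM inversion + feature injection), and each chunk is conditioned on the last edited frame of the previous chunk so the edit propagates instead of resetting per chunk:

```python
def edit_long_video_autoregressive(frames, edit_chunk, chunk_len=16):
    """Sketch of autoregressive chunked editing.

    frames:     list of source frames (any per-frame representation)
    edit_chunk: hypothetical callable edit_chunk(chunk, first_frame) that
                runs one AnyV2V pass on a chunk, optionally conditioned on
                the previous chunk's last edited frame (None for chunk 0)
    chunk_len:  frames per chunk (16 matches the base I2V models)
    """
    edited, prev_last = [], None
    for i in range(0, len(frames), chunk_len):
        chunk = frames[i:i + chunk_len]
        out = edit_chunk(chunk, first_frame=prev_last)
        prev_last = out[-1]          # condition the next chunk on this frame
        edited.extend(out)
    return edited
```

The key design choice is the hand-off of `prev_last`: without it, each 16-frame chunk would be edited independently and the appearance could drift between chunks.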

Max

zerzhier commented 7 months ago

I converted a 4-second, 30 fps video to 8 fps, which stretched its playback duration to about 16 seconds. I then split it into 2-second segments, giving 8 segments, and processed each with ddim_inversion + pnp_edit to generate 8 new segments. Finally, I concatenated the segments into one video and resampled the frame rate to 25 fps. This works around the long-video limitation.

However, this takes too long. On an RTX 3090, the 4-second 30 fps video takes about 50 minutes end to end, with each segment taking about 6 minutes to process.

Processing a normal 60-second video shouldn't take more than an hour, but at the current speed a 60-second video would take more than 12 hours. How can this be made faster?
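The arithmetic behind these timings can be sketched as follows. This is a rough cost planner using the segment size and the ~6 min/segment RTX 3090 figure reported above; `plan_segments` is an illustrative helper, not part of AnyV2V:

```python
import math

def plan_segments(duration_s, src_fps=30.0, seg_frames=16,
                  minutes_per_seg=6.0):
    """Estimate the cost of the chunked pipeline described above:
    keep all source frames, replay them at a lower working fps, split
    into seg_frames-frame segments (2 s at 8 fps), and run one
    ddim_inversion + pnp_edit pass per segment.

    Returns (number of segments, total minutes). The 6 min/segment
    figure is the RTX 3090 number reported in this thread."""
    total_frames = math.ceil(duration_s * src_fps)
    n_segments = math.ceil(total_frames / seg_frames)
    return n_segments, n_segments * minutes_per_seg

print(plan_segments(4))   # (8, 48.0)   -> matches the ~50 min observed
print(plan_segments(60))  # (113, 678.0) -> ~11.3 h, i.e. "more than 12 hours"
```

This makes the scaling explicit: cost grows linearly with frame count, so the main levers are fewer diffusion steps per segment, larger chunks per pass, or batching segments, rather than the concatenation step itself.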

vinesmsuic commented 7 months ago

Let me do some experiments on it.

vinesmsuic commented 7 months ago

We figured out a way to edit longer videos (8–16 seconds) in a reasonable inference time. Stay tuned for updates.

BigcowPeking commented 5 months ago

Any updates? @vinesmsuic

vinesmsuic commented 5 months ago

> Any updates? @vinesmsuic

Yes. It is currently only supported in the local Gradio demo. You can try videos of up to 128 frames.

vinesmsuic commented 5 months ago

https://tiger-ai-lab.github.io/AnyV2V/static/videos/long_video_results/tokyo-walk_robot.mp4

https://tiger-ai-lab.github.io/AnyV2V/static/videos/long_video_results/woman-running_hoodie.mp4

BigcowPeking commented 5 months ago

@vinesmsuic nice work! Could you please elaborate on the method you've implemented? What is the inference time like? And how is appearance consistency maintained, for instance, in a cartoonization task? Thank you very much.

vinesmsuic commented 4 months ago

You can check the latest version of our arXiv paper :)