Recently, video generation has achieved substantial progress with realistic results. Nevertheless, existing AI-generated videos are usually very short clips ("shot-level") depicting a single scene. To deliver a coherent long video ("story-level"), it is desirable to have creative transition and prediction effects across different clips. This paper presents a short-to-long video diffusion model, SEINE, that focuses on generative transition and prediction. The goal is to generate high-quality long videos with smooth and creative transitions between scenes and varying lengths of shot-level videos. Specifically, we propose a random-mask video diffusion model to automatically generate transitions based on textual descriptions. By providing the images of different scenes as inputs, combined with text-based control, our model generates transition videos that ensure coherence and visual quality. Furthermore, the model can be readily extended to various tasks such as image-to-video animation and autoregressive video prediction. To conduct a comprehensive evaluation of this new generative task, we propose three assessment criteria for smooth and creative transition: temporal consistency, semantic similarity, and video-text semantic alignment. Extensive experiments validate the effectiveness of our approach over existing methods for generative transition and prediction, enabling the creation of story-level long videos. Project page: https://vchitect.github.io/SEINE-project/
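To make the random-mask conditioning concrete, below is a minimal PyTorch sketch of how a transition input could be assembled: only the last frame of the preceding shot and the first frame of the following shot are observed, and a binary mask marks which frames are given versus to be generated. The function name, tensor layout, and channel-concatenation scheme are illustrative assumptions, not SEINE's actual interface.

```python
import torch

def build_transition_condition(first_scene_end, second_scene_start, num_frames=16):
    """Assemble a masked-video conditioning tensor for a scene transition.

    A minimal sketch under assumed shapes: frames are (C, H, W) images,
    only the two endpoint frames are observed, and a binary mask channel
    indicates which frames the diffusion model should treat as given.
    """
    c, h, w = first_scene_end.shape
    frames = torch.zeros(num_frames, c, h, w)   # unobserved frames start as zeros
    mask = torch.zeros(num_frames, 1, h, w)     # 1 = observed frame, 0 = to be generated
    frames[0] = first_scene_end                 # last frame of the preceding shot
    frames[-1] = second_scene_start             # first frame of the following shot
    mask[0] = 1.0
    mask[-1] = 1.0
    # The model would receive the masked frames and mask concatenated along
    # the channel axis, alongside the noisy latents and the text embedding.
    return torch.cat([frames, mask], dim=1)     # (num_frames, C + 1, H, W)
```

Sampling the mask at random during training, rather than fixing it to the two endpoints, is what lets a single model also cover the other tasks the abstract mentions: image-to-video animation (only the first frame observed) and autoregressive video prediction (a prefix of frames observed).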