(Source: Make-A-Video, SimDA, PYoCo, SVD, Video LDM and Tune-A-Video)
- [News] The updated version is available on arXiv.
- [News] Our survey is accepted by ACM Computing Surveys (CSUR).
- [News] The Chinese translation is available on Zhihu. Special thanks to Dai-Wenxun for this.
Contact
If you have any suggestions or find our work helpful, feel free to contact us:
Homepage: Zhen Xing
Email: zhenxingfd@gmail.com
If you find our survey useful in your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry.
```bibtex
@article{xing2023survey,
  title={A survey on video diffusion models},
  author={Xing, Zhen and Feng, Qijun and Chen, Haoran and Dai, Qi and Hu, Han and Xu, Hang and Wu, Zuxuan and Jiang, Yu-Gang},
  journal={ACM Computing Surveys},
  year={2023},
  publisher={ACM New York, NY}
}
```
Open-source Toolboxes and Foundation Models
Table of Contents
Video Generation
Data
Caption-level
Category-level
Metric and Benchmark
| Title | arXiv | Github | Website | Pub. & Date |
|---|---|---|---|---|
| Fréchet Video Motion Distance: A Metric for Evaluating Motion Consistency in Videos | | | - | Jul., 2024 |
| ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation | | | | Jun., 2024 |
| STREAM: Spatio-TempoRal Evaluation and Analysis Metric for Video Generative Models | | | - | ICLR, 2024 |
| Subjective-Aligned Dateset and Metric for Text-to-Video Quality Assessment | | - | - | Mar, 2024 |
| Towards A Better Metric for Text-to-Video Generation | | - | | Jan, 2024 |
| AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI | | - | - | Jan, 2024 |
| VBench: Comprehensive Benchmark Suite for Video Generative Models | | | | Nov, 2023 |
| FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation | | - | - | NeurIPS, 2023 |
| CVPR 2023 Text Guided Video Editing Competition | | - | - | Oct., 2023 |
| EvalCrafter: Benchmarking and Evaluating Large Video Generation Models | | | | Oct., 2023 |
| Measuring the Quality of Text-to-Video Model Outputs: Metrics and Dataset | | - | - | Sep., 2023 |
Text-to-Video Generation
Training-based
| Title | arXiv | Github | Website | Pub. & Date |
|---|---|---|---|---|
| Movie Gen | | - | | Oct, 2024 |
| CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer | | | - | Oct, 2024 |
| Grid Diffusion Models for Text-to-Video Generation | | | | CVPR, 2024 |
| MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators | | | | Apr., 2024 |
| Mora: Enabling Generalist Video Generation via A Multi-Agent Framework | | - | - | Mar., 2024 |
| VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis | | - | - | Mar., 2024 |
| Genie: Generative Interactive Environments | | - | | Feb., 2024 |
| Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis | | - | | Feb., 2024 |
| Lumiere: A Space-Time Diffusion Model for Video Generation | | - | | Jan, 2024 |
| UNIVG: TOWARDS UNIFIED-MODAL VIDEO GENERATION | | - | | Jan, 2024 |
| VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models | | | | Jan, 2024 |
| 360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model | | - | | Jan, 2024 |
| MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation | | - | | Jan, 2024 |
| VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM | | - | | Jan, 2024 |
| A Recipe for Scaling up Text-to-Video Generation with Text-free Videos | | | | Dec, 2023 |
| InstructVideo: Instructing Video Diffusion Models with Human Feedback | | | | Dec, 2023 |
| VideoLCM: Video Latent Consistency Model | | - | - | Dec, 2023 |
| Photorealistic Video Generation with Diffusion Models | | - | | Dec, 2023 |
| Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation | | | | Dec, 2023 |
| Delving Deep into Diffusion Transformers for Image and Video Generation | | - | | Dec, 2023 |
| StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter | | | | Nov, 2023 |
| MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation | | - | | Nov, 2023 |
| ART•V: Auto-Regressive Text-to-Video Generation with Diffusion Models | | | | Nov, 2023 |
| Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets | | | | Nov, 2023 |
| FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline | | | | Nov, 2023 |
| MoVideo: Motion-Aware Video Generation with Diffusion Models | | - | | Nov, 2023 |
| Make Pixels Dance: High-Dynamic Video Generation | | - | | Nov, 2023 |
| Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning | | - | | Nov, 2023 |
| Optimal Noise pursuit for Augmenting Text-to-Video Generation | | - | - | Nov, 2023 |
| VideoDreamer: Customized Multi-Subject Text-to-Video Generation with Disen-Mix Finetuning | | - | | Nov, 2023 |
| VideoCrafter1: Open Diffusion Models for High-Quality Video Generation | | | | Oct, 2023 |
| SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction | | | | Oct, 2023 |
| DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors | | | | Oct., 2023 |
| LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation | | | | Oct., 2023 |
| DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model | | | | Oct, 2023 |
| MotionDirector: Motion Customization of Text-to-Video Diffusion Models | | | | Oct, 2023 |
| VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning | | | | Sep., 2023 |
| Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation | | | | Sep., 2023 |
| LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models | | | | Sep., 2023 |
| Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation | | | | Sep., 2023 |
| VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation | | - | | Sep., 2023 |
| MobileVidFactory: Automatic Diffusion-Based Social Media Video Generation for Mobile Devices from Text | | - | - | Jul., 2023 |
| Text2Performer: Text-Driven Human Video Generation | | | | Apr., 2023 |
| AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning | | | | Jul., 2023 |
| Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with Large Language Models | | - | | Aug., 2023 |
| SimDA: Simple Diffusion Adapter for Efficient Video Generation | | | | CVPR, 2024 |
| Dual-Stream Diffusion Net for Text-to-Video Generation | | - | - | Aug., 2023 |
| ModelScope Text-to-Video Technical Report | | | | Aug., 2023 |
| InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation | | | - | Jul., 2023 |
| VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation | | - | - | May, 2023 |
| Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models | | - | | May, 2023 |
| Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models | | - | | CVPR 2023 |
| Latent-Shift: Latent Diffusion with Temporal Shift | | - | | Apr., 2023 |
| Probabilistic Adaptation of Text-to-Video Models | | - | | Jun., 2023 |
| NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation | | - | | Mar., 2023 |
| ED-T2V: An Efficient Training Framework for Diffusion-based Text-to-Video Generation | - | - | - | IJCNN, 2023 |
| MagicVideo: Efficient Video Generation With Latent Diffusion Models | | - | | Nov., 2022 |
| Phenaki: Variable Length Video Generation From Open Domain Textual Description | | - | | Oct., 2022 |
| Imagen Video: High Definition Video Generation With Diffusion Models | | - | | Oct., 2022 |
| VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation | | | | CVPR 2023 |
| MAGVIT: Masked Generative Video Transformer | | - | | Dec., 2022 |
| Make-A-Video: Text-to-Video Generation without Text-Video Data | | - | | ICLR 2023 |
| Latent Video Diffusion Models for High-Fidelity Video Generation With Arbitrary Lengths | | | | Nov., 2022 |
| CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers | | | - | May, 2022 |
| Video Diffusion Models | | - | | Apr., 2022 |
Training-free
| Title | arXiv | Github | Website | Pub. & Date |
|---|---|---|---|---|
| VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models | | | | Mar, 2024 |
| TRAILBLAZER: TRAJECTORY CONTROL FOR DIFFUSION-BASED VIDEO GENERATION | | | | Jan, 2024 |
| FreeInit: Bridging Initialization Gap in Video Diffusion Models | | | | Dec, 2023 |
| MTVG: Multi-text Video Generation with Text-to-Video Models | | - | | Dec, 2023 |
| F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis | | - | - | Nov, 2023 |
| AdaDiff: Adaptive Step Selection for Fast Diffusion | | - | - | Nov, 2023 |
| FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax | | | | Nov, 2023 |
| 🏀GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning | | | | Nov, 2023 |
| FreeNoise: Tuning-Free Longer Video Diffusion Via Noise Rescheduling | | | | Oct, 2023 |
| ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation | | | | Oct, 2023 |
| LLM-grounded Video Diffusion Models | | | | Oct, 2023 |
| Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator | | | - | NeurIPS, 2023 |
| DiffSynth: Latent In-Iteration Deflickering for Realistic Video Synthesis | | | | Aug, 2023 |
| Large Language Models are Frame-level Directors for Zero-shot Text-to-Video Generation | | | - | May, 2023 |
| Text2video-Zero: Text-to-Image Diffusion Models Are Zero-Shot Video Generators | | | | Mar., 2023 |
| PEEKABOO: Interactive Video Generation via Masked-Diffusion 🫣 | | | | CVPR, 2024 |
Video Generation with other conditions
Pose-guided Video Generation
| Title | arXiv | Github | Website | Pub. & Date |
|---|---|---|---|---|
| MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model | | | | ECCV 2024 |
| MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance | | | | Jul., 2024 |
| Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance | | | | Mar., 2024 |
| Action Reimagined: Text-to-Pose Video Editing for Dynamic Human Actions | | - | - | Mar., 2024 |
| Do You Guys Want to Dance: Zero-Shot Compositional Human Dance Generation with Multiple Persons | | - | - | Jan., 2024 |
| DreaMoving: A Human Dance Video Generation Framework based on Diffusion Models | | - | | Dec., 2023 |
| MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model | | | | Nov., 2023 |
| Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation | | | | Nov., 2023 |
| MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer | | | | Nov., 2023 |
| DisCo: Disentangled Control for Referring Human Dance Generation in Real World | | | | Jul., 2023 |
| Dancing Avatar: Pose and Text-Guided Human Motion Videos Synthesis with Image Diffusion Model | | - | - | Aug., 2023 |
| DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion | | | | Apr., 2023 |
| Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos | | | | Apr., 2023 |
Motion-guided Video Generation
| Title | arXiv | Github | Website | Pub. & Date |
|---|---|---|---|---|
| MOTIONCLONE: TRAINING-FREE MOTION CLONING FOR CONTROLLABLE VIDEO GENERATION | | | | Jun., 2024 |
| Tora: Trajectory-oriented Diffusion Transformer for Video Generation | | | | Jul., 2024 |
| MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model | | | | ECCV 2024 |
| Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance | | | | Mar., 2024 |
| Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling | | - | - | Jan., 2024 |
| Motion-Zero: Zero-Shot Moving Object Control Framework for Diffusion-Based Video Generation | | - | - | Jan., 2024 |
| Customizing Motion in Text-to-Video Diffusion Models | | - | | Dec., 2023 |
| VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models | | | | CVPR 2024 |
| AnimateAnything: Fine-Grained Open Domain Image Animation with Motion Guidance | | | | Nov., 2023 |
| Motion-Conditioned Diffusion Model for Controllable Video Synthesis | | - | | Apr., 2023 |
| DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory | | - | - | Aug., 2023 |
Sound-guided Video Generation
Image-guided Video Generation
| Title | arXiv | Github | Website | Pub. & Date |
|---|---|---|---|---|
| PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation | | | | ECCV 2024 |
| TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models | | | | CVPR 2024 |
| Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model | | | | Jun., 2024 |
| Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation | | - | | Mar., 2024 |
| AtomoVideo: High Fidelity Image-to-Video Generation | | - | | Mar., 2024 |
| Animated Stickers: Bringing Stickers to Life with Video Diffusion | | - | - | Feb., 2024 |
| CONSISTI2V: Enhancing Visual Consistency for Image-to-Video Generation | | - | | Feb., 2024 |
| I2V-Adapter: A General Image-to-Video Adapter for Video Diffusion Models | | - | - | Dec., 2023 |
| PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models | | - | | Dec., 2023 |
| DreamVideo: High-Fidelity Image-to-Video Generation with Image Retention and Text Guidance | | - | | Nov., 2023 |
| LivePhoto: Real Image Animation with Text-guided Motion Control | | | | Nov., 2023 |
| VideoBooth: Diffusion-based Video Generation with Image Prompts | | | | Nov., 2023 |
| Decouple Content and Motion for Conditional Image-to-Video Generation | | - | - | Nov, 2023 |
| I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models | | - | - | Nov, 2023 |
| Make-It-4D: Synthesizing a Consistent Long-Term Dynamic Scene Video from a Single Image | | - | - | MM, 2023 |
| Generative Image Dynamics | | - | | Sep., 2023 |
| LaMD: Latent Motion Diffusion for Video Generation | | - | - | Apr., 2023 |
| Conditional Image-to-Video Generation with Latent Flow Diffusion Models | | | - | CVPR 2023 |
| NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis | | | | CVPR 2022 |
Brain-guided Video Generation
Depth-guided Video Generation
Multi-modal guided Video Generation
| Title | arXiv | Github | Website | Pub. & Date |
|---|---|---|---|---|
| UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion Models via Training-Free Unified Attention Control | | - | - | Mar., 2024 |
| Magic-Me: Identity-Specific Video Customized Diffusion | | - | | Feb., 2024 |
| InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions | | - | | Feb., 2024 |
| Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion | | - | | Feb., 2024 |
| Boximator: Generating Rich and Controllable Motions for Video Synthesis | | - | | Feb., 2024 |
| AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning | | - | - | Jan., 2024 |
| ActAnywhere: Subject-Aware Video Background Generation | | - | | Jan., 2024 |
| CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects | | - | - | Jan., 2024 |
| MoonShot: Towards Controllable Video Generation and Editing with Multimodal Conditions | | | | Jan., 2024 |
| PEEKABOO: Interactive Video Generation via Masked-Diffusion | | - | | Dec., 2023 |
| CMMD: Contrastive Multi-Modal Diffusion for Video-Audio Conditional Modeling | | - | - | Dec., 2023 |
| Fine-grained Controllable Video Generation via Object Appearance and Context | | - | | Nov., 2023 |
| GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation | | - | | Nov., 2023 |
| Panacea: Panoramic and Controllable Video Generation for Autonomous Driving | | - | | Nov., 2023 |
| SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models | | - | | Nov., 2023 |
| VideoComposer: Compositional Video Synthesis with Motion Controllability | | | | Jun., 2023 |
| NExT-GPT: Any-to-Any Multimodal LLM | | - | - | Sep, 2023 |
| MovieFactory: Automatic Movie Creation from Text using Large Generative Models for Language and Images | | - | | Jun, 2023 |
| Any-to-Any Generation via Composable Diffusion | | | | May, 2023 |
| Mm-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation | | | - | CVPR 2023 |
Unconditional Video Generation
U-Net based
Transformer based
Video Completion
Video Enhancement and Restoration
Video Prediction
| Title | arXiv | Github | Website | Pub. & Date |
|---|---|---|---|---|
| AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction | | | | Jun, 2024 |
| STDiff: Spatio-temporal Diffusion for Continuous Stochastic Video Prediction | | | - | Dec, 2023 |
| Video Diffusion Models with Local-Global Context Guidance | | | - | IJCAI, 2023 |
| Seer: Language Instructed Video Prediction with Latent Diffusion Models | | - | | Mar., 2023 |
| MaskViT: Masked Visual Pre-Training for Video Prediction | | | | Jun, 2022 |
| Diffusion Models for Video Prediction and Infilling | | | | TMLR 2022 |
| MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation | | | | NeurIPS 2022 |
| Diffusion Probabilistic Modeling for Video Generation | | | - | Mar., 2022 |
| Flexible Diffusion Modeling of Long Videos | | | | May, 2022 |
| Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models | | | | May, 2023 |
Video Editing
General Editing Model
| Title | arXiv | Github | Website | Pub. & Date |
|---|---|---|---|---|
| VIA: A Spatiotemporal Video Adaptation Framework for Global and Local Video Editing | | | | Jun, 2024 |
| FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation | | - | - | Mar., 2024 |
| FastVideoEdit: Leveraging Consistency Models for Efficient Text-to-Video Editing | | - | - | Mar., 2024 |
| DreamMotion: Space-Time Self-Similarity Score Distillation for Zero-Shot Video Editing | | - | | Mar, 2024 |
| Video Editing via Factorized Diffusion Distillation | | - | - | Mar, 2024 |
| FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis | | | | Dec, 2023 |
| MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers | | - | | Dec, 2023 |
| Neutral Editing Framework for Diffusion-based Video Editing | | - | | Dec, 2023 |
| VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence | | - | | Nov, 2023 |
| VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models | | | | Nov, 2023 |
| Motion-Conditioned Image Animation for Video Editing | | - | | Nov, 2023 |
| MagicProp: Diffusion-based Video Editing via Motion-aware Appearance Propagation | | - | - | Sep, 2023 |
| MagicEdit: High-Fidelity and Temporally Coherent Video Editing | | - | - | Aug, 2023 |
| Edit Temporal-Consistent Videos with Image Diffusion Model | | - | - | Aug, 2023 |
| Structure and Content-Guided Video Synthesis With Diffusion Models | | - | | ICCV, 2023 |
| Dreamix: Video Diffusion Models Are General Video Editors | | - | | Feb, 2023 |
Training-free Editing Model
| Title | arXiv | Github | Website | Pub. & Date |
|---|---|---|---|---|
| MVOC: a training-free multiple video object composition method with diffusion models | | | | Jun, 2024 |
| VIA: A Spatiotemporal Video Adaptation Framework for Global and Local Video Editing | | | | Jun, 2024 |
| EVA: Zero-shot Accurate Attributes and Multi-Object Video Editing | | | | March, 2024 |
| UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing | | - | | Feb, 2024 |
| Object-Centric Diffusion for Efficient Video Editing | | - | - | Jan, 2024 |
| RealCraft: Attention Control as A Solution for Zero-shot Long Video Editing | | - | - | Dec, 2023 |
| VidToMe: Video Token Merging for Zero-Shot Video Editing | | | | Dec, 2023 |
| A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing | | | | Dec, 2023 |
| AnimateZero: Video Diffusion Models are Zero-Shot Image Animators | | | - | Dec, 2023 |
| RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models | | | | Dec, 2023 |
| BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models | | - | | Nov., 2023 |
| Highly Detailed and Temporal Consistent Video Stylization via Synchronized Multi-Frame Diffusion | | - | - | Nov., 2023 |
| FastBlend: a Powerful Model-Free Toolkit Making Video Stylization Easier | | | - | Oct., 2023 |
| LatentWarp: Consistent Diffusion Latents for Zero-Shot Video-to-Video Translation | | - | - | Nov., 2023 |
| Fuse Your Latents: Video Editing with Multi-source Latent Diffusion Models | | - | - | Oct., 2023 |
| LOVECon: Text-driven Training-Free Long Video Editing with ControlNet | | | - | Oct., 2023 |
| FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing | | - | | Oct., 2023 |
| Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models | | | | ICLR, 2024 |
| MeDM: Mediating Image Diffusion Models for Video-to-Video Translation with Temporal Correspondence Guidance | | - | - | Aug., 2023 |
| EVE: Efficient zero-shot text-based Video Editing with Depth Map Guidance and Temporal Consistency Constraints | | - | - | Aug., 2023 |
| ControlVideo: Training-free Controllable Text-to-Video Generation | | | - | May, 2023 |
| TokenFlow: Consistent Diffusion Features for Consistent Video Editing | | | | Jul., 2023 |
| VidEdit: Zero-Shot and Spatially Aware Text-Driven Video Editing | | - | | Jun., 2023 |
| Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation | | - | | Jun., 2023 |
| Zero-Shot Video Editing Using Off-the-Shelf Image Diffusion Models | | | | Mar., 2023 |
| FateZero: Fusing Attentions for Zero-shot Text-based Video Editing | | | | Mar., 2023 |
| Pix2Video: Video Editing Using Image Diffusion | | - | | Mar., 2023 |
| InFusion: Inject and Attention Fusion for Multi Concept Zero Shot Text based Video Editing | | - | | Aug., 2023 |
| Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising | | | | May, 2023 |
One-shot Editing Model
| Title | arXiv | Github | Website | Pub. & Date |
|---|---|---|---|---|
| Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models | | - | | Feb., 2024 |
| MotionCrafter: One-Shot Motion Customization of Diffusion Models | | | - | Dec., 2023 |
| DiffusionAtlas: High-Fidelity Consistent Diffusion Video Editing | | - | | Dec., 2023 |
| MotionEditor: Editing Video Motion via Content-Aware Diffusion | | | | CVPR, 2024 |
| Smooth Video Synthesis with Noise Constraints on Diffusion Models for One-shot Video Tuning | | - | | Nov., 2023 |
| Cut-and-Paste: Subject-Driven Video Editing with Attention Control | | - | - | Nov, 2023 |
| StableVideo: Text-driven Consistency-aware Diffusion Video Editing | | | | ICCV, 2023 |
| Shape-aware Text-driven Layered Video Editing | | - | - | CVPR, 2023 |
| SAVE: Spectral-Shift-Aware Adaptation of Image Diffusion Models for Text-guided Video Editing | | | - | May, 2023 |
| Towards Consistent Video Editing with Text-to-Image Diffusion Models | | - | - | Mar., 2023 |
| Edit-A-Video: Single Video Editing with Object-Aware Consistency | | - | | Mar., 2023 |
| Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation | | | | ICCV, 2023 |
| ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing | | | | May, 2023 |
| Video-P2P: Video Editing with Cross-attention Control | | | | Mar., 2023 |
| SinFusion: Training Diffusion Models on a Single Image or Video | | | | Nov., 2022 |
Instruct-guided Video Editing
Motion-guided Video Editing
Sound-guided Video Editing
Multi-modal Control Editing Model
Domain-specific Editing Model
Non-diffusion Editing model
Video Understanding
| Title | arXiv | Github | Website | Pub. & Date |
|---|---|---|---|---|
| EchoReel: Enhancing Action Generation of Existing Video Diffusion Models | | - | - | Mar., 2024 |
| VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model | | - | - | Mar., 2024 |
| SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion | | - | - | Mar., 2024 |
| VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models | | - | - | Mar., 2024 |
| Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation | | - | - | Mar., 2024 |
| DiffSal: Joint Audio and Video Learning for Diffusion Saliency Prediction | | - | - | Mar., 2024 |
| Generative Video Diffusion for Unseen Cross-Domain Video Moment Retrieval | | - | - | Jan., 2024 |
| Diffusion Reward: Learning Rewards via Conditional Video Diffusion | | | | Dec., 2023 |
| ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models | | - | | Nov., 2023 |
| Enhancing Perceptual Quality in Video Super-Resolution through Temporally-Consistent Detail Synthesis using Diffusion Models | | | - | Nov., 2023 |
| Flow-Guided Diffusion for Video Inpainting | | | - | Nov., 2023 |
| Breathing Life Into Sketches Using Text-to-Video Priors | | - | - | Nov., 2023 |
| Infusion: Internal Diffusion for Video Inpainting | | - | - | Nov., 2023 |
| DiffusionVMR: Diffusion Model for Video Moment Retrieval | | - | - | Aug., 2023 |
| DiffPose: SpatioTemporal Diffusion Model for Video-Based Human Pose Estimation | | - | - | Aug., 2023 |
| CoTracker: It is Better to Track Together | | | | Aug., 2023 |
| Unsupervised Video Anomaly Detection with Diffusion Models Conditioned on Compact Motion Representations | | - | - | ICIAP, 2023 |
| Exploring Diffusion Models for Unsupervised Video Anomaly Detection | | - | - | Apr., 2023 |
| Multimodal Motion Conditioned Diffusion Model for Skeleton-based Video Anomaly Detection | | - | - | ICCV, 2023 |
| Diffusion Action Segmentation | | - | - | Mar., 2023 |
| DiffTAD: Temporal Action Detection with Proposal Denoising Diffusion | | | | Mar., 2023 |
| DiffusionRet: Generative Text-Video Retrieval with Diffusion Model | | | - | ICCV, 2023 |
| MomentDiff: Generative Video Moment Retrieval from Random to Real | | | | Jul., 2023 |
| Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition | | | | Feb., 2023 |
| Refined Semantic Enhancement Towards Frequency Diffusion for Video Captioning | | - | - | Nov., 2022 |
| A Generalist Framework for Panoptic Segmentation of Images and Videos | | | | Oct., 2022 |
| DAVIS: High-Quality Audio-Visual Separation with Generative Diffusion Models | | - | - | Jul., 2023 |
| CaDM: Codec-aware Diffusion Modeling for Neural-enhanced Video Streaming | | - | - | Mar., 2023 |
| Spatial-temporal Transformer-guided Diffusion based Data Augmentation for Efficient Skeleton-based Action Recognition | | - | - | Jul., 2023 |
| PDPP: Projected Diffusion for Procedure Planning in Instructional Videos | | | - | CVPR 2023 |