fltwr/generative-image-dynamics


An implementation of Generative Image Dynamics

An implementation of the diffusion model from Generative Image Dynamics [1], which generates oscillatory motion for an input image, together with a model that animates the image with the generated motion using softmax splatting [2].

Updates:

Dependencies:

Download trained models:

Notes:

Generating optical flow from image

Following the paper, the motion synthesis model is implemented as a latent diffusion model consisting of a variational autoencoder (VAE) and a U-Net. It learns to synthesize the temporal FFT of the optical flow conditioned on a still image. The U-Net was trained from scratch; the VAE was taken from CompVis/ldm-celebahq-256. The frequency attention layers described in the paper were not implemented.
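
As a sketch of the quantity being modeled, the snippet below converts a flow sequence into the truncated temporal FFT that the diffusion model predicts, and back. This is illustrative rather than the repository's actual API; the function names and the `num_freq` parameter are assumptions.

```python
import torch

def flow_to_spectrum(flow, num_freq=16):
    """Illustrative: temporal FFT of an optical-flow sequence.

    flow: (T, 2, H, W) forward flow for T frames (x and y components).
    Returns a (num_freq, 4, H, W) tensor holding the real and imaginary
    parts of the lowest num_freq frequency bands, i.e. the kind of
    target the diffusion model learns to synthesize from a still image.
    """
    spectrum = torch.fft.rfft(flow, dim=0)  # (T//2 + 1, 2, H, W), complex
    spectrum = spectrum[:num_freq]          # keep the low frequencies only
    return torch.cat([spectrum.real, spectrum.imag], dim=1)

def spectrum_to_flow(spectrum, num_frames):
    """Inverse: rebuild the flow sequence from the truncated spectrum."""
    real, imag = spectrum.chunk(2, dim=1)
    full = torch.zeros(num_frames // 2 + 1, 2, *spectrum.shape[2:],
                       dtype=torch.complex64)
    full[:spectrum.shape[0]] = torch.complex(real, imag)
    return torch.fft.irfft(full, n=num_frames, dim=0)  # (T, 2, H, W)
```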

Example:

Training the U-Net:

Generating video from image and optical flow

The frame synthesis model takes an image and a forward flow field and predicts the warped image. It does not follow [1] exactly; instead it is adapted from the model in [2], which uses softmax splatting to warp image features at multiple resolutions and a GridNet to generate the output image from the warped features. The model in [1] also uses softmax splatting and a feature pyramid, but generates the output image with the synthesis network from a co-modulated GAN.
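
For intuition, here is a minimal pure-PyTorch sketch of the softmax splatting operation from [2]. The actual model builds on the reference (CUDA-accelerated) implementation; this readable version with illustrative names trades speed for clarity.

```python
import torch

def softmax_splat(image, flow, importance):
    """Forward-warp `image` by `flow`, blending collisions with softmax weights.

    image:      (B, C, H, W) source features or frame
    flow:       (B, 2, H, W) forward flow (dx, dy) in pixels
    importance: (B, 1, H, W) importance metric Z; overlapping pixels are
                combined with weights proportional to exp(Z)
    Pixels that receive no contribution remain zero.
    """
    b, c, h, w = image.shape
    device = image.device

    # Sub-pixel target coordinates of every source pixel.
    gy, gx = torch.meshgrid(torch.arange(h, device=device),
                            torch.arange(w, device=device), indexing="ij")
    tx = gx[None] + flow[:, 0]  # (B, H, W)
    ty = gy[None] + flow[:, 1]

    # Carry exp(Z) alongside the weighted payload so we can normalize later.
    weight = importance.exp()                             # (B, 1, H, W)
    payload = torch.cat([image * weight, weight], dim=1)  # (B, C+1, H, W)

    out = torch.zeros(b, c + 1, h, w, device=device)
    x0, y0 = tx.floor().long(), ty.floor().long()
    for dx in (0, 1):
        for dy in (0, 1):
            xi, yi = x0 + dx, y0 + dy
            # Bilinear splat weight toward this corner of the target cell.
            wgt = ((1 - (tx - xi).abs()).clamp(min=0)
                   * (1 - (ty - yi).abs()).clamp(min=0))  # (B, H, W)
            valid = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
            idx = (yi.clamp(0, h - 1) * w + xi.clamp(0, w - 1)).view(b, 1, -1)
            idx = idx.expand(-1, c + 1, -1)
            src = (payload * (wgt * valid)[:, None]).view(b, c + 1, -1)
            out.view(b, c + 1, -1).scatter_add_(2, idx, src)

    # Softmax splatting: sum(exp(Z) * I) / sum(exp(Z)).
    return out[:, :c] / out[:, c:].clamp(min=1e-7)
```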

Example:

Evaluation:

| Method | PSNR ↑ | SSIM ↑ | LPIPS (AlexNet) ↓ |
| --- | --- | --- | --- |
| Model | 36.3127 | 0.9720 | 0.0096 |
| Average splatting | 34.8256 | 0.9657 | 0.0236 |
| OpenCV remapping | 34.7156 | 0.9654 | 0.0132 |
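
The sketch below shows one plausible way to compute these metrics for a pair of frames, using scikit-image for PSNR/SSIM and the lpips package with its AlexNet backbone; the repository's exact evaluation protocol may differ.

```python
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_fn = lpips.LPIPS(net="alex")  # AlexNet backbone, as in the table

def evaluate(pred, target):
    """pred, target: (H, W, 3) uint8 RGB frames as numpy arrays."""
    psnr = peak_signal_noise_ratio(target, pred)
    ssim = structural_similarity(target, pred, channel_axis=-1)
    # lpips expects (N, 3, H, W) tensors scaled to [-1, 1].
    to_t = lambda x: torch.from_numpy(x).permute(2, 0, 1)[None].float() / 127.5 - 1.0
    with torch.no_grad():
        dist = lpips_fn(to_t(pred), to_t(target)).item()
    return psnr, ssim, dist
```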

Model:

Training: