ali-vilab / UniAnimate

Code for Paper "UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation".
https://unianimate.github.io/

Temporal relationships? #47

Open blackight opened 2 months ago

blackight commented 2 months ago

The released code uses a temporal Transformer, but temporal attention treats every frame equally. It seems that no trick like a per-frame time embedding is used. Does this mean the network cannot distinguish the temporal order of different frames?
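
To illustrate the concern, here is a minimal sketch (not the repo's actual module, and the shapes are illustrative assumptions) of temporal self-attention applied over the frame axis without any positional encoding. Plain self-attention is permutation-equivariant, so shuffling the frames only shuffles the outputs in the same way, i.e. frame order is invisible to the layer:

```python
# Minimal sketch: temporal self-attention with no positional encoding.
# NOT the UniAnimate implementation; shapes and layer choice are assumptions.
import torch
import torch.nn as nn


class TemporalSelfAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, height*width, dim) -> attend across frames at each spatial location
        b, f, hw, d = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b * hw, f, d)  # (batch*hw, frames, dim)
        out, _ = self.attn(x, x, x)                      # no time embedding added
        return out.reshape(b, hw, f, d).permute(0, 2, 1, 3)


if __name__ == "__main__":
    torch.manual_seed(0)
    layer = TemporalSelfAttention(dim=64)
    x = torch.randn(1, 16, 4, 64)          # 16 frames, 4 spatial positions
    perm = torch.randperm(16)
    y, y_perm = layer(x), layer(x[:, perm])
    # Shuffled input gives shuffled output: the layer does not encode frame order.
    print(torch.allclose(y[:, perm], y_perm, atol=1e-5))
```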

wangxiang1230 commented 2 months ago

> The released code uses a temporal Transformer, but temporal attention treats every frame equally. It seems that no trick like a per-frame time embedding is used. Does this mean the network cannot distinguish the temporal order of different frames?

Hi, thank you for your suggestion. We did not include positional encoding in our experiments; this lets the model, trained on 16/32 frames, be easily extended to other temporal lengths. However, adding a temporal encoding may work better when the temporal length is fixed.
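
For reference, a hedged sketch of the option mentioned above: adding a temporal positional encoding before the temporal attention so that frame order becomes visible. The sinusoidal table and the `max_frames` cap here are illustrative assumptions, not the repo's implementation; a fixed table like this suits a fixed temporal length and would need interpolation for longer clips:

```python
# Sketch only: temporal attention with an added sinusoidal time embedding.
# NOT the UniAnimate implementation; `max_frames` and the table are assumptions.
import math
import torch
import torch.nn as nn


def sinusoidal_time_embedding(num_frames: int, dim: int) -> torch.Tensor:
    """Return a (num_frames, dim) table of standard sinusoidal position encodings (dim even)."""
    position = torch.arange(num_frames).unsqueeze(1).float()
    div_term = torch.exp(torch.arange(0, dim, 2).float() * (-math.log(10000.0) / dim))
    table = torch.zeros(num_frames, dim)
    table[:, 0::2] = torch.sin(position * div_term)
    table[:, 1::2] = torch.cos(position * div_term)
    return table


class TemporalAttentionWithTimePE(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8, max_frames: int = 32):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Pre-computed table for up to `max_frames`; fixed length assumed.
        self.register_buffer("time_pe", sinusoidal_time_embedding(max_frames, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, height*width, dim)
        b, f, hw, d = x.shape
        x = x + self.time_pe[:f].view(1, f, 1, d)        # inject frame order
        x = x.permute(0, 2, 1, 3).reshape(b * hw, f, d)
        out, _ = self.attn(x, x, x)
        return out.reshape(b, hw, f, d).permute(0, 2, 1, 3)
```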