ali-vilab / UniAnimate

Code for Paper "UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation".
https://unianimate.github.io/

Temporal relationships? #47

Open blackight opened 2 months ago

blackight commented 2 months ago

The released code uses a temporal Transformer, but temporal attention treats every frame equally. It seems that no trick like a per-frame time embedding is used. Does this mean the network cannot distinguish the temporal order of different frames?
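
To illustrate the concern, here is a minimal sketch (not the repo's actual module, and the shapes are illustrative assumptions) of temporal self-attention applied over the frame axis without any positional encoding. Plain self-attention is permutation-equivariant, so shuffling the frames only shuffles the outputs in the same way, i.e. frame order is invisible to the layer:

```python
# Minimal sketch: temporal self-attention with no positional encoding.
# NOT the UniAnimate implementation; shapes and layer choice are assumptions.
import torch
import torch.nn as nn


class TemporalSelfAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, height*width, dim) -> attend across frames at each spatial location
        b, f, hw, d = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b * hw, f, d)  # (batch*hw, frames, dim)
        out, _ = self.attn(x, x, x)                      # no time embedding added
        return out.reshape(b, hw, f, d).permute(0, 2, 1, 3)


if __name__ == "__main__":
    torch.manual_seed(0)
    layer = TemporalSelfAttention(dim=64)
    x = torch.randn(1, 16, 4, 64)          # 16 frames, 4 spatial positions
    perm = torch.randperm(16)
    y, y_perm = layer(x), layer(x[:, perm])
    # Shuffled input gives shuffled output: the layer does not encode frame order.
    print(torch.allclose(y[:, perm], y_perm, atol=1e-5))
```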

wangxiang1230 commented 2 months ago

> The released code uses a temporal Transformer, but temporal attention treats every frame equally. It seems that no trick like a per-frame time embedding is used. Does this mean the network cannot distinguish the temporal order of different frames?

Hi, thank you for your suggestion. We did not include positional encoding in our experiments; this lets the model, trained on 16/32 frames, be easily extended to other temporal lengths. However, adding a temporal encoding may work better when the temporal length is fixed.
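
For reference, a hedged sketch of the option mentioned above: adding a temporal positional encoding before the temporal attention so that frame order becomes visible. The sinusoidal table and the `max_frames` cap here are illustrative assumptions, not the repo's implementation; a fixed table like this suits a fixed temporal length and would need interpolation for longer clips:

```python
# Sketch only: temporal attention with an added sinusoidal time embedding.
# NOT the UniAnimate implementation; `max_frames` and the table are assumptions.
import math
import torch
import torch.nn as nn


def sinusoidal_time_embedding(num_frames: int, dim: int) -> torch.Tensor:
    """Return a (num_frames, dim) table of standard sinusoidal position encodings (dim even)."""
    position = torch.arange(num_frames).unsqueeze(1).float()
    div_term = torch.exp(torch.arange(0, dim, 2).float() * (-math.log(10000.0) / dim))
    table = torch.zeros(num_frames, dim)
    table[:, 0::2] = torch.sin(position * div_term)
    table[:, 1::2] = torch.cos(position * div_term)
    return table


class TemporalAttentionWithTimePE(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8, max_frames: int = 32):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Pre-computed table for up to `max_frames`; fixed length assumed.
        self.register_buffer("time_pe", sinusoidal_time_embedding(max_frames, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, height*width, dim)
        b, f, hw, d = x.shape
        x = x + self.time_pe[:f].view(1, f, 1, d)        # inject frame order
        x = x.permute(0, 2, 1, 3).reshape(b * hw, f, d)
        out, _ = self.attn(x, x, x)
        return out.reshape(b, hw, f, d).permute(0, 2, 1, 3)
```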