What is the training loss for the 3D VAE? Is it the basic VAE loss? Are the encoder and decoder trained together?

PKU-YuanGroup / Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.

MIT License


Open henbucuoshanghai opened 3 months ago

henbucuoshanghai commented 3 months ago

Is the compression factor 2×8×8? Is the 2 for time and the 8×8 for height and width?

qqingzheng commented 3 months ago

The loss includes a reconstruction loss (L1/L2), a perceptual loss, a GAN loss, and a KL loss.
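As a rough sketch of how these four terms are usually combined (the weights $\lambda_{\text{perc}}$, $\lambda_{\text{GAN}}$, $\lambda_{\text{KL}}$ are placeholder symbols, not values given in this thread, and the KL term assumes a diagonal Gaussian posterior with a standard normal prior):

$$
\mathcal{L} = \mathcal{L}_{\text{rec}}(x, \hat{x}) + \lambda_{\text{perc}}\,\mathcal{L}_{\text{perc}}(x, \hat{x}) + \lambda_{\text{GAN}}\,\mathcal{L}_{\text{GAN}}(\hat{x}) + \lambda_{\text{KL}}\,\mathcal{L}_{\text{KL}}
$$

$$
\mathcal{L}_{\text{KL}} = D_{\text{KL}}\big(\mathcal{N}(\mu, \sigma^2 I)\,\|\,\mathcal{N}(0, I)\big) = \frac{1}{2}\sum_i \left(\mu_i^2 + \sigma_i^2 - \log \sigma_i^2 - 1\right)
$$

Here $x$ is the input video, $\hat{x}$ is the decoder output, and $\mu$, $\sigma$ are the encoder's predicted mean and standard deviation for the latent.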

The current VAE compresses by 4×8×8: 4× in time and 8× in each spatial dimension.
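To make the 4×8×8 figure concrete, here is a minimal sketch of the shape arithmetic. The function names are hypothetical and not from the repository, and the sketch assumes plain division along each axis; the actual model may treat some frames (for example the first one) differently, which this thread does not specify.

```python
def latent_shape(frames, height, width, time_factor=4, spatial_factor=8):
    """Return the (T, H, W) latent size under plain 4x8x8 downsampling.

    Assumption: every axis is divided evenly by its factor.
    """
    for name, size, factor in (
        ("frames", frames, time_factor),
        ("height", height, spatial_factor),
        ("width", width, spatial_factor),
    ):
        if size % factor != 0:
            raise ValueError(f"{name}={size} is not divisible by {factor}")
    return (
        frames // time_factor,
        height // spatial_factor,
        width // spatial_factor,
    )


def compression_ratio(time_factor=4, spatial_factor=8):
    """Number of input positions that map to one latent position."""
    return time_factor * spatial_factor * spatial_factor


print(latent_shape(16, 256, 256))  # (4, 32, 32)
print(compression_ratio())         # 256
```

The ratio counts spatiotemporal positions only; it ignores any change in channel count between the video and the latent.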

The encoder and decoder are trained jointly.

henbucuoshanghai commented 3 months ago

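So can it be understood as the most basic kind of VAE in principle, just with video as the input? Can T, H, and W vary? Is fps also an input?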