Why VideoGPT? Shouldn't it be a diffusion model like DiT? VideoGPT is an autoregressive model, isn't it?

PKU-YuanGroup / Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.

MIT License

11.61k stars 1.03k forks

Why VideoGPT? Shouldn't it be a diffusion model like DiT? VideoGPT is an autoregressive model, isn't it? #17

Open af-74413592 opened 9 months ago

LinB203 commented 9 months ago

We use the Video-VQVAE (although it comes from VideoGPT), which is an autoencoder that encodes video from the pixel domain into a latent and then decodes the latent back to the pixel domain.
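For readers unfamiliar with the "VQ" part, here is a minimal toy sketch of the quantization step that sits between the encoder and the decoder of a VQ-VAE. It is not the project's code: the encoder and decoder networks are omitted, `latents` stands in for hypothetical encoder outputs, and the `codebook` values and function names are made up for illustration.

```python
# Toy illustration of vector quantization as used in a VQ-VAE.
# Assumption: `latents` stands in for encoder outputs; no real networks here.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(latents, codebook):
    """Map each continuous latent vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]

def dequantize(indices, codebook):
    """Look up the codebook vectors that a decoder would receive as input."""
    return [codebook[k] for k in indices]

# Hypothetical 2-D codebook with three entries.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
# Hypothetical encoder outputs for three spatio-temporal positions.
latents = [[0.9, 1.2], [0.1, -0.2], [-0.8, 0.7]]

indices = quantize(latents, codebook)
reconstructed = dequantize(indices, codebook)
print(indices)  # [1, 0, 2]
```

The discrete `indices` (or the looked-up vectors) form the compact latent representation; which generative model is trained on top of that latent is a separate choice from the autoencoder itself.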

wing158 commented 9 months ago

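Have you considered applications of 3D Gaussian Splatting for the data or the model? My understanding is that 3D representations perform well in terms of physical laws and video consistency. For example, in https://github.com/GaussianObject/GaussianObject, four images taken from different angles are passed through a coarse model and then repaired with a diffusion model to produce a refined 3DGS.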

cxh0519 commented 9 months ago

We agree that video created by a powerful generator will be useful for NeRF/3DGS-based 3D content generation. However, we mainly adopt realistic videos for training in the preliminary stage. Synthetic video created by game engines or 3D representations might be considered in the future.

chg0901 commented 9 months ago

Is there any public open-source “Synthetic video created by game engines or 3D representations” data or related research works?

LinB203 commented 8 months ago

> Is there any public open-source “Synthetic video created by game engines or 3D representations” data or related research works?

We've added it to our to-do list; we are working on this type of dataset.