PKU-YuanGroup / Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.
Apache License 2.0

Taking inspiration from Stable Diffusion 3 #43

Open kabachuha opened 9 months ago

kabachuha commented 9 months ago

As you probably know, Stability AI published the architecture details of SD3 today.

https://stability.ai/news/stable-diffusion-3-research-paper / https://stabilityai-public-packages.s3.us-west-2.amazonaws.com/Stable+Diffusion+3+Paper.pdf

The key takeaways are:

  1. Rectified Flow (straight-line paths between noise and data, which can allow sampling in fewer steps than standard diffusion)
  2. Joint Transformer for both Text and Image embedding processing
  3. Improved text encoding/prompt alignment by using a mixture of CLIP and T5 text encoders
  4. Deduplication efforts
  5. Outperforms SOTA
  6. Scales to Text2Video too
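
To make the first takeaway concrete, here is a minimal NumPy sketch of the rectified flow idea: data and noise are joined by a straight line, the model is trained to predict the constant velocity along that line, and sampling is a plain Euler integration from noise back to data. The function names (`interpolate`, `velocity_target`, `rf_loss`, `euler_sample`), the time convention (t = 0 is data, t = 1 is noise) and the `predict_velocity` callable are assumptions for illustration only. This is not the SD3 or Open-Sora-Plan training code.

```python
import numpy as np

def interpolate(x0, noise, t):
    """Point on the straight line between data x0 (t=0) and noise (t=1)."""
    return (1.0 - t) * x0 + t * noise

def velocity_target(x0, noise):
    """Time derivative of the straight path; it does not depend on t."""
    return noise - x0

def rf_loss(predict_velocity, x0, noise, t):
    """Mean squared error between predicted and true velocity at time t."""
    x_t = interpolate(x0, noise, t)
    pred = predict_velocity(x_t, t)  # hypothetical model call
    return float(np.mean((pred - velocity_target(x0, noise)) ** 2))

def euler_sample(predict_velocity, noise, num_steps):
    """Integrate from t=1 (noise) down to t=0 (data) with fixed Euler steps."""
    x = noise.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * dt
        x = x - dt * predict_velocity(x, t)
    return x

# Toy check with an "oracle" that knows the true velocity for one data/noise pair.
rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 8))
noise = rng.normal(size=(4, 8))
oracle = lambda x_t, t: noise - x0
recovered = euler_sample(oracle, noise, num_steps=1)
```

With a perfectly straight path and an exact velocity, a single Euler step already recovers the data in this toy setup. A trained model only approximates the velocity, so in practice more steps are needed.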


I think these ideas could be of great help to the OpenSora project.

cxh0519 commented 9 months ago

Thanks for the summary! We are working on it.