This repo: Fast Training of Diffusion Models with Masked Transformers suggests using a masked transformer architecture for faster DiT training. They claim that:

Experiments on ImageNet-256x256 and ImageNet-512x512 show that our approach achieves competitive and even better generative performance than the state-of-the-art Diffusion Transformer (DiT) model, using only around 30% of its original training time.
Do you think it is worth considering adapting their code to the existing Latte model?
We attempted to incorporate the long skip connections introduced by MDTv2 and the masking strategy introduced by MaskDiT during training on WebVid-10M. Although they did not yield noticeable improvements in sample quality, the most significant advantage was faster convergence.
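For concreteness, here is a minimal PyTorch sketch of the two ideas mentioned above: a long skip connection between early and late blocks, and MaskDiT-style token dropping during training. This is not Latte/MaskDiT/MDTv2 code; the class names (`ToyVideoBackbone`, `masked_forward`), the toy token shapes, and the 0.5 mask ratio are illustrative assumptions only.

```python
# Minimal sketch, not the actual Latte / MaskDiT / MDTv2 code: a toy DiT-style
# backbone over video patch tokens with (1) a long skip connection and
# (2) MaskDiT-style token dropping during training.
import torch
import torch.nn as nn


class ToyVideoBackbone(nn.Module):
    """DiT-like encoder/decoder halves joined by long skip connections (U-ViT/MDT style)."""

    def __init__(self, dim=256, depth=4, heads=4):
        super().__init__()

        def block():
            return nn.TransformerEncoderLayer(
                d_model=dim, nhead=heads, dim_feedforward=4 * dim, batch_first=True
            )

        half = depth // 2
        self.enc = nn.ModuleList([block() for _ in range(half)])
        self.dec = nn.ModuleList([block() for _ in range(half)])
        # Long skip: concatenate early features into the later half, then project back.
        self.skip_proj = nn.Linear(2 * dim, dim)

    def forward(self, x):
        skips = []
        for blk in self.enc:
            x = blk(x)
            skips.append(x)
        for blk in self.dec:
            x = self.skip_proj(torch.cat([x, skips.pop()], dim=-1))
            x = blk(x)
        return x


def masked_forward(model, tokens, mask_ratio=0.5):
    """Run the backbone on a random subset of patch tokens (MaskDiT-style).

    tokens: (batch, num_patches, dim) latent patch tokens of a video clip.
    Dropping ~50% of tokens roughly halves attention/MLP cost per training step.
    """
    b, n, d = tokens.shape
    keep = max(1, int(n * (1.0 - mask_ratio)))
    ids = torch.rand(b, n, device=tokens.device).argsort(dim=1)[:, :keep]
    visible = torch.gather(tokens, 1, ids.unsqueeze(-1).expand(-1, -1, d))
    # ids is returned so a lightweight decoder / loss can handle the masked positions.
    return model(visible), ids


# Toy usage: 8 frames of 8x8 latent patches -> 512 tokens, half of them processed.
model = ToyVideoBackbone()
tokens = torch.randn(2, 8 * 8 * 8, 256)
out, kept_ids = masked_forward(model, tokens, mask_ratio=0.5)
print(out.shape)  # torch.Size([2, 256, 256])
```

As far as I understand, MaskDiT also trains with an auxiliary reconstruction objective on the masked tokens and finishes with a short unmasked finetuning phase, so a faithful port to Latte would need more than the token dropping shown here.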