Open okotaku opened 3 weeks ago
References
Two academic papers are cited to support the request:
Scalable Diffusion Models with Transformers: https://arxiv.org/abs/2212.09748
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis: https://arxiv.org/abs/2310.00426
Current Support
TransformerEngine does not yet support DiT.
Differences Noted
Two specific differences are highlighted:
LN (LayerNorm) with elementwise_affine=False
Transformer layer with timestep-aware scale / shift
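To make the two differences concrete, here is a minimal sketch in plain PyTorch, not TransformerEngine code. It assumes the conditioning arrives as a per-sample timestep embedding `c` of the same width as the hidden size. The names `AdaLNBlock`, `modulate`, and `ada_ln` are hypothetical and chosen only for illustration. The layout loosely follows the adaptive LayerNorm idea described in the DiT paper, simplified to scale and shift only.

```python
import torch
import torch.nn as nn


def modulate(x, shift, scale):
    # x: (batch, tokens, hidden); shift and scale: (batch, hidden).
    # Broadcast the per-sample shift/scale over the token dimension.
    return x * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)


class AdaLNBlock(nn.Module):
    """Hypothetical DiT-style block: affine-free LayerNorm plus
    scale/shift regressed from a timestep embedding (an assumption
    for illustration, not an existing TransformerEngine module)."""

    def __init__(self, hidden_size, num_heads, mlp_ratio=4):
        super().__init__()
        # Difference 1: LayerNorm has no learnable weight or bias.
        self.norm1 = nn.LayerNorm(hidden_size, elementwise_affine=False)
        self.norm2 = nn.LayerNorm(hidden_size, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, mlp_ratio * hidden_size),
            nn.GELU(),
            nn.Linear(mlp_ratio * hidden_size, hidden_size),
        )
        # Difference 2: scale and shift come from the conditioning vector,
        # one (shift, scale) pair for attention and one for the MLP.
        self.ada_ln = nn.Sequential(nn.SiLU(), nn.Linear(hidden_size, 4 * hidden_size))

    def forward(self, x, c):
        shift_a, scale_a, shift_m, scale_m = self.ada_ln(c).chunk(4, dim=-1)
        h = modulate(self.norm1(x), shift_a, scale_a)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        h = modulate(self.norm2(x), shift_m, scale_m)
        return x + self.mlp(h)


if __name__ == "__main__":
    block = AdaLNBlock(hidden_size=32, num_heads=4)
    x = torch.randn(2, 5, 32)   # (batch, tokens, hidden)
    c = torch.randn(2, 32)      # timestep embedding, one per sample
    print(block(x, c).shape)
```

The sketch shows why a standard fused LayerNorm-plus-linear path does not map directly onto DiT: the normalization carries no parameters of its own, and the scale and shift vary per sample rather than being fixed weights.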