Opened by shauray8 1 month ago
Lumina-T2X: a series of text-conditioned Diffusion Transformers (DiTs) capable of transforming textual descriptions into vivid images, dynamic videos, detailed multi-view 3D images, and synthesized speech.
Code - https://github.com/Alpha-VLLM/Lumina-T2X
There have been many reports of residual noise in the model's outputs, so it needs to be tested before it is added.