Alpha-VLLM / Lumina-T2X

Lumina-T2X is a unified framework for Text to Any Modality Generation
MIT License

Still bad compared to Hunyuan-DiT; try this complex violinist prompt, one shot, no cherry-picking #53

Open s9anus98a opened 5 months ago

s9anus98a commented 5 months ago

kittychan is a beautiful female acoustic violin player with long brown hair which holds a violin, in the style of romantic riverscapes, yuumei, gil elvgren, hannah flowers, uhd image, album covers, synthetism-inspired ,ultra detailed realistic face

[image: generation result for the prompt above]

gaopengpjlab commented 5 months ago

Thanks for your timely feedback. We will continue to improve the foundational abilities of our model. Currently, you can use resolution extrapolation at 1536 to slightly improve the results.

[image: result generated with extrapolation at 1536]
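As a rough illustration of what "extrapolation at 1536" might look like in practice, here is a minimal sketch using a diffusers-style pipeline. The checkpoint id, the diffusers integration, and the reading of extrapolation as simply requesting a 1536x1536 output are assumptions, not details confirmed in this thread; the repo's own inference scripts may expose this differently.

```python
import torch
from diffusers import DiffusionPipeline

# Load the model; the checkpoint id below is an assumption, not taken from this thread.
pipe = DiffusionPipeline.from_pretrained(
    "Alpha-VLLM/Lumina-Next-SFT-diffusers",
    torch_dtype=torch.bfloat16,
).to("cuda")

prompt = (
    "kittychan is a beautiful female acoustic violin player with long brown hair "
    "which holds a violin, in the style of romantic riverscapes, yuumei, gil elvgren, "
    "hannah flowers, uhd image, album covers, synthetism-inspired, ultra detailed realistic face"
)

# Request a 1536x1536 output instead of the native 1024x1024 training resolution
# (one possible reading of "extrapolation at 1536").
image = pipe(prompt, height=1536, width=1536, num_inference_steps=30).images[0]
image.save("violinist_1536.png")
```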

gaopengpjlab commented 5 months ago

In the technical report, Hunyuan-DiT claims to use more than 1 billion image-text pairs for training, while Lumina-Next uses 20 million image-text pairs. We will continue to expand the versatility and domain coverage of our training set.

s9anus98a commented 5 months ago

> In the technical report, Hunyuan-DiT claims to use more than 1 billion image-text pairs for training, while Lumina-Next uses 20 million image-text pairs. We will continue to expand the versatility and domain coverage of our training set.

Ooh, now I see. Thanks for your reply; you and the team are doing a great job. It's just the scale difference, billions vs. millions of pairs. I will help with testing since I don't know how to code lmao.

gaopengpjlab commented 5 months ago

You are welcome to provide more feedback. We will carefully take it into consideration and design efficient strategies to address the issues.