PixArt-alpha / PixArt-sigma

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
https://pixart-alpha.github.io/PixArt-sigma-project/
GNU Affero General Public License v3.0
1.44k stars 68 forks

There is a gap with stable diffusion #53

Closed (conansherry closed this issue 2 months ago)

conansherry commented 2 months ago

After fine-tuning the model on my own dataset, I tested it. For realistic-style text-to-image generation, clarity has improved somewhat, but the aesthetic quality of subjects and faces still feels far behind SDXL, let alone SD3. Is this because the 0.6B model itself has limited capacity?

lawrence-cj commented 2 months ago

The main reason may be the training dataset. Our goal is not to generate realistic images but refined ones. If you want more realistic results, collecting your own dataset and fine-tuning the model should be fine. I don't think model size is a problem for handling a realistic style.

kelisiya commented 2 months ago

Is there any research showing that larger PixArt models improve the realism and logical consistency of generated images? When I use prompts with spatial relationships, such as "there is a dog on a box, a cat on the left, and a horse on the right", the results have problems, and the output always looks like an oil painting.

lawrence-cj commented 2 months ago

If you have read the limitations stated in our paper, you will know that spatial reasoning and counting are not our strengths. When we tried enlarging the model, counting ability improved. As for spatial logic, you need to design specific prompts in the dataset to enhance that ability.
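
One possible way to "design specific prompts" for spatial logic is to generate templated captions that name an explicit relation between two objects. The sketch below is only an illustration, not the PixArt authors' pipeline: the object list, the relation templates, and the `spatial_captions` helper are all hypothetical names introduced here. It produces caption text only; each caption would still need a training image that actually shows that layout.

```python
import itertools
import random

# Hypothetical vocabulary; replace with objects that appear in your own dataset.
OBJECTS = ["dog", "cat", "horse", "box"]

# Hypothetical templates, one per spatial relation you want the model to learn.
TEMPLATES = {
    "on": "a {a} on top of a {b}",
    "left": "a {a} to the left of a {b}",
    "right": "a {a} to the right of a {b}",
    "under": "a {a} under a {b}",
}


def spatial_captions(objects, templates, seed=0):
    """Build (caption, relation, subject, reference) rows for every
    ordered pair of distinct objects and every relation template."""
    rows = []
    for a, b in itertools.permutations(objects, 2):
        for relation, template in templates.items():
            rows.append((template.format(a=a, b=b), relation, a, b))
    # Shuffle deterministically so the relations are interleaved.
    random.Random(seed).shuffle(rows)
    return rows


captions = spatial_captions(OBJECTS, TEMPLATES)
for caption, relation, subject, reference in captions[:5]:
    print(f"{relation:>5}: {caption}")
```

Keeping the relation label next to each caption makes it easier to check later that every relation is represented evenly in the fine-tuning set.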

[image]

kelisiya commented 2 months ago

Thank you for the quick answer. I think that during fine-tuning my dataset lost the relevant spatial logic: I focused too much on longer prompts and so lost the cases discussed in the paper. I will rebuild my dataset and debug again.

toilaluan commented 1 month ago

@kelisiya @lawrence-cj Hi, thanks for your nice work. @kelisiya, I just want to ask about your recent training progress.