conansherry closed this issue 2 months ago.
The main reason is probably the training dataset. Our goal is not to generate realistic images but refined ones. If you want more realistic results, collecting your own dataset and fine-tuning the model on it should work. I don't think model size would be a problem for handling a realistic style.
Is there any research showing that larger PixArt models improve the realism and logical consistency of the generated images? When I input prompts with spatial relationships, such as "there is a dog on a box, a cat on the left, and a horse on the right", the results have problems, and the output always looks like an oil painting.
If you have read the limitations section of our paper, you will know that spatial relationships and counting are not our strengths. We tried enlarging the model, and counting ability improved. As for spatial logic, you need to design specific prompts in the dataset to strengthen that ability.
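As a rough illustration of "designing specific prompts in the dataset": assuming each training image already has labeled bounding boxes (for example from a detector or an annotated dataset), one could derive explicit spatial phrases from the box geometry and append them to the caption. This is only a sketch, not the PixArt training pipeline; `Box`, `relation`, `spatial_phrases`, `augment_caption` and the 5-pixel tolerance are all hypothetical names and values chosen for illustration.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical labeled box in pixel coordinates (y grows downward)."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float


def relation(a: Box, b: Box, tol: float = 5.0):
    """Return a phrase describing how box a relates to box b, or None."""
    # Horizontal overlap is required before we call one object "on" another.
    overlap = min(a.x1, b.x1) - max(a.x0, b.x0)
    if overlap > 0 and abs(a.y1 - b.y0) <= tol:  # a's bottom touches b's top
        return f"a {a.label} on a {b.label}"
    if overlap > 0 and abs(b.y1 - a.y0) <= tol:  # b's bottom touches a's top
        return f"a {b.label} on a {a.label}"
    # Otherwise fall back to left/right based on horizontal box centers.
    a_cx = (a.x0 + a.x1) / 2
    b_cx = (b.x0 + b.x1) / 2
    if a_cx < b_cx:
        return f"a {a.label} to the left of a {b.label}"
    if a_cx > b_cx:
        return f"a {a.label} to the right of a {b.label}"
    return None  # same center: no unambiguous left/right relation


def spatial_phrases(boxes):
    """Collect one relation phrase per unordered pair of boxes."""
    phrases = []
    for a, b in itertools.combinations(boxes, 2):
        phrase = relation(a, b)
        if phrase is not None:
            phrases.append(phrase)
    return phrases


def augment_caption(caption: str, boxes) -> str:
    """Append the derived spatial phrases to an existing caption."""
    phrases = spatial_phrases(boxes)
    return ", ".join([caption] + phrases) if phrases else caption
```

For a two-box image, `augment_caption("a photo", [Box("dog", 40, 10, 60, 30), Box("box", 35, 30, 65, 50)])` gives `"a photo, a dog on a box"`. Mixing such explicitly spatial captions with the usual descriptive ones is one way to keep the relation cases from getting lost among long prompts.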
Thank you for the quick answer. I think that during fine-tuning my dataset lost the relevant logic: I focused too much on longer prompts and therefore lost the cases discussed in the paper. I will rebuild my dataset and debug again.
@kelisiya @lawrence-cj Hi, thanks for your nice work. I just want to ask about your recent training progress, @kelisiya.
After fine-tuning the model on my own dataset, I tested it. For realistic-style text-to-image generation, clarity improved somewhat, but the quality of hands and faces still feels far behind SDXL, let alone SD3. Is this because the 0.6B model itself has limited capacity?