NVlabs / Sana

SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
https://nvlabs.github.io/Sana
417 stars, 8 forks

When you select "Anime" in Image Style, cold blue shadows are emphasized due to the influence of the PAG Guidance Scale. #9

Open haruharu-1105 opened 3 days ago

haruharu-1105 commented 3 days ago

Your team is doing a great job.

If possible, it would be greatly appreciated if you could mention somewhere in your documentation that when the PAG (Perturbed-Attention Guidance) Guidance Scale value is high, the cold blue shadow color is emphasized when “Anime” is selected in Image Style.

Thank you for reading this far.

lawrence-cj commented 3 days ago

That’s a great observation. We set PAG to 1.5 or 2 to improve body structure and text rendering, while inadvertently overlooking its impact on other aspects. For users focused on face or other fidelity-oriented features, a PAG setting of 1 might be preferable for producing more natural-looking images. Based on your tests, do you have any suggestions for an optimal setting?

haruharu-1105 commented 3 days ago

First, I am a systems engineer, but not an expert in machine learning or image processing.

My personal suggestion is that “1” would be an appropriate PAG guidance scale value for the “Anime” style. However, regarding my original issue, I believe it can be resolved in the documentation.

The following is just a suggestion. We feel that the application developed by your team has excellent features, such as very fast generation speed and high quality.

So, here is one idea that takes advantage of the “very fast generation speed” feature. We think it would be effective to generate two versions of each image with different guidance scale values and collect votes on them using Gradio's flag function. Reference link: Gradio flag function

This is because, while we currently have an excellent image generation function, we do not have a mechanism to collect feedback from users on the generated images, which we feel is a bit lacking as a demonstration function.

Of course, there are problems associated with the fact that respondents are anonymous, but I think it is possible to obtain statistically significant data.
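
A minimal sketch of the side-by-side idea above, assuming a Gradio demo. `generate_image` is a hypothetical stand-in for the real inference call (it is not taken from the Sana code), and the two PAG values are simply the ones mentioned in this thread. The `flagging_options` argument of `gr.Interface` adds one flag button per option, and Gradio's default flagging callback logs each flagged sample.

```python
from typing import Any, Callable, Tuple

# PAG guidance scale values to compare (the values discussed in this thread).
PAG_SCALE_LEFT = 1.0
PAG_SCALE_RIGHT = 2.0

# Flag labels for the A/B survey.
FLAG_OPTIONS = ["left", "tie", "right"]


def make_pair_fn(
    generate_image: Callable[[str, float], Any],
) -> Callable[[str], Tuple[Any, Any]]:
    """Wrap a generator so one prompt yields two images, one per PAG scale.

    `generate_image(prompt, pag_scale)` is a hypothetical placeholder for the
    actual inference call.
    """

    def generate_pair(prompt: str) -> Tuple[Any, Any]:
        left = generate_image(prompt, PAG_SCALE_LEFT)
        right = generate_image(prompt, PAG_SCALE_RIGHT)
        return left, right

    return generate_pair


def build_demo(generate_image: Callable[[str, float], Any]):
    """Build a Gradio interface whose flag buttons record the user's preference."""
    import gradio as gr  # imported lazily so make_pair_fn works without Gradio

    return gr.Interface(
        fn=make_pair_fn(generate_image),
        inputs=gr.Textbox(label="Prompt"),
        # The labels deliberately hide the PAG values so voters are not biased.
        outputs=[gr.Image(label="Left"), gr.Image(label="Right")],
        flagging_options=FLAG_OPTIONS,  # one flag button per option
    )
```

Calling `build_demo(my_generate).launch()` would start the demo, where `my_generate` is whatever function produces an image from a prompt and a PAG scale. Randomizing which side receives which scale would further reduce position bias; that is left out here for brevity.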

lawrence-cj commented 3 days ago

Really great advice. What kind of Flag is enough for a demo? Any template website?

haruharu-1105 commented 3 days ago

What kind of Flag is enough for a demo?

We are considering two patterns.

1. Flags for A/B testing (we will start here first). Following https://imgsys.org/, the flags would be “left”, “tie”, and “right”.

2. Ideal flags (we will introduce these for product improvement purposes once the operational cycle is running smoothly): briefing materials for stakeholders, and performance indicators (e.g., the impact of DC-AE improvements). Metrics will include speed, aesthetics, prompt accuracy, typography, etc.

The choice of these indicators will have a significant impact on the direction of the product. For example, YouTube and TikTok are both video posting sites, but the indicators set are different, creating differences in product content.
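
For the first pattern, the collected “left” / “tie” / “right” flags could be tallied and checked for a real preference along these lines. This is only a sketch: the exact two-sided sign test (with ties dropped) is one reasonable choice for this kind of paired comparison, not something specified in this thread, and the function names are made up for illustration.

```python
from collections import Counter
from math import comb
from typing import Dict, Iterable


def tally_flags(flags: Iterable[str]) -> Dict[str, int]:
    """Count how often each of the three flags was chosen; other labels are ignored."""
    counts = Counter(flags)
    return {name: counts.get(name, 0) for name in ("left", "tie", "right")}


def sign_test_p_value(left: int, right: int) -> float:
    """Exact two-sided sign test for H0: left and right are equally preferred.

    Ties carry no information about direction, so they are excluded.
    """
    n = left + right
    if n == 0:
        return 1.0
    k = min(left, right)
    # Probability of a split at least this lopsided in one direction under H0.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)
```

For example, 8 “left” votes against 2 “right” votes gives `sign_test_p_value(8, 2) == 0.109375`, which would not yet count as significant at the usual 0.05 level, so a demo would need to collect more votes before drawing a conclusion.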

haruharu-1105 commented 3 days ago

I'll emphasize what I'm trying to say, just to be clear. The first issue is just that I would like to see it added to the documentation if possible.

The demo is receiving a lot of traffic at the moment, but the load may ease in the future once the code and models are released.

Development resources and time are limited, and the flag feature is only a nice-to-have suggestion.

lawrence-cj commented 3 days ago

Appreciate that, and good to know. Let me refine the demo website a bit first and add a document for guidance. Then I'll try to improve it once the code is released.