FusionBrainLab / StyleFeatureEditor

Official Implementation for "The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing"
MIT License

Editability Metrics #9

Closed JhEdLp closed 1 week ago

JhEdLp commented 3 weeks ago

I have a question about the editability metrics presented in Table 1. Specifically, I am curious about the fitting power used in these metrics. Was the same power applied to all methods in the comparisons?

I ask because the choice of edit power can significantly influence the output image. For example, when adding glasses, a higher power could result in darker glasses, which could significantly affect the final FID estimate. Could you clarify what value was used for the editability power when calculating FID as an editability metric?

Thanks in advance.

retir commented 3 weeks ago

Hi, thanks for your question!

The choice of editing power indeed has a lot of influence on the metrics, and moreover, different methods often have different sensitivities to this parameter. For example, the "Smile" edit at power 6 works fine for e4e, but HyperInverter produces artefacts -- an unnaturally wide smile. In this case it is necessary to reduce the power for HyperInverter to make the edit work well.

To solve this problem and to compare methods fairly, we fix the editing direction and run each method over a grid of editing powers. For each power we compute FID (according to the editing metric from the paper) and choose the power with the lowest value. If the difference in FID between the chosen power and its grid neighbours is large, we repeat the procedure within a smaller range using a finer grid step. Thus, the editing power was selected independently for each method and each editing direction.
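The procedure above can be sketched roughly as follows. This is a hypothetical illustration, not code from the repository: `edit_and_compute_fid` stands in for running a given method with a given editing power and scoring FID against the target attribute subset, and all parameter names and defaults are assumptions.

```python
def search_edit_power(edit_and_compute_fid, lo=0.0, hi=10.0,
                      step=1.0, tol=0.5, min_step=0.125):
    """Grid-search the editing power that minimizes FID, refining the
    grid around the current minimum until FID is stable there.

    `edit_and_compute_fid(power) -> float` is a placeholder for
    "apply the edit at this power and compute FID"."""
    while True:
        powers = [lo + i * step for i in range(int(round((hi - lo) / step)) + 1)]
        fids = {p: edit_and_compute_fid(p) for p in powers}
        best = min(fids, key=fids.get)
        # Compare the minimum with its immediate grid neighbours; if the
        # FID gap is large, repeat with a finer grid around the minimum.
        neighbours = [fids[p] for p in powers
                      if p != best and abs(p - best) <= step + 1e-9]
        if step <= min_step or all(abs(fids[best] - f) <= tol for f in neighbours):
            return best, fids[best]
        lo, hi, step = max(lo, best - step), min(hi, best + step), step / 2
```

With a well-behaved (roughly convex) FID-vs-power curve this converges to the power at the FID minimum in a few refinement rounds; the tolerance `tol` encodes the "difference of FID between the chosen power and its neighbourhood" check mentioned above.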

retir commented 1 week ago

Have we answered all your questions regarding the issue? Can we close it?

JhEdLp commented 1 week ago

Hi,

Thank you for your first answer; however, I have a follow-up question.

Do you visually confirm the results of the edit, or do you set minimum and maximum power thresholds for the search grid? I ask because for some methods, when the power is set to 1, the edit is applied to only a few images or is very subtle. This may produce a low FID score that does not accurately reflect the strength of the editing.

retir commented 1 week ago

For this FID metric it is important to check that the edit applied by the method produces the same effect as the attribute assumed in the chosen CelebaHQ markup.

We set a boundary and search for the minimum FID. If the found minimum is close to the boundary, we broaden the boundary and continue the procedure. In most cases the found minimum correlates with the visual results: the editing was correctly applied to most of the images we inspected. When we see that the editing power at the minimum FID is very low, we found that in most cases the chosen editing direction and the CelebaHQ markup assume different edits. In that case, indeed, the optimal editing power will be close to pure inversion (i.e. editing power = 0), because increasing the editing power only increases the difference between the distributions of the two image sets.
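The boundary check described above can be sketched like this. Again a hypothetical illustration: `fid_at_power` is a stand-in for the per-power FID evaluation, and the range defaults and the doubling rule are assumptions, not the authors' exact settings.

```python
def search_with_expanding_bounds(fid_at_power, lo=0.0, hi=5.0,
                                 step=1.0, max_hi=20.0):
    """Grid-search FID over [lo, hi]; if the minimum lands at the upper
    boundary, widen the range and search again rather than trusting an
    edge minimum. `fid_at_power(power) -> float` is a placeholder."""
    while True:
        powers = [lo + i * step for i in range(int(round((hi - lo) / step)) + 1)]
        best = min(powers, key=fid_at_power)
        # Accept only an interior minimum (or give up at the hard cap).
        if best < hi - step or hi >= max_hi:
            return best
        hi = min(hi * 2, max_hi)  # minimum sits at the boundary: broaden
```

The interior-minimum condition is what guards against reporting a spuriously low FID at the edge of the search range, which connects to the earlier question about very small powers.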

Does this answer your question?

JhEdLp commented 1 week ago

Yes, thank you very much.