LonglongaaaGo / EXE-GAN

Facial image inpainting is the task of filling in visually realistic and semantically meaningful content for missing or masked pixels in a face image. This paper presents EXE-GAN, a novel diverse and interactive facial inpainting framework, which can not only preserve the high-quality visual effect of the whole image but also complete the face image with exemplar-like facial attributes.
MIT License

Question regarding Algorithm 1 #4

Closed JunseongAHN closed 6 months ago

JunseongAHN commented 6 months ago

Hello! Thank you for your great work!

I checked train.py and got a question.

        rand_num = random.randint(1,rand_end)
        # for inference
        if rand_num == rand_end:
            infer_img = real_img
        else:
            infer_img = torch.flip(real_img, dims=[0])

It looks like, given an integer rand_end, you pick an integer N ~ Uniform({1, ..., rand_end}); if N == rand_end you keep real_img as infer_img, and otherwise you flip real_img.

I believe this is different from the description in Algorithm 1 of your paper.

[screenshot of Algorithm 1 from the paper]

LonglongaaaGo commented 6 months ago

Hi @JunseongAHN, thank you so much for your attention.

You are correct in noting that the implementation in the code appears different from what we described in Algorithm 1 of our paper. Despite this difference in approach, the underlying objective remains the same: both the implementation and the paper control the use of the exemplar with a similar probability mechanism.
However, as you suggested, implementing it exactly as described in the paper is equally valid, and we encourage such adaptations as long as they preserve the fundamental goal of controlling the use of the exemplar with a set probability.
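For instance, a minimal sketch of that paper-style selection could look like the following (the names p_self and other_img are purely illustrative, not from the released code; in train.py the same effect comes from the rand_num == rand_end check, i.e., a probability of 1/rand_end):

    import random

    def pick_exemplar(real_img, other_img, p_self=0.25):
        # Illustrative sketch: with probability p_self, use the ground-truth
        # image itself as the exemplar (reconstruction case); otherwise use a
        # different image (exemplar-guided case).
        if random.random() < p_self:
            return real_img   # exemplar == ground truth
        return other_img      # exemplar != ground truth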

We appreciate your engagement with our work and hope this clarifies your query. If you have any further questions or need more detailed explanations, please feel free to reach out.

JunseongAHN commented 6 months ago

Oh I got it. Thank you for clarifying.

Wow, it is interesting that the model learns how to inpaint the input with the exemplar just from flipping!

I have one more question. I believe we should always apply the $L_{lpips}$ loss, regardless of whether the image is flipped or not, since $L_{lpips}$ should be transform invariant.

In the paper, it makes sense to apply $L_{lpips}$ only if $I_{gt}$ and $I_{exe}$ are the same. However, your code, as the paper describes, only optimizes the $L_{lpips}$ loss when rand_num == rand_end:

        # the perceptual (LPIPS) loss is applied only when the exemplar is the ground-truth image
        if rand_num == rand_end:
            g_percept_loss = percept_loss(completed_img, real_img.detach(),
                                          weight_map=mask_weight).sum() * args.percept_loss_weight
            loss_dict["g_percept_loss"] = g_percept_loss
            g_loss += g_percept_loss
        else:
            loss_dict["g_percept_loss"] = torch.zeros(1).mean().cuda()

Could you clarify this?

Thank you!

LonglongaaaGo commented 6 months ago

Hi @JunseongAHN: Thank you for your insightful questions! I'm happy to clarify each point for you:

On your observation about flipping the image: what we actually do is flip along the batch dimension. torch.flip(real_img, dims=[0]) reverses the order of the images within the batch, so each masked image is paired with a different image from the same batch as its exemplar, rather than mirroring the image content itself.
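As a quick standalone illustration (a toy snippet, not taken from train.py):

    import torch

    x = torch.arange(4 * 3 * 2 * 2).reshape(4, 3, 2, 2).float()  # (B, C, H, W)

    batch_reversed = torch.flip(x, dims=[0])  # reverses the order of images in the batch
    mirrored = torch.flip(x, dims=[3])        # would mirror each image horizontally instead

    assert torch.equal(batch_reversed[0], x[3])                  # image 0 is now paired with image 3
    assert torch.equal(mirrored[0], torch.flip(x[0], dims=[2]))  # content flipped, batch order unchanged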

Regarding your question about the application of the $L_{lpips}$ loss: you've raised an important point. When rand_num == rand_end, the exemplar is identical to the real (masked) image, so the completed image should match the ground truth exactly and it is appropriate to apply $L_{lpips}$ against it. On the other hand, when the exemplar is not the same as the real image (i.e., rand_num != rand_end), there is no pixel-level ground truth for the exemplar-guided result, so we rely on the other losses, which are designed to implicitly encourage the generator to incorporate facial attributes from the exemplar.

I hope these explanations clarify your concerns. If you have any more questions, please feel free to ask!

JunseongAHN commented 6 months ago

oh, my bad. it was flipped along the batch axis haha

Now it perfectly makes sense! Thank you for clarifying!

JunseongAHN commented 6 months ago

I have one more question for clarification! I am wondering whether you flipped the images along the batch axis to get the exemplar images simply for implementation convenience.
Was this approach helpful for the training procedure?

You mentioned, and I also believe, that implementing it as described in the paper is equally valid. However, I think optimizing a model with the input and exemplar drawn from the same batch is slightly different from drawing the input and exemplar independently from the dataset.

LonglongaaaGo commented 6 months ago

Hi @JunseongAHN

Thanks again for your questions. The flipping within a batch of images essentially serves a similar purpose to sampling an exemplar from the dataset. Let me explain how:

  1. Each time we construct a batch, we randomly select images, so the composition of each batch differs from one iteration to the next.
  2. By flipping the batch indices, we effectively use a different exemplar for each masked image in every iteration. This variety of exemplars across iterations enhances the learning process.
  3. It's important to note that, within a batch, each image independently influences the model training, so this is effectively the same as the approach you mentioned (see the small sketch below). I would be happy to see you try your own implementation and explore whether anything interesting comes out of it. 👍
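Here is a toy sketch of this pairing behaviour (illustrative names, not the training script): because the loader reshuffles the data every epoch, reversing the batch order pairs each masked image with a different, effectively random exemplar in every iteration.

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # toy "dataset" of 8 image ids, reshuffled every epoch
    data = torch.arange(8).float().view(8, 1)
    loader = DataLoader(TensorDataset(data), batch_size=4, shuffle=True)

    for epoch in range(2):
        for (real_img,) in loader:
            exemplar = torch.flip(real_img, dims=[0])  # pair image i with image B-1-i
            pairs = list(zip(real_img.squeeze(1).tolist(), exemplar.squeeze(1).tolist()))
            print(f"epoch {epoch}: (masked, exemplar) pairs = {pairs}")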

JunseongAHN commented 6 months ago

I greatly appreciate your reply! Awesome work, and thank you for sharing your code!

LonglongaaaGo commented 6 months ago

Hi @JunseongAHN, thank you!