genforce / interfacegan

[CVPR 2020] Interpreting the Latent Space of GANs for Semantic Face Editing
https://genforce.github.io/interfacegan/
MIT License

eyeglasses boundary changing gender #59

Open Wonder1905 opened 3 years ago

Wonder1905 commented 3 years ago

Hi, first of all, thanks for sharing your work. I'm running: python3.6 edit.py -m stylegan_celebahq -b boundaries/stylegan_celebahq_eyeglasses_boundary.npy --n 4 --o results/sg_eg_celebhq

But I'm getting a change in gender rather than eyeglasses being added.

First image: [image]

Final result: [image]

This also happens when using PGGAN, and on all the samples. I'd be happy for any help. Thanks.

ShenYujun commented 3 years ago

PGGAN is trained on the CelebA-HQ dataset, which lacks samples wearing eyeglasses, so the boundary may not be that accurate. In other words, PGGAN is not very good at generating data with eyeglasses. Meanwhile, "eyeglasses" is entangled with "gender" and "age". That is why you saw the gender (together with the age) change.

alan-ai-learner commented 3 years ago

Hey @ShenYujun, I've got a question. In my use case I need to add sunglasses to a face (or, more generally, any facial attribute). As you said earlier: "PGGAN is trained on CelebA-HQ dataset, which lacks samples wearing eyeglasses. Hence, the boundary may not be that accurate. [...] Meanwhile, 'eyeglasses' entangle with 'gender' and 'age'."

Q1. How can I generate accurate boundaries for a specific attribute?

Q2. What changes should I make in the code, and where?

I need some suggestions! Thanks

ShenYujun commented 3 years ago

To get an accurate boundary, you should make sure the model is able to generate balanced data. This is primarily controlled by the pre-trained model itself. From this perspective, a StyleGAN (or even StyleGAN2) model trained on FFHQ would be better than the PGGAN model trained on CelebA-HQ. Also, to make the boundary fairer (taking sunglasses in your case as an example), you can:

(1) generate a large collection of data;
(2) score each sample with respect to sunglasses;
(3) choose balanced data with the highest and lowest scores (e.g., make sure the data you choose are half male, half female, half young, half old, etc.; to guarantee this, the collection needs to be very large, otherwise you may not get enough data for training the boundary);
(4) train a boundary on the chosen balanced data.

Good luck with that.
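The four steps above can be sketched with numpy and scikit-learn. This is a simplified, illustrative stand-in for the repo's own boundary-training code, not the authors' exact implementation; the latent codes and scores are assumed to have been produced already (steps 1 and 2), and all names here are hypothetical:

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_boundary(latent_codes, scores, num_per_side=10_000):
    """Steps (3)-(4): fit a hyperplane separating high-score from low-score latents.

    latent_codes: (N, latent_dim) array of randomly sampled codes (step 1).
    scores: (N,) attribute scores from a pretrained classifier (step 2).
    Returns a unit-norm (1, latent_dim) boundary normal.
    """
    order = np.argsort(scores)
    neg = latent_codes[order[:num_per_side]]    # most confidently "no sunglasses"
    pos = latent_codes[order[-num_per_side:]]   # most confidently "sunglasses"
    X = np.concatenate([neg, pos], axis=0)
    y = np.concatenate([np.zeros(len(neg)), np.ones(len(pos))])
    # Linear SVM: its weight vector is the normal of the separating hyperplane.
    clf = LinearSVC(C=1.0).fit(X, y)
    normal = clf.coef_.reshape(1, -1).astype(np.float32)
    return normal / np.linalg.norm(normal)
```

Editing then amounts to moving a latent code along this normal, z_edited = z + alpha * boundary, which is how the shipped .npy boundary files are used by edit.py.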

alan-ai-learner commented 3 years ago

Hey @ShenYujun, is there an already-trained boundary file available for sunglasses?

ShenYujun commented 3 years ago

Sorry, but no.

alan-ai-learner commented 3 years ago

"To get an accurate boundary, you should make sure the model is able to generate balanced data. [...] you can (1) generate a large collection of data, (2) score them regarding sunglasses, (3) choose balanced data with highest and lowest scores, (4) train a boundary on the chosen balanced data."

Hey @ShenYujun, can you please elaborate on the steps you mentioned for creating our own boundaries?

Q1: For collecting data, approximately how much data do we need? And should all the images in the dataset be wearing sunglasses, or do we need pairs, i.e., the same person with and without sunglasses?

Q2: You said "score them regarding sunglasses". How exactly does the scoring process work once the dataset is collected?

Q3: How exactly can we "choose balanced data with highest and lowest scores"?

Thanks.

Daquisu commented 2 years ago

Hello @alan-ai-learner

I'm not one of the authors, but they explained this in their CVPR paper:

We train an auxiliary attribute prediction model using the annotations from the CelebA dataset [26] with ResNet-50 network [18]. This model is trained with multi-task losses to simultaneously predict smile, age, gender, eyeglasses, as well as the 5-point facial landmarks. [...] Given the pre-trained GAN model, we synthesize 500K images by randomly sampling the latent space. There are mainly two reasons in preparing such large-scale data: (i) to eliminate the randomness caused by sampling and make sure the distribution of the latent codes is as expected, and (ii) to get enough wearing-glasses samples, which are really rare in PGGAN model. [...] To find the semantic boundaries in the latent space, we use the pre-trained attribute prediction model to assign attribute scores for all 500K synthesized images. For each attribute, we sort the corresponding scores, and choose 10K samples with highest scores and 10K with lowest ones as candidates.

So answering directly:

Q1.1) For collecting data, approximately how much data do we need?

They generated 500K images and used the 20K most extreme examples, 10K for each side (e.g., the 10K oldest and the 10K youngest). If you want boundaries of similar quality to the ones the authors provide, this is a good estimate for the quantity.

Q1.2) Should all the images in the dataset be wearing sunglasses, or do we need pairs, like the same person with and without sunglasses?

They don't need to be pairs. The authors synthesized 500K images by randomly sampling the latent space, so they had no control over what was being generated.

Q2) You said "score them regarding sunglasses". How exactly does the scoring process work once the dataset is collected?

The classifier (attribute prediction model) for, e.g., sunglasses outputs the probability that the person in the image is wearing sunglasses. The higher this probability, the more certain the model is that the person is indeed wearing sunglasses; the lower it is, the more certain the model is that they are not. The most extreme scores therefore belong to the images whose model output is closest to 0 or to 1.
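As a small illustration of "most extreme scores": given the classifier's probabilities, picking the samples closest to 0 and to 1 is just a sort. The function name is illustrative, not from the repo:

```python
import numpy as np

def select_extremes(probs, num_per_side=10_000):
    """Return indices of the most confident positives and negatives.

    probs: (N,) classifier probabilities that the image shows sunglasses.
    """
    order = np.argsort(probs)
    positives = order[-num_per_side:]  # probabilities closest to 1
    negatives = order[:num_per_side]   # probabilities closest to 0
    return positives, negatives
```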

Q3) How exactly can we "choose balanced data with highest and lowest scores"?

By taking 10K images for each side (e.g., the 10K oldest and the 10K youngest) we already get balanced data: we expect half of them to be young people and the other half old people.
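If you do want the stricter balance ShenYujun described (e.g., each side half male, half female), one simple way is to pick the extremes within each gender group separately and concatenate. A hypothetical numpy sketch, assuming a second classifier has already produced a gender mask:

```python
import numpy as np

def balanced_extremes(scores, is_male, num_per_side=10_000):
    """Pick num_per_side highest- and lowest-scored samples, half from each gender.

    scores: (N,) sunglasses scores; is_male: (N,) boolean mask from a gender classifier.
    Returns (high_idx, low_idx), each of length num_per_side.
    """
    half = num_per_side // 2
    idx = np.arange(len(scores))

    def extremes_within(mask):
        sub = idx[mask]
        order = sub[np.argsort(scores[sub])]
        return order[-half:], order[:half]  # (highest, lowest) within this group

    m_hi, m_lo = extremes_within(is_male)
    f_hi, f_lo = extremes_within(~is_male)
    return np.concatenate([m_hi, f_hi]), np.concatenate([m_lo, f_lo])
```

The same per-group trick extends to other confounds (age, pose) at the cost of needing a larger pool, which is exactly why the collection "is required to be super huge".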