bes-dev / MobileStyleGAN.pytorch

An official implementation of MobileStyleGAN in PyTorch
Apache License 2.0

Observations on eyeglasses and textures #2

Open yaseryacoob opened 3 years ago

yaseryacoob commented 3 years ago

Thanks for sharing your ideas and code. It is rather fun to compare it to StyleGAN2. I am wondering about the following:

  1. Why does your algorithm do poorly with eyeglasses? (See the attached comparison image, yycomp_5.)

  2. There is a certain blockiness to the images (almost like JPEG artifacts). I am not sure why it is more common here than in StyleGAN2.

  3. I would have expected hair to be rendered better with your architecture, but for some odd reason it (especially facial hair) is more iffy, almost as if there is too much regularity in the wavelet directions.

Thanks for any information you can share. It is an interesting architecture you propose, but I am missing the intuition behind it.

bes-dev commented 3 years ago

@yaseryacoob hey, thanks for your feedback!

Yes, we observed problems with glasses generation; a detailed exploration of this problem is postponed for future work. Our main hypothesis is that this behavior is related to an "undersampling" problem: by design, our pipeline generates training data on the fly using a teacher network, so if the probability of sampling a face with glasses is small, it is hard for the student to generalize to this class of samples.
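To make the on-the-fly distillation setup concrete, here is a minimal sketch of one such training step; `teacher`, `student`, and `mapping_net` are illustrative stand-ins, not the repository's actual API:

```python
import torch
import torch.nn.functional as F

def distillation_step(teacher, student, mapping_net, batch_size=8, device="cuda"):
    # Sample random latent codes; training pairs are generated on the fly.
    z = torch.randn(batch_size, 512, device=device)
    with torch.no_grad():
        w = mapping_net(z)      # z -> w (style space), shared by both networks
        target = teacher(w)     # teacher (StyleGAN2) image, used as ground truth
    pred = student(w)           # student (MobileStyleGAN) reconstruction
    # Rare attributes (e.g. glasses) appear in only a small fraction of
    # sampled batches, so the student gets little supervision for them:
    # the "undersampling" hypothesis above.
    return F.l1_loss(pred, target)
```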

As for 2-3, I think this problem is related to general "underfitting" of the network. In the last stages of training we observed some symptoms of "overfitting" in the discriminator, which forced training to stop. This was partially fixed by using differentiable augmentations, but some artifacts still remain in the generated images.
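For reference, differentiable augmentation (in the spirit of DiffAugment) transforms both real and generated images with operations that keep gradients flowing back to the generator. A toy sketch; the actual augmentation set used in training may differ:

```python
import torch
import torch.nn.functional as F

def diff_augment(x, pad=4):
    """Toy differentiable augmentation: brightness shift + random translation."""
    # Random per-sample brightness shift -- differentiable w.r.t. x.
    x = x + (torch.rand(x.size(0), 1, 1, 1, device=x.device) - 0.5)
    # Random translation via zero-padding and cropping; gradients still
    # flow through the selected pixels.
    x = F.pad(x, [pad] * 4)
    h, w = x.shape[2] - 2 * pad, x.shape[3] - 2 * pad
    top = int(torch.randint(0, 2 * pad + 1, (1,)))
    left = int(torch.randint(0, 2 * pad + 1, (1,)))
    return x[:, :, top:top + h, left:left + w]

# Apply the same transform to real and fake images before the discriminator:
# d_real = D(diff_augment(real)); d_fake = D(diff_augment(fake))
```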

Yes, we noticed the same problems in our model. But our work is just the first step toward fast, production-ready style-based generative models. We will try to fix the known issues in future work and bring the final quality closer to the original network.

yaseryacoob commented 3 years ago

I figured you must have seen these, and I fully understand the complexity of research code and the challenge involved. I actually mean to be encouraging of the approach and architecture overall, and I will look into it as well (as much as I can within your code) to see what else might improve the generation. Here is another question: for the example above I used compare.py, which injects a 512 latent. Can I inject a W+ 18x512 latent instead? StyleGAN2 obviously has no issue with that, but from the paper I couldn't tell whether your architecture supports more than W. Can you clarify?

Also, you can email me directly at yaser@umd.edu if you would rather not discuss this in public.

bes-dev commented 3 years ago

@yaseryacoob I hope my code is more "engineering" than "research", because I'm an engineer rather than a research scientist 😂 But if you find it difficult, please feel free to write about it and I'll try to explain. It would be great if my work helps further research in GANs!

As for the latent space: the W+ space is not equivalent between StyleGAN2 and MobileStyleGAN, because: 1) MobileStyleGAN has one building block fewer than StyleGAN2 (the reason is related to working in the wavelet domain and is described in the article); 2) a MobileStyleGAN building block has more skip connections to the style than a StyleGAN2 block. You can look at the difference between the building blocks of StyleGAN2 and MobileStyleGAN in the code.
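A small shape sketch of the mismatch; the MobileStyleGAN style count below is a placeholder, not the model's actual number:

```python
import torch

w = torch.randn(1, 512)  # a single W code, as injected by compare.py

# StyleGAN2 at 1024x1024 has 18 style inputs, so W+ has shape (1, 18, 512);
# plain W is just the same vector repeated for every layer.
w_plus_stylegan2 = w.unsqueeze(1).repeat(1, 18, 1)

# MobileStyleGAN has one building block fewer and more style skip
# connections per block, so its per-layer style count differs.
n_styles_mobile = 17  # hypothetical placeholder
w_plus_mobile = w.unsqueeze(1).repeat(1, n_styles_mobile, 1)

# A StyleGAN2 W+ code (18x512) therefore has no direct
# layer-by-layer mapping into MobileStyleGAN.
```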

yaseryacoob commented 3 years ago

Actually, research and engineering blend when significant architecture changes are made. The changes you made can and should lead to different outcomes from StyleGAN2, but there are a number of issues, like the student/teacher framework. StyleGAN2 is of course a good teacher, but when it stops being one, and when a student better than the teacher comes to exist, is a matter of deeper analysis. I obviously can't tell without further tinkering with the code whether the student is better than the teacher given the architecture. The example above shows that, given a specific W, you do a bit worse than StyleGAN2, but that doesn't mean a W+delta can or can't match or beat StyleGAN2. The delta can be optimized directly or learned by an encoder. There are some interesting questions to answer.
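A minimal sketch of the delta-optimization idea, with hypothetical `student` and `target` stand-ins (e.g. the teacher's output for the same W):

```python
import torch
import torch.nn.functional as F

def optimize_delta(student, w, target, steps=200, lr=0.01):
    """Optimize a residual latent offset so the student's image matches
    a target image. Sketch only; loss and schedule are illustrative."""
    delta = torch.zeros_like(w, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        img = student(w + delta)       # student forward at the shifted latent
        loss = F.l1_loss(img, target)  # an LPIPS term would be a natural addition
        loss.backward()
        opt.step()
    return delta.detach()
```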

bes-dev commented 3 years ago

@yaseryacoob It's a pretty cool idea to train an external network that predicts the delta between student and teacher! If it works, and the combined computational complexity of both networks stays below the teacher's, the results would be pretty compelling. I saw a similar idea about iterative estimation of the output image in the ReStyle paper: https://yuval-alaluf.github.io/restyle-encoder/ 🤔
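A rough sketch of that iterative scheme; all names and signatures here are illustrative, not ReStyle's actual code:

```python
import torch

def iterative_refinement(encoder, student, target, w_avg, n_iters=5):
    """ReStyle-style loop (sketch): the encoder sees the target image and
    the current reconstruction and predicts a residual latent update."""
    w = w_avg.clone()
    img = student(w)
    for _ in range(n_iters):
        delta = encoder(torch.cat([target, img], dim=1))  # residual prediction
        w = w + delta
        img = student(w)                                  # refined reconstruction
    return w, img
```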

yaseryacoob commented 3 years ago

Yeah, I forgot about the ReStyle approach! I discussed things with them last week; you can see the thread there. We are missing the last 5-10% in quality, and I am not sure where it will come from: architecture, latent representation, optimization, or god knows what. So testing your architecture by pushing it to 100% is worth trying, so we learn where the potential lies.

My gut feeling is that the architecture has got to be the first hurdle that needs improvement: to capture all the frequencies adequately, it has to be just "right". I can't prove it for now... I was hoping your experiments would teach us something.

bes-dev commented 3 years ago

@yaseryacoob I added an experiment with iterative refinement to my to-do list. I'll try it when I have free time :) It would be great if you dig deeper into our work and try to improve it too 💪