NVlabs / stylegan3

Official PyTorch implementation of StyleGAN3

Is this collapsed? Transfer learning from FFHQ-U with 2500 images #134


botoxparty commented 2 years ago

Hey,

I am attempting to do some transfer learning on the FFHQ-U model with a dataset of 2,500 images (5,000 after mirroring). Only 64 kimg into training, my results look like this:

It looks like mode collapse to me; should I continue training?

[image: generated samples at 64 kimg]
PDillis commented 2 years ago

Yes, it looks heavily collapsed, although it would also help if you showed us a sample of your real dataset to compare against. What settings did you use to train the model (resolution, config, batch size, learning rate, gamma, ...)? That way we can suggest better alternatives if they're needed.

botoxparty commented 2 years ago

@PDillis Thanks for such a prompt response!

These are the training parameters I'm using (training on a single RTX 3090, 24 GB):

resolution = 1024
batch_size = 32
batch_gpu_size = 8
learning_rate = 0.0002
gamma_value = 10.0
resume_from = 'https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/stylegan3-t-ffhqu-1024x1024.pkl'
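
For reference, these roughly map onto the stock train.py invocation below. This is a sketch rather than my exact command: the dataset path is a placeholder, and the resolution comes from the dataset zip rather than a flag.

```bash
# Dataset path is a placeholder; the training resolution is read from the dataset zip.
python train.py \
    --outdir=./training-runs \
    --cfg=stylegan3-t \
    --data=./datasets/runway-1024x1024.zip \
    --gpus=1 --batch=32 --batch-gpu=8 \
    --gamma=10 --mirror=1 \
    --resume=https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/stylegan3-t-ffhqu-1024x1024.pkl
# Learning rates are left at the config defaults here; --glr / --dlr can override them.
```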

And these are some samples of my training data: [images: 43_ON_0432 (1366x2048), 00043-Balenciaga-Resort-20, 00040m]

PDillis commented 2 years ago

Have you tried StyleGAN2-ADA before? The problem I see with using StyleGAN3 config-t is that its translation equivariance won't really be exploited here: the models in your images stand at different heights, but the white bars on the sides are always the same width. Basically, you have an aligned dataset, which should do quite well with StyleGAN2-ADA (--cfg=stylegan2). If you have tried StyleGAN2-ADA before, did you also try black bars on the sides instead of white?

Apart from that, perhaps you should consider increasing --gamma. A higher value limits how expressive the generated images are, but at least the model should converge to a figure walking on the runway; later you can turn gamma back down to regain expressiveness. So, try --gamma=100 or so first and see if that helps.

Everything else seems fine, though since you're in the low-data regime for GANs, you should focus on the ADA options. Say you stay with StyleGAN3; then by default you have --aug=ada --augpipe=bgc --target=0.6. I would recommend setting --target=0.8 since you have fewer than 10k images. For reference, here is the figure from the ADA paper; you want the lowest FID value:

[figure from the ADA paper showing FID for different ADA target values]

From the previous figure, the color augmentation rarely helps, so I would also suggest trying --augpipe=bg. Now, this is not a setting you can change from the command line in this repo, as the bgc augpipe is hardcoded in train.py. So I have added it to my repo, which I will shamelessly self-promote here: https://github.com/PDillis/stylegan3-fun

In it, you can also add the vertical black bars I mentioned before. But honestly, change one thing at a time: start with augpipe and target; if that doesn't work, change gamma; then move on to the vertical black bars and StyleGAN2-ADA. See the sketch below.
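
To make that order concrete, here is a rough sketch of what the first re-launch could look like. The dataset path is a placeholder, --augpipe comes from my fork, and the remaining flags are the stock ones; adjust to your own setup.

```bash
# Step 1: change only the augmentation settings (--augpipe requires the stylegan3-fun fork).
python train.py \
    --outdir=./training-runs \
    --cfg=stylegan3-t \
    --data=./datasets/runway-1024x1024.zip \
    --gpus=1 --batch=32 --batch-gpu=8 \
    --gamma=10 --mirror=1 \
    --aug=ada --augpipe=bg --target=0.8 \
    --resume=https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/stylegan3-t-ffhqu-1024x1024.pkl
# Step 2, only if that doesn't help: raise regularization, e.g. --gamma=100.
# Step 3: try black side bars and/or switch to --cfg=stylegan2 (StyleGAN2-ADA-style training).
```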

botoxparty commented 2 years ago

Thanks a million @PDillis, I'm trying those out right now. I'll report back in a couple of days with some results :)

I'm pretty new to GANs, neural networks and AI in general... Are there any resources that you would recommend to help me understand how these parameters affect training a model? Would be much appreciated.

P.S. I ran into errors creating directories when initializing training because of this change in your repo; reverting it fixed the problem.

PDillis commented 2 years ago

Honestly, the best thing to do is to read the papers, as they go in depth on all the details. The more practical option is to watch Derrick Schultz's videos on how to train StyleGAN and what each parameter means: https://youtu.be/DVXX0tmVyco

Thanks for reporting the error, I'll look into it. Let me know how your training goes.

botoxparty commented 2 years ago

Just an update: I found that the results using transfer learning were not as good as training from scratch, even though the dataset is limited.

I also discovered that not all of my images are aligned perfectly, so the translation-equivariant config is actually helpful in my case.

I am going to retrain from scratch using the suggested parameters and see how it goes. I'll share results and comparisons as soon as they're ready :)

GHBTM commented 2 years ago

@botoxparty did you get reasonably good results? I am using SG3 on a 1024x1024 dataset (20,000 images), after successfully training SG2-ADA on a 512x512 toy set. I'm also noticing very little diversity in the first couple of hours (600+ kimg). I would suspect it's mode collapse, although all of the snapshots seem to have about the same limited variation.

botoxparty commented 2 years ago

@GHBTM I gave up on transfer learning as I was getting much better results just training from scratch.

Great-Bucket commented 1 year ago

Thank you for this post. I found it very helpful.