NVlabs / stylegan3

Official PyTorch implementation of StyleGAN3

Unstable Training + Aliasing on Flower Petal Dataset #81

Open CaptainStiggz opened 2 years ago

CaptainStiggz commented 2 years ago

I'm seeing training issues when training on a dataset of ~90k images of flower petals. The images were originally RAW files that were processed and exported from Adobe Lightroom. They were then center-cropped and resized to 1024x1024 using cv2, and finally run through dataset_tool.py before training. All images were .jpg before being run through the dataset tool.
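For reference, the per-image crop/resize step looked roughly like the sketch below (paths and the helper name are placeholders, and the exact cv2 interpolation flag I used may differ):

```python
import cv2

def preprocess(path_in, path_out, size=1024):
    # Load, square center-crop, and resize to 1024x1024 with cv2.
    img = cv2.imread(path_in)                 # BGR, uint8
    h, w = img.shape[:2]
    s = min(h, w)
    top, left = (h - s) // 2, (w - s) // 2
    img = img[top:top + s, left:left + s]     # center crop to s x s
    # cv2.resize defaults to INTER_LINEAR (bilinear); the flag actually used may differ.
    img = cv2.resize(img, (size, size))
    cv2.imwrite(path_out, img)                # saved as .jpg before dataset_tool.py
```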

I then trained using the recommended training configuration: --cfg=stylegan3-r --gpus=8 --batch=32 --gamma=32

Training tends to look good for the first ~4000 kimg, at which point it blows up and becomes unstable. I typically see the G/D losses go haywire, followed by the FID blowing up and the generated fakes regressing to the quality of a much earlier point in training. I've had a little luck resuming from a stable snapshot with reduced gamma values to improve the FID, but no matter what I try, training always blows up eventually.

Attached are some example images from my dataset, and a training graph showing FID/loss.

Sample dataset images: [reals]

Sample output image before instability (3600 kimg): [fakes003600]

Training graphs: [FID and G/D loss screenshots]

I saw a comment in #77 suggesting that cv2 resizing might introduce aliasing artifacts that the network then learns, which could explain some of the instability.
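If that turns out to be part of the problem, one thing I may try is redoing the downscale with an anti-aliased filter (e.g. Lanczos via PIL) before running dataset_tool.py. A minimal sketch, with placeholder paths:

```python
import PIL.Image

def resize_antialiased(path_in, path_out, size=1024):
    # Square center-crop, then downscale with a Lanczos (anti-aliased) filter.
    img = PIL.Image.open(path_in).convert('RGB')
    w, h = img.size
    s = min(w, h)
    box = ((w - s) // 2, (h - s) // 2, (w + s) // 2, (h + s) // 2)
    img = img.crop(box)
    img = img.resize((size, size), PIL.Image.LANCZOS)
    img.save(path_out, quality=95)
```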

I also selected stylegan3-r as the training configuration, since a rotated petal is still a petal! However, maybe I should be training with a different configuration?

leesky1c commented 2 years ago

Hi Z, have you solved this problem?

CaptainStiggz commented 2 years ago

> Hi Z, have you solved this problem?

Nope, I never solved it. I was able to get a sufficiently low FID for my purposes, although it leaves much to be desired. I would love to know what else I could try to overcome this issue.