EleutherAI / vqgan-clip

MIT License

Improving the augmentation pipeline #7

Open DanJbk opened 3 years ago

DanJbk commented 3 years ago

Currently the code contains several presets for the "MakeCutouts" class and augmentation pipeline. We should consider removing some of them and settling on a single preset.

From my personal experience and tests, the following transformations improve generation only for certain prompts, while degrading others: RandomSharpness, RandomElasticTransform, RandomThinPlateSpline.

These augmentations are in the classes "MakeCutoutsNRUpdate" (https://github.com/EleutherAI/vqgan-clip/blob/9eb2039060836034a2a45f8eb68bdf16c95b7b08/src/masking.py#L178) and "MakeCutouts" (https://github.com/EleutherAI/vqgan-clip/blob/9eb2039060836034a2a45f8eb68bdf16c95b7b08/src/masking.py#L80).

These classes also contain other redundant augmentations, such as "RandomResizedCrop", "RandomCrop", and "RandomGaussianNoise" (which has a similar effect to noise_fac on line 227).
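
For comparison, here is a minimal sketch of what a noise_fac-style step does. It assumes the pattern common in VQGAN+CLIP notebooks: one noise scale is drawn per cutout from [0, noise_fac], and Gaussian noise scaled by it is added to that cutout. I have not checked it against line 227 here. It uses plain Python lists instead of tensors so it runs without dependencies, and `add_cutout_noise` is a hypothetical name, not a function from this repo.

```python
import random


def add_cutout_noise(cutouts, noise_fac, rng=None):
    """Add Gaussian noise to each cutout, scaled by a per-cutout factor.

    cutouts: list of cutouts, each a flat list of pixel values
             (an illustrative layout, not the repo's tensor shape).
    """
    rng = rng or random.Random()
    noisy = []
    for cutout in cutouts:
        # One scale per cutout, drawn uniformly from [0, noise_fac].
        fac = rng.uniform(0.0, noise_fac)
        noisy.append([px + fac * rng.gauss(0.0, 1.0) for px in cutout])
    return noisy


cutouts = [[0.5, 0.5, 0.5], [0.1, 0.9, 0.3]]
noisy = add_cutout_noise(cutouts, noise_fac=0.1, rng=random.Random(0))
```

If noise_fac already works this way, a separate "RandomGaussianNoise" op adds a second source of the same kind of perturbation, which is why I consider it redundant.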

I don't think we've tested the "RandomErasing" augmentation before. Additionally, "RandomHorizontalFlip" is missing. It does bias generation toward certain kinds of results, but it is still a useful augmentation and often forces better symmetry when generating faces.

StellaAthena commented 3 years ago

> Currently the code contains several presets for the "MakeCutouts" class and augmentation pipeline. We should consider removing some of them and settling on a single preset.

I agree. Based on your experience, what are the best augmentations to use for a generic prompt? @crowsonkb? @neverix?

DanJbk commented 3 years ago

> I agree. Based on your experience, what are the best augmentations to use for a generic prompt? @crowsonkb? @neverix?

I am using these augmentations.

K.RandomHorizontalFlip(p=0.5),
K.RandomAffine(degrees=30, translate=0.1, p=0.8, padding_mode='border'),
K.RandomPerspective(0.2, p=0.4),
K.ColorJitter(hue=0.01, saturation=0.01, p=0.7),

and a noise_fac of 0.1.
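
To make the single-preset proposal concrete, the list above could be captured as data and built in one place. This is only a sketch: `GENERIC_PRESET`, `NOISE_FAC`, and `build_augmentations` are hypothetical names, and the arguments are copied from the list above. Whether Kornia accepts them unchanged depends on the installed version.

```python
# Hypothetical single preset: (Kornia class name, positional args, keyword args).
# Values are copied from the augmentation list above.
GENERIC_PRESET = [
    ("RandomHorizontalFlip", (), {"p": 0.5}),
    ("RandomAffine", (), {"degrees": 30, "translate": 0.1, "p": 0.8,
                          "padding_mode": "border"}),
    ("RandomPerspective", (0.2,), {"p": 0.4}),
    ("ColorJitter", (), {"hue": 0.01, "saturation": 0.01, "p": 0.7}),
]
NOISE_FAC = 0.1


def build_augmentations(preset=GENERIC_PRESET):
    """Instantiate the preset as a sequential Kornia pipeline."""
    # Imported lazily so the preset can be inspected without torch/kornia installed.
    import torch.nn as nn
    import kornia.augmentation as K

    return nn.Sequential(
        *(getattr(K, name)(*args, **kwargs) for name, args, kwargs in preset)
    )
```

Calling `build_augmentations()` from the cutout class would replace the per-class augmentation lists, so adding or dropping an op means editing one entry.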