ClashLuke / LocAtE

Location-based Attention Exhaustion (GAN)
GNU Affero General Public License v3.0

Exploration #11

Open · ClashLuke opened this issue 4 years ago

ClashLuke commented 4 years ago

Exploration

The element of a group giving the best results is marked in bold. If multiple elements are used together, all of them are marked. If minor positive changes were noted early in training, the item is marked in italics; however, this does not mean that its competitor network cannot overcome those slight disadvantages.

As this is a GAN, absolute performance cannot be measured with accuracy, and FID and other metrics would be a waste of time for GAN-breaking changes. Therefore, a scale from breaking (--) to stabilizing/drastically improving performance (++) is used, with 0 symbolizing indifference. Note that a breaking change blocks the model from converging any further than generating blobs of the wrong colour, but does not necessarily lead to pure noise.

Everything is compared against the baseline DCGAN.

| Name | Rating |
| --- | --- |
| **Convolutional Activations** | |
| LeakyReLU (0.01) | 0 |
| PReLU | 0 |
| ReLU | - |
| RootTanh | ++ |
| Swish | + |
| Tanh | -- |
| TanhShrink | -- |
| **Linear Activations** | |
| RootTanh | ++ |
| Sigmoid | - |
| Softmax | -- |
| Tanh | 0 |
| TanhShrink | -- |
| **Convolution Types** | |
| Spatial Factorization | - |
| Depthwise Separable | - |
| Inception v1 | - |
| Inception-v3 A | - |
| Inception-v3 C | -- |
| Selective Kernel | -- |
| Locally Connected | -- |
| **Block Connection** | |
| DenseNet | - |
| Convolution | -- |
| ResNet | + |
| RevNet | - |
| Gated ResNet | ++ |
| **Additional Blocks** | |
| Average Attention | -- |
| Feature Attention | + |
| Self-Attention | + |
| **Normalization Layer** | |
| None | - |
| Adaptive Instance Norm | -- |
| BatchNorm | 0 |
| Adaptive BatchNorm | + |
| Instance Norm | - |
| Layer Norm | - |
| **Regularization** | |
| None | 0 |
| Consistency Regularization | ++ |
| Gradient Penalty | + |
| Spectral Norm | + |
| **Optimizers** | |
| AdaBound | -- |
| Adam | 0 |
| NAdam | 0 |
| RMSprop | - |
| SGD | -- |
| **Optimizer Parameters** | |
| Beta1 = 0.9 | - |
| Beta1 = 0.5 | 0 |
| Beta1 = 0.2 | - |
| Beta1 = 0.0 | - |
| Beta2 = 0.99 | 0 |
| Beta2 = 0.9 | 0 |
| Beta2 = 0.5 | - |
| Gen LR = 1e-3 | - |
| Gen LR = 5e-4 | + |
| Gen LR = 1e-4 | 0 |
| Gen LR = 1e-5 | - |
| Dis LR = 1e-2 | -- |
| Dis LR = 2e-3 | 0 |
| Dis LR = 1e-3 | - |
| Dis LR = 1e-4 | - |
| **Architectural Properties** | |
| Deep (Residual) | 0 |
| Deep (Non-Residual) | -- |
| Wide | + |
| Bottleneck | 0 |
| Small Kernels (3x3) | 0 |
| Medium Kernels (5x5) | 0 |
| Large Kernels (7x7) | + |
| **Training Parameters** | |
| Hinge Loss | ++ |
| Wasserstein Loss | + |
| Binary Crossentropy | 0 |
| Small Batch Size (16) | -- |
| Standard Batch Size (128) | 0 |
| Large Batch Size (256) | - |
| Increasing Batch Size | + |
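
As a concrete illustration of the best-rated block connection above (Gated ResNet, ++), here is a minimal PyTorch sketch. The exact gating formulation used in LocAtE is not spelled out in this issue, so a highway-style sigmoid gate is assumed:

```python
import torch
import torch.nn as nn


class GatedResBlock(nn.Module):
    """Residual block with a learned gate mixing the module and skip paths.

    Assumption: highway-style gating; the issue does not give the exact
    formulation used in LocAtE.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.module = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.LeakyReLU(0.01),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        # 1x1 convolution producing per-channel, per-pixel gate logits.
        self.gate = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(x))
        return g * self.module(x) + (1 - g) * x
```

Compared with a plain ResNet sum, the gate lets the network suppress the module path early in training, which may explain the extra stability rated ++ above.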
ClashLuke commented 4 years ago

Doing some exploration using the DCGAN Tutorial as a base, I was able to find out that:

1.1) Hinge loss makes the d_error drop to 0 if no penalty is used.
1.2) If consistency regularization is used (even at gamma = 10), the discriminator converges significantly more slowly than it does with BCEWithLogitsLoss.
2) After 1000 updates using RootTanh plugged into the default DCGAN, the generator is already stronger than it would be after 7000 updates as a ResNet (residual path = scale, module path = 5x5/4x4 followed by 3x3, which performs better than 1x1).
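
For reference, a sketch of the hinge discriminator loss with the consistency-regularization term from point 1.2 (gamma = 10). The augmentation is assumed to be a horizontal flip, since the issue does not state which transform was used:

```python
import torch
import torch.nn.functional as F


def d_hinge_loss(d, real, fake, gamma=10.0):
    d_real = d(real)
    d_fake = d(fake.detach())
    # Hinge loss: push real outputs above +1 and fake outputs below -1.
    loss = F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()
    if gamma > 0.0:
        # Consistency regularization: the discriminator should produce the
        # same output for an augmented copy of the real batch.
        augmented = torch.flip(real, dims=[3])  # assumed augmentation
        loss = loss + gamma * (d(augmented) - d_real).pow(2).mean()
    return loss


def g_hinge_loss(d, fake):
    # Generator side of the hinge objective.
    return -d(fake).mean()
```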

EDIT: It's the penalty. Without it, training converges differently: there is no noticeable speed difference, but it is significantly more stable. However, when the penalty is added, convergence slows down a lot.
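
The comment does not say which penalty is meant; assuming the WGAN-GP-style gradient penalty matching the "Gradient Penalty" (+) row in the table, a minimal sketch looks like this:

```python
import torch


def gradient_penalty(d, real, fake, weight=10.0):
    # Interpolate randomly between real and fake samples.
    alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (alpha * real + (1.0 - alpha) * fake.detach()).requires_grad_(True)
    grads = torch.autograd.grad(
        outputs=d(interp).sum(), inputs=interp, create_graph=True
    )[0]
    # Penalize deviation of the per-sample gradient norm from 1.
    norm = grads.flatten(1).norm(2, dim=1)
    return weight * (norm - 1.0).pow(2).mean()
```

Adding this term to the discriminator loss would correspond to the setup described above, where convergence becomes more stable but slower.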