LynnHo / EigenGAN-Tensorflow

EigenGAN: Layer-Wise Eigen-Learning for GANs (ICCV 2021)

Question on Output dimension of Generator #3

Closed · junikkoma closed this 3 years ago

junikkoma commented 3 years ago

Hello! First, I would like to thank you for sharing the code for such wonderful work. I have some questions about the intermediate output dimensions of the feature maps in the generator.

According to Figure 16, I presumed the output dimension of f (φ1) should be (1024, height, width), since the first 1x1 convolution is noted as DeConv(2^(11-i), 1, 1), which gives 2^10 = 1024 channels at i = 1. To my understanding, this would not match the noise input dimension of (512, 4, 4).

Then, on examining your module.py code, I found that the output dimension is controlled by the nd() function, which has an upper limit of 512, unlike the output dimensions noted in Figure 16. I presumed this function was introduced to solve the dimension mismatch mentioned above.
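
To illustrate, here is the nd rule (copied from the toy code further down, which mirrors module.py) evaluated at each resolution the generator uses; the cap at 512 is exactly the upper limit mentioned above:

import numpy as np

# channel rule: 2^(12 - log2(size)) = 4096 / size, capped at 512
nd = lambda size: min(int(2 ** (12 - np.log2(size))), 512)

for size in [4, 8, 16, 32, 64, 128]:
    print(f'size {size:3d} -> uncapped {4096 // size:4d}, nd = {nd(size)}')
# sizes 4 and 8 both give 512 (uncapped would be 1024 and 512)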

In addition, I noticed an outlier in the dimension of the orthonormal basis. To my understanding, the dimension of the orthonormal basis u_ij ∈ R^(H_i × W_i × C_i) should double when moving up one layer, since the height and width are doubled while C_i is halved. However, since nd(height) is the parameter determining the dimension of U in module.py, the dimension actually quadruples when moving from layer 1 to layer 2 (4·4·512 = 8192 -> 8·8·512 = 32768), unlike the expected behavior mentioned above.

Regarding the issues above, I would like to ask the following questions.

  1. Was the nd() function introduced just to match the dimensions, or are there other training gains from using such output dimensions?
  2. Are there any test results from setting the initial noise dimension to 1024, i.e. eps_dim = 1024, so that the generator architecture works as given in Figure 16?
  3. Have I understood the dimension of the orthonormal basis correctly?

If I have misunderstood anything, please kindly point it out. Thank you for your kind attention.

Attached below is the toy code I used to estimate the dimensions of the intermediate feature maps (imports and layer-op aliases added so it runs standalone; transposed_convolution2d and convolution are assumed to be the tf.contrib.layers ops conv2d_transpose and conv2d).

import numpy as np
import tensorflow as tf

# assumed aliases: the layer ops below come from tf.contrib.layers (TF 1.x);
# under TF 2.x the same API is available from the tf_slim package
from tensorflow.contrib import layers
transposed_convolution2d = layers.conv2d_transpose
convolution = layers.conv2d

zs = [tf.random.normal([64, z_dim]) for z_dim in [6] * 6]
h = tf.random.normal([64, 4 * 4 * 512])
h = tf.reshape(h, [-1, 4, 4, 512])

# channel rule from module.py: 4096 / size, capped at 512
nd = lambda size: min(int(2 ** (12 - np.log2(size))), 512)

print(f'Noise shape : {h[0].shape}')

for i, z in enumerate(zs):
    height = width = 4 * 2 ** i
    q = int(z.shape[-1])  # number of basis vectors (subspace dimension) in this layer
    U = tf.compat.v1.get_variable(f'U_{i}', shape=[height, width, nd(height), q],
                                  initializer=tf.initializers.orthogonal())
    L = tf.compat.v1.get_variable(f'L_{i}', shape=[q],
                                  initializer=tf.initializers.constant([3 * j for j in range(q, 0, -1)]))
    mu = tf.compat.v1.get_variable(f'mu_{i}', shape=[height, width, nd(height)],
                                   initializer=tf.initializers.zeros())
    print(f'basis dimension : {tf.reshape(U[:, :, :, 0], [-1]).shape}')

    # phi = U * diag(L) * z + mu, broadcast over the batch
    h_ = tf.reduce_sum(U[None, ...] * (L[None, :] * z)[:, None, None, None, :], axis=-1) + mu[None, ...]
    h1 = transposed_convolution2d(h_, num_outputs=nd(height), kernel_size=1)                # 1x1 deconv
    h2 = transposed_convolution2d(h_, num_outputs=nd(height * 2), kernel_size=3, stride=2)  # upsampling deconv

    h0 = transposed_convolution2d(h + h1, nd(height * 2), 3, 2)
    print(h1.shape, h2.shape, h0.shape)
    h = transposed_convolution2d(h0 + h2, nd(height * 2), 3, 1)
    print(f'output shape of layer {i + 1} : {h[0].shape}')
print(f'final result : {convolution(h, num_outputs=3, kernel_size=7)[0].shape}')  # final 7x7 conv to RGB
LynnHo commented 3 years ago

@junikkoma

  1. nd() is used to control the number of channels according to the spatial size of the feature map.
  2. It's my mistake: the first 1x1 convolution in Fig. 6 should be Conv(512, 1, 1), because we restrict the maximum number of channels to 512. We never used 1024 as the noise dimension. Thank you very much for finding this mistake!
  3. Your understanding of the dimension of the orthonormal basis is correct. The dimension of U_ij is height * width * nd(height), so it should double at each layer. However, since we restrict the maximum number of channels to 512, nd() gives the same result (512) for layer 1 and layer 2, so the dimension quadruples from layer 1 to layer 2 (see the sketch below).
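
A minimal sketch verifying this, using only the nd rule and the resolutions from the toy code above (both taken from this thread, not from the paper); it prints the basis dimension height * width * nd(height) per layer and its ratio to the previous layer:

import numpy as np

nd = lambda size: min(int(2 ** (12 - np.log2(size))), 512)

prev = None
for i in range(6):
    height = 4 * 2 ** i
    dim_u = height * height * nd(height)  # dimension of one basis vector u_ij
    ratio = '' if prev is None else f' ({dim_u // prev}x previous)'
    print(f'layer {i + 1}: {height}x{height}x{nd(height)} = {dim_u}{ratio}')
    prev = dim_u
# layer 1 -> 2 quadruples (8192 -> 32768) because nd is capped at 512 for both;
# every later step doubles, as expected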
junikkoma commented 3 years ago

Thanks for your detailed and quick response!