bluestyle97 / STGAN-pytorch

STGAN: A Unified Selective Transfer Network for Arbitrary Image Attribute Editing

Why does this reimplementation use bias=False in all conv layers? #1

Closed dypromise closed 5 years ago

dypromise commented 5 years ago

Hi, bluestyle97! Thanks for your nice PyTorch reimplementation! It is much faster than the official version. However, I found some differences: 1. the conv layers have no bias; 2. the target label is multiplied by a random coefficient. Could you explain these? I am confused about why you did this. Thank you very much!

bluestyle97 commented 5 years ago

Hi, thanks for your comment! For the first question: the bias only shifts the mean of the convolutional layer's output, so it can safely be omitted when the convolution (nn.Conv2d) is immediately followed by a batch-normalization layer; the batchnorm subtracts the batch mean and removes that shift anyway. You can refer to the following links for more explanation:
https://discuss.pytorch.org/t/any-purpose-to-set-bias-false-in-densenet-torchvision/22067
https://github.com/kuangliu/pytorch-cifar/issues/52
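
This is easy to verify numerically. Below is a small, hypothetical sanity check (not code from this repo) showing that with BatchNorm in training mode, the output is identical whether or not the preceding conv has a bias:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 3, 16, 16)

conv_bias = nn.Conv2d(3, 6, kernel_size=3, padding=1, bias=True)
conv_nobias = nn.Conv2d(3, 6, kernel_size=3, padding=1, bias=False)
conv_nobias.weight.data.copy_(conv_bias.weight.data)  # same weights, bias dropped

bn = nn.BatchNorm2d(6)  # training mode: normalizes with batch statistics

# The per-channel bias shifts both the activations and their batch mean
# by the same constant, so BatchNorm's mean subtraction cancels it exactly.
y_bias = bn(conv_bias(x))
y_nobias = bn(conv_nobias(x))
print(torch.allclose(y_bias, y_nobias, atol=1e-5))  # True
```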

For the second question: STGAN builds directly on AttGAN, which controls the attribute-manipulation intensity by making the target vector lie uniformly in [-1, 1] during training. You can read the AttGAN paper and its implementation for more details:
https://arxiv.org/abs/1711.10678
https://github.com/elvisyjlin/AttGAN-PyTorch
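
For illustration, here is a minimal sketch of that intensity mechanism, loosely following AttGAN-style training loops (variable names are illustrative; `thres_int` mirrors the config value of 0.5):

```python
import torch

thres_int = 0.5  # intensity bound, as in the training config

# Binary attribute labels in {0, 1} for a batch of 4 images, 13 attributes.
att_src = torch.randint(0, 2, (4, 13)).float()  # source labels
att_trg = torch.randint(0, 2, (4, 13)).float()  # target labels

# Source labels are mapped to {-thres_int, +thres_int}; target labels are
# additionally scaled by a uniform random coefficient, so across the batch
# the target intensities cover [-2*thres_int, 2*thres_int] = [-1, 1].
att_src_ = (att_src * 2 - 1) * thres_int
att_trg_ = (att_trg * 2 - 1) * torch.rand_like(att_trg) * (2 * thres_int)
```

Because the generator is trained on a continuum of target intensities rather than fixed binary labels, the attribute strength can be dialed continuously at test time.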

dypromise commented 5 years ago

Wow!! Thank you very much, it helps me A LOT!!!

dypromise commented 5 years ago

Hi, I noticed another difference: your version doesn't use inject layers and uses only 3 STU layers. I modified it to use inject layers in the decoder with 4 shortcut layers, and found that it is difficult to converge. Did you try this? If so, could you give me some suggestions on training? My parameters are below (an illustrative sketch of the inject mechanism follows the config):

exp_name: stgan
model_name: stgan
mode: train
cuda: true
ngpu: 4

data

dataset: celeba
data_root: /dockerdata/home/rpf/rpf/xmmtyding/celeba_data/crop384/img_crop_celeba_png/
att_list_file: /dockerdata/home/rpf/rpf/xmmtyding/celeba_data/crop384/new_list_attr_celeba_addhair.txt
crop_size: 384
image_size: 384

model

g_conv_dim: 48
d_conv_dim: 48
d_fc_dim: 512
g_layers: 5
d_layers: 5
shortcut_layers: 4
stu_kernel_size: 3
use_stu: true
one_more_conv: true
attrs: [Bangs, Black_Hair, Blond_Hair, Brown_Hair, Bushy_Eyebrows, Eyeglasses, Male, Mouth_Slightly_Open, Mustache, No_Beard, Pale_Skin, Young, HairLength]
checkpoint: ~

training

batch_size: 64
beta1: 0.5
beta2: 0.5
g_lr: 0.0008
d_lr: 0.0008
n_critic: 5
thres_int: 0.5
lambda_gp: 10
lambda1: 1
lambda2: 10
lambda3: 100
max_iters: 1000000
lr_decay_iters: 800000

steps

summary_step: 10
sample_step: 2500
checkpoint_step: 2500
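
For reference, here is a minimal sketch of what the "inject layers" mentioned above typically do in AttGAN/STGAN-style decoders (the function is hypothetical, written only to illustrate the idea): the attribute vector is tiled spatially and concatenated onto a decoder feature map, so the following layer must widen its input channels by the number of attributes.

```python
import torch

def inject_attributes(feat: torch.Tensor, attr: torch.Tensor) -> torch.Tensor:
    """Tile an attribute vector spatially and concatenate it onto a decoder
    feature map (the 'inject layer' idea). feat: (N, C, H, W), attr: (N, A).
    Returns an (N, C + A, H, W) tensor."""
    n, _, h, w = feat.shape
    attr_map = attr.view(n, -1, 1, 1).expand(n, attr.size(1), h, w)
    return torch.cat([feat, attr_map], dim=1)

# Example: injecting 13 rescaled target attributes into one decoder stage.
feat = torch.randn(4, 48, 24, 24)
attr = torch.rand(4, 13) * 2 - 1  # intensities in [-1, 1]
out = inject_attributes(feat, attr)
print(out.shape)  # torch.Size([4, 61, 24, 24])
```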