Training error - Githubissues

bycloudai commented 4 years ago

Hello again. I got this training error when running "train.py". How can I solve it?

(hiface) G:\HiFaceGAN\Face-Renovation-master>python train.py
train.py
dataset [TrainDataset] of size 7 was created
Network [HiFaceGANGenerator] was created. Total number of parameters: 128.0 million. To see the architecture, do print(network).
Network [MultiscaleDiscriminator] was created. Total number of parameters: 5.5 million. To see the architecture, do print(network).
create web directory ./checkpoints\exp1\web...
Traceback (most recent call last):
  File "train.py", line 93, in <module>
    main()
  File "train.py", line 52, in main
    trainer.run_generator_one_step(data_i)
  File "G:\HiFaceGAN\Face-Renovation-master\trainers\pix2pix_trainer.py", line 34, in run_generator_one_step
    g_losses, generated = self.pix2pix_model(data, mode='generator')
  File "E:\Anaconda3\envs\hiface\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "E:\Anaconda3\envs\hiface\lib\site-packages\torch\nn\parallel\data_parallel.py", line 153, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "E:\Anaconda3\envs\hiface\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "G:\HiFaceGAN\Face-Renovation-master\models\pix2pix_model.py", line 47, in forward
    g_loss, generated = self.compute_generator_loss(input_semantics, real_image)
  File "G:\HiFaceGAN\Face-Renovation-master\models\pix2pix_model.py", line 74, in compute_generator_loss
    fake_image = self.generate_fake(input_semantics)
  File "G:\HiFaceGAN\Face-Renovation-master\models\pix2pix_model.py", line 120, in generate_fake
    fake_image = self.netG(input_semantics)
  File "E:\Anaconda3\envs\hiface\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "G:\HiFaceGAN\Face-Renovation-master\models\networks\generator.py", line 238, in forward
    x = self.head_0(x, xs[0])
  File "E:\Anaconda3\envs\hiface\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "G:\HiFaceGAN\Face-Renovation-master\models\networks\architecture.py", line 55, in forward
    dx = self.conv_0(self.actvn(self.norm_0(x, seg)))
  File "E:\Anaconda3\envs\hiface\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "G:\HiFaceGAN\Face-Renovation-master\models\networks\normalization.py", line 100, in forward
    actv = self.mlp_shared(segmap)
  File "E:\Anaconda3\envs\hiface\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "E:\Anaconda3\envs\hiface\lib\site-packages\torch\nn\modules\container.py", line 100, in forward
    input = module(input)
  File "E:\Anaconda3\envs\hiface\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "E:\Anaconda3\envs\hiface\lib\site-packages\torch\nn\modules\conv.py", line 353, in forward
    return self._conv_forward(input, self.weight)
  File "E:\Anaconda3\envs\hiface\lib\site-packages\torch\nn\modules\conv.py", line 350, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size [128, 768, 3, 3], expected input[2, 1024, 4, 4] to have 768 channels, but got 1024 channels instead
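For readers decoding the last line of the traceback: PyTorch stores a 2D convolution weight as [out_channels, in_channels / groups, kernel_h, kernel_w] and expects its input as [batch, channels, height, width]. The sketch below uses plain Python with no PyTorch dependency to pull both shapes out of the message and compare the channel counts. The function name `parse_conv_mismatch` is a hypothetical helper for illustration, not part of PyTorch or of this repository.

```python
import re

# Error text copied from the traceback above.
ERROR = (
    "Given groups=1, weight of size [128, 768, 3, 3], "
    "expected input[2, 1024, 4, 4] to have 768 channels, "
    "but got 1024 channels instead"
)


def parse_conv_mismatch(message):
    """Hypothetical helper: extract the weight and input shapes from the message."""
    groups = int(re.search(r"groups=(\d+)", message).group(1))
    weight_text = re.search(r"weight of size \[([\d, ]+)\]", message).group(1)
    input_text = re.search(r"input\[([\d, ]+)\]", message).group(1)
    weight = [int(n) for n in weight_text.split(",")]
    inp = [int(n) for n in input_text.split(",")]
    return {
        # weight[1] holds in_channels / groups, so multiply back by groups.
        "expected_in_channels": weight[1] * groups,
        "actual_in_channels": inp[1],
        "batch_size": inp[0],
        "spatial_size": tuple(inp[2:]),
    }


info = parse_conv_mismatch(ERROR)
print(info)
```

The batch size of 2 in the input shape matches batchSize = 2 in the config, so the only disagreement is in the channel dimension: the layer was built for 768 channels and received 1024.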

I did exactly what you described for degrade.py: I gave it a 512x512 image as input and it produced a paired image of 512x1024.

Here's the training config

import sys  # needed for sys.maxsize below

class TrainOptions(object):
    dataroot = './training_t_full/'
    dataroot_assist = ''
    name = 'exp1'
    crop_size = 512

    gpu_ids = [0]  # set to [] for CPU-only training (not tested)
    gan_mode = 'ls'

    continue_train = False
    which_epoch = 'latest'

    D_steps_per_G = 1
    aspect_ratio = 1.0
    batchSize = 2
    beta1 = 0.0
    beta2 = 0.9
    cache_filelist_read = True
    cache_filelist_write = True
    checkpoints_dir = './checkpoints'
    choose_pair = [0, 1]
    coco_no_portraits = False
    contain_dontcare_label = False

    dataset_mode = 'train'
    debug = False
    display_freq = 100
    display_winsize = 256
    print_freq = 100
    save_epoch_freq = 1
    save_latest_freq = 5000

    init_type = 'xavier'
    init_variance = 0.02
    isTrain = True
    is_test = False

    semantic_nc = 3
    label_nc = 3
    output_nc = 3
    lambda_feat = 10.0
    lambda_kld = 0.05
    lambda_vgg = 10.0
    load_from_opt_file = False
    lr = 0.0002
    max_dataset_size = sys.maxsize
    model = 'pix2pix'
    nThreads = 2

    n_layers_D = 4
    num_D = 2
    ndf = 64
    nef = 16
    netD = 'multiscale'
    netD_subarch = 'n_layer'
    netG = 'hifacegan'  # spade, lipspade
    ngf = 64  # set to 48 for Titan X 12GB card
    niter = 30
    niter_decay = 20
    no_TTUR = False
    no_flip = False
    no_ganFeat_loss = False
    no_html = False
    no_instance = True
    no_pairing_check = False
    no_vgg_loss = False

    norm_D = 'spectralinstance'
    norm_E = 'spectralinstance'
    norm_G = 'spectralspadesyncbatch3x3'

    num_upsampling_layers = 'normal'
    optimizer = 'adam'
    phase = 'train'
    prd_resize = 512
    preprocess_mode = 'resize_and_crop'

    serial_batches = False
    tf_log = False
    train_phase = 3  # progressive training disabled (set initial phase to 0 to enable it)
    # 20200211
    #max_train_phase = 2 # default 3 (4x)
    max_train_phase = 3
    # Training at 1024*1024 is also possible: set this to 4 and add more layers to the generator.
    upsample_phase_epoch_fq = 5
    use_vae = False
    z_dim = 256

thank you!

darkcake commented 4 years ago

I have the same error. Please help!

kwea123 commented 4 years ago

Change this parameter (ngf) to 48: https://github.com/Lotayou/Face-Renovation/blob/b5f5e9e86ec8ebdebcf64d22d806085fead80b8b/options/config_hifacegan.py#L59 It controls the number of input channels: 48 means 48x16=768 channels, which is what the layer in the error expects, whereas the current value of 64 gives 64x16=1024. See https://github.com/Lotayou/Face-Renovation/blob/b5f5e9e86ec8ebdebcf64d22d806085fead80b8b/models/networks/generator.py#L195-L206
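The arithmetic in this suggestion can be checked against the error message. The sketch below assumes, based only on the comment above, that the failing layer's input width is ngf x 16. The constant and function names are illustrative and are not taken from the repository.

```python
# Assumption from the comment above: the layer input has ngf * 16 channels.
CHANNEL_MULTIPLIER = 16
# From the error message: the convolution weight expects 768 input channels.
EXPECTED_IN_CHANNELS = 768


def input_channels(ngf):
    """Channel count fed to the layer under the ngf * 16 assumption."""
    return ngf * CHANNEL_MULTIPLIER


def ngf_is_compatible(ngf, expected=EXPECTED_IN_CHANNELS):
    """True when the configured ngf produces the channel count the layer expects."""
    return input_channels(ngf) == expected


for ngf in (48, 64):
    print(ngf, input_channels(ngf), ngf_is_compatible(ngf))
```

Under that assumption, ngf = 64 yields the 1024 channels reported in the error, and ngf = 48 yields the 768 channels the weight was built for.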

The training will work in this case, but I don't know if the result will be good... The code seems too messy...

Lotayou commented 4 years ago

@kwea123 Thanks for mentioning this. The code has now been reformatted and should work for both 48 and 64. https://github.com/Lotayou/Face-Renovation/blob/2e55e1563d52c6cfb2bbd98c915ae95f566f416d/models/networks/generator.py#L195-L206

Most experiments reported in the paper were trained with ngf=48, which fits on a 12GB Titan X card, and the results are already good enough to beat SOTA. The final code release was tested on a company server with a 16GB P100 card, which allows training with ngf=64. I haven't benchmarked the performance difference between 48 and 64, though.
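A common way to support more than one ngf value is to derive each layer's channel count from ngf rather than hardcoding a number such as 768. The sketch below illustrates that general idea in plain Python. It is not the repository's code: `ConvSpec`, `build_head_spec`, the multiplier of 16, and the 128-channel output width (read off the weight shape in the error message) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvSpec:
    """Hypothetical description of a convolution layer's shape."""
    in_channels: int
    out_channels: int
    kernel_size: int


def build_head_spec(ngf, multiplier=16, hidden=128):
    """Derive the input width from ngf so the layer matches the configured value."""
    return ConvSpec(in_channels=ngf * multiplier, out_channels=hidden, kernel_size=3)


def accepts(spec, input_shape):
    """input_shape is (batch, channels, height, width)."""
    return input_shape[1] == spec.in_channels


spec_48 = build_head_spec(48)
spec_64 = build_head_spec(64)

# The input shape from the error message only fits the spec built for ngf=64.
print(accepts(spec_48, (2, 1024, 4, 4)), accepts(spec_64, (2, 1024, 4, 4)))
```

Because the layer width follows ngf, switching between 48 and 64 changes the layer and its input together, so the two cannot drift apart.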

Lotayou / Face-Renovation

Training error #11