CompVis / adaptive-style-transfer

Source code for the ECCV18 paper "A Style-Aware Content Loss for Real-time HD Style Transfer"
https://compvis.github.io/adaptive-style-transfer/
GNU General Public License v3.0

Training seems to collapse when applying another dataset #19

Open schoengzc opened 5 years ago

schoengzc commented 5 years ago

I have trained the model on a different content dataset with the given style images of Monet, but training soon collapses and outputs entirely black stylized images. I've tried discarding the image augmentation process and scipy.misc.imresize(), but it still does not work with this content dataset (150,000 JPEG images, generally 1800+ pixels on a side). Would you please give me some tips or suggestions about this issue, such as trying another learning rate or discriminator success rate? Thanks for your time in advance.

dimakot55 commented 5 years ago

We have also experienced similar issues with some datasets. To the best of my knowledge, this is caused by either 1) numerical instability of the loss function self.loss = sce_criterion in model.py (sce_criterion is defined in module.py), which relies on tf.nn.sigmoid_cross_entropy_with_logits and hence on computing exp(x) - but this is unlikely, since the TF team has vetted such a commonly used function; or 2) more likely, in my opinion, an overflow of some convolutional kernel weight somewhere inside the network that corrupts all the other weights in a single update step. Frankly, I'm still not sure about this explanation.
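
(For reference, TF's documented implementation of tf.nn.sigmoid_cross_entropy_with_logits already uses the numerically stable form max(x, 0) - x * z + log(1 + exp(-abs(x))), precisely to avoid overflowing exp(x) for large logits - which supports point 1 being unlikely.)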

We've noticed that this happens when training on style datasets of especially complicated artists (those where local structure and texture are less prominent, and the painting's composition and content matter most). An easy trick that helped us with this issue: restart training from the last saved checkpoint that hasn't been corrupted yet.
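
For reference, a minimal TF1 restore sketch (it assumes the training graph is already built; the directory name is a placeholder, not this repo's actual layout):

        import tensorflow as tf

        saver = tf.train.Saver()
        with tf.Session() as sess:
            # 'checkpoints/' is a placeholder - point it at the real checkpoint dir
            ckpt = tf.train.latest_checkpoint('checkpoints/')
            # if the newest checkpoint is already corrupted,
            # restore an earlier, intact checkpoint path instead
            saver.restore(sess, ckpt)
            # ...resume the training loop from here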

schoengzc commented 5 years ago

Thanks very much for your detailed reply. BTW, I have to say the performance of your style transfer is amazingly good; I hope to learn more and solve these unstable training issues some day.

narayansundararajan123 commented 5 years ago

I am running into the same issue of entirely black stylized output images when I train on an art style dataset of 144 black-and-white paintings of various sizes. I used --image_size=256 due to hardware limitations and ran the training to 30,000 iterations. I would really appreciate some help, especially on how to restart training from before the point of corruption, or other things that would be good to try. Thanks much.

eps696 commented 5 years ago

i've managed to solve the collapsed black output problem by:

1) adding dropout to the residual blocks (training phase only):

        def residual_block(x, dim, k=3, s=1, dropout=0, name='res'):
            . . . 
            # TF1 dropout takes keep_prob, hence 1 - dropout rate
            if dropout > 0: y = tf.nn.dropout(y, keep_prob=1-dropout)
            return y + x

        # stack 9 residual blocks
        nf = features.get_shape().as_list()[-1]
        r1 = residual_block(features, nf, dropout=0,       name='g_r1')
        r2 = residual_block(r1,       nf, dropout=dropout, name='g_r2')
        r3 = residual_block(r2,       nf, dropout=dropout, name='g_r3')
        r4 = residual_block(r3,       nf, dropout=dropout, name='g_r4')
        r5 = residual_block(r4,       nf, dropout=dropout, name='g_r5')
        r6 = residual_block(r5,       nf, dropout=dropout, name='g_r6')
        r7 = residual_block(r6,       nf, dropout=dropout, name='g_r7')
        r8 = residual_block(r7,       nf, dropout=dropout, name='g_r8')
        r9 = residual_block(r8,       nf, dropout=0,       name='g_r9')
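
note that the first and last blocks keep dropout=0, so the entry and exit of the residual chain stay deterministic and only the inner blocks are perturbed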

2) adding progressive soft labels to discriminator losses:

        # key is a scale name ending in a digit (e.g. 's0'..'s6');
        # the label noise amplitude grows with that scale index
        def ones(x, key):
            return tf.ones_like(x) - tf.random_uniform(tf.shape(x), 0., float(key[-1]) * 0.03)
        def zeros(x, key):
            return tf.zeros_like(x) + tf.random_uniform(tf.shape(x), 0., float(key[-1]) * 0.03)

        # Discriminator losses - ones for original styles, otherwise zero
        in_style_D_loss    = {key: loss(pred,  ones(pred, key)) * s_weight[key] for key, pred in in_style_D_pred.items()}
        in_content_D_loss  = {key: loss(pred, zeros(pred, key)) * s_weight[key] for key, pred in in_content_D_pred.items()}
        out_content_D_loss = {key: loss(pred, zeros(pred, key)) * s_weight[key] for key, pred in out_content_D_pred.items()}

i also removed the winrate-based training schedule for now (left just one G and one D pass, no accuracy calculation), but will check again later whether it was the issue

narayansundararajan123 commented 5 years ago

great! would it be possible to share the modified files, so I can quickly rerun the training on my art dataset and see if it works?

eps696 commented 5 years ago

@narayansundararajan123 well, i've refactored the whole codebase in a way i'm more used to, so it's rather different from the original repo now - including names, vars, module structure, utility functions, etc. i will try to apply the same changes to the original code and post those pieces, if the snippets in the post above are not enough. alas, i don't really use git, so i cannot provide a proper fork..

the applied changes are fairly standard GAN tricks to 'slow down' or 'distract' the discriminator when it trains much faster than the generator (which is the reason for the collapse - it's quite visible in the D losses' behaviour in tensorboard).

and btw, i also completely removed the accuracy calculation and the winrate-based training schedule, because the model never converged with them (and did perfectly without).

eps696 commented 5 years ago

@narayansundararajan123 ok, let's try these quick updates for original code:

module.py, in decoder()

        def residule_block(x, dim, ks=3, s=1, dropout=False, name='res'):
            p = int((ks - 1) / 2)
            y = tf.pad(x, [[0, 0], [p, p], [p, p], [0, 0]], "REFLECT")
            y = instance_norm(conv2d(y, dim, ks, s, padding='VALID', name=name+'_c1'), name+'_bn1')
            y = tf.pad(tf.nn.relu(y), [[0, 0], [p, p], [p, p], [0, 0]], "REFLECT")
            y = instance_norm(conv2d(y, dim, ks, s, padding='VALID', name=name+'_c2'), name+'_bn2')
            if dropout and options.is_training:
                y = tf.nn.dropout(y, 0.5)  # TF1: the second argument is keep_prob
            return y + x

        # Now stack 9 residual blocks (first and last without dropout)
        num_kernels = features.get_shape().as_list()[-1]
        r1 = residule_block(features, num_kernels, name='g_r1')
        r2 = residule_block(r1, num_kernels, dropout=True, name='g_r2')
        r3 = residule_block(r2, num_kernels, dropout=True, name='g_r3')
        r4 = residule_block(r3, num_kernels, dropout=True, name='g_r4')
        r5 = residule_block(r4, num_kernels, dropout=True, name='g_r5')
        r6 = residule_block(r5, num_kernels, dropout=True, name='g_r6')
        r7 = residule_block(r6, num_kernels, dropout=True, name='g_r7')
        r8 = residule_block(r7, num_kernels, dropout=True, name='g_r8')
        r9 = residule_block(r8, num_kernels, name='g_r9')

model.py, in _build_model()

            # key ends in a scale index digit; label noise amplitude grows with that index
            def ones(x, key):
                return tf.ones_like(x) - tf.random_uniform(tf.shape(x), 0., float(key[-1]) * 0.03)
            def zeros(x, key):
                return tf.zeros_like(x) + tf.random_uniform(tf.shape(x), 0., float(key[-1]) * 0.03)

            self.input_painting_discr_loss = {key: self.loss(pred, ones(pred, key)) * scale_weight[key]
                                              for key, pred in self.input_painting_discr_predictions.items()}
            self.input_photo_discr_loss = {key: self.loss(pred, zeros(pred, key)) * scale_weight[key]
                                           for key, pred in self.input_photo_discr_predictions.items()}
            self.output_photo_discr_loss = {key: self.loss(pred, zeros(pred, key)) * scale_weight[key]
                                            for key, pred in self.output_photo_discr_predictions.items()}

model.py, in train()

replace this

            if discr_success >= win_rate:
                # Train generator
                _, summary_all, gener_acc_ = self.sess.run(
                    [self.g_optim_step, self.summary_merged_all, self.gener_acc],
                    feed_dict={
                        self.input_painting: normalize_arr_of_imgs(batch_art['image']),
                        self.input_photo: normalize_arr_of_imgs(batch_content['image']),
                        self.lr: self.options.lr
                    })
                discr_success = discr_success * (1. - alpha) + alpha * (1. - gener_acc_)
            else:
                # Train discriminator.
                _, summary_all, discr_acc_ = self.sess.run(
                    [self.d_optim_step, self.summary_merged_all, self.discr_acc],
                    feed_dict={
                        self.input_painting: normalize_arr_of_imgs(batch_art['image']),
                        self.input_photo: normalize_arr_of_imgs(batch_content['image']),
                        self.lr: self.options.lr
                    })
                discr_success = discr_success * (1. - alpha) + alpha * discr_acc_

with this

            # Train generator
            _, summary_all = self.sess.run(
                [self.g_optim_step, self.summary_merged_all],
                feed_dict={
                    self.input_painting: normalize_arr_of_imgs(batch_art['image']),
                    self.input_photo: normalize_arr_of_imgs(batch_content['image']),
                    self.lr: self.options.lr
                })
            # Train discriminator.
            _, summary_all = self.sess.run(
                [self.d_optim_step, self.summary_merged_all],
                feed_dict={
                    self.input_painting: normalize_arr_of_imgs(batch_art['image']),
                    self.input_photo: normalize_arr_of_imgs(batch_content['image']),
                    self.lr: self.options.lr
                })

if you use the last 'fix', you can also comment out everything related to accuracy measurement/reporting. alas, i cannot make a test run with it, since i don't have that huge Places dataset (i use another, smaller one). let me know how it goes on your side
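
(in the original train(), that means dropping the self.gener_acc / self.discr_acc fetches and the discr_success bookkeeping, which the replacement block above already does)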

narayansundararajan123 commented 5 years ago

Thanks so much! Will try and let you know.

narayansundararajan123 commented 5 years ago

Hi,

Tried training again on the art style dataset of 144 black-and-white paintings of various sizes, with --image_size=256, running to 30,000 iterations with the new modifications. Unfortunately I'm still hitting the same issue of entirely black stylized output images. Is there anything I might have missed implementing beyond the modifications above, or any other suggestions for solving this?

eps696 commented 5 years ago

@narayansundararajan123 other changes were quite subtle (like tweaking the loss weights for D and G separately), so i don't think they really matter. i also swapped some technical ops (like data loading) for the ones i'm used to, but that was for easier reading/maintenance - i doubt it could affect the result. could you share your dataset so i can try it on my side (if it's not private, of course)?

narayansundararajan123 commented 5 years ago

Thanks. I also noticed that beyond 210,000 iterations, when the model likely goes off, I also get

        RuntimeWarning: invalid value encountered in reduce
          return umr_maximum(a, axis, None, out, keepdims, initial)

when I run inference and get black output images after stylization.
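
(This NumPy warning usually means the array being reduced contains NaNs. A quick sanity check, where `stylized` is only a placeholder name for the inference output array:)

        import numpy as np

        # 'stylized' stands in for the np.ndarray returned by inference
        if np.isnan(stylized).any():
            print('NaNs in output - the restored weights are likely corrupted')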

I can also share the dataset - you can reach me at narayan.sundararajan@gmail.com.

eps696 commented 5 years ago

haven't seen such warnings.. in fact, my fixes are not a 100% remedy - i was also facing black output on some datasets, but it happened much later than with the original code (like ~200k iterations vs 10~20k). these tricks just stabilize training for a longer time - whether the model converges within that period is a separate question in every case

andrew194 commented 5 years ago

@eps696 Can you tell me what discriminator, transformer and feature loss weights you used? I can't get the GAN to converge.

eps696 commented 5 years ago

@andrew194

        s_d_weight = {"s0": 1., "s1": 1., "s3": 0.5, "s5": 0.5, "s6": 0.5}
        s_g_weight = {"s0": 1., "s1": 0.7, "s3": 0.3, "s5": 0.3, "s6": 0.3}

kept the feature loss as in the original code (l1_loss * 100)

andrew194 commented 5 years ago

@eps696 Thanks! Did you also use 1 for the discriminator loss weight?

eps696 commented 5 years ago

didn't quite catch what you mean by 'use 1'. s_d_weight are the discriminator loss weights, s_g_weight are the generator loss weights

andrew194 commented 5 years ago

Sorry, I was referring to the optimizer:

        self.d_optim_step = tf.train.AdamOptimizer(self.lr).minimize(
            loss=self.options.discr_loss_weight * self.discr_loss,
            var_list=[self.discr_vars])

eps696 commented 5 years ago

the per-scale weights are already applied to the losses, so there's no need for another multiplier. here is my code (var names are different, but should be pretty obvious):

        # Discriminator losses - ones for original styles, otherwise zero
        in_s_D_loss  = {key: loss(pred,  ones(pred, key)) * s_d_weight[key] for key, pred in in_s_D_pred.items()}
        in_c_D_loss  = {key: loss(pred, zeros(pred, key)) * s_d_weight[key] for key, pred in in_c_D_pred.items()}
        out_c_D_loss = {key: loss(pred, zeros(pred, key)) * s_d_weight[key] for key, pred in out_c_D_pred.items()}

        D_loss = tf.add_n(list(in_s_D_loss.values())) + \
                 tf.add_n(list(in_c_D_loss.values())) + \
                 tf.add_n(list(out_c_D_loss.values()))

        # Generator loss - ones for output images
        out_c_G_loss = {key: loss(pred, tf.ones_like(pred)) * s_g_weight[key] for key, pred in zip(out_c_D_pred.keys(), out_c_D_pred.values())}
        G_loss = tf.add_n(list(out_c_G_loss.values()))

        # Image loss: compare transformer-block (avg-pooled) versions of content in and out
        img_loss = mse_loss(t_block(out_c, 10), t_block(in_c, 10))

        # Features loss.
        feat_loss = l1_loss(out_c_feat, in_c_feat) 

        t_vars = tf.trainable_variables()
        D_vars = [var for var in t_vars if 'discriminator' in var.name]
        G_vars = [var for var in t_vars if 'encoder' in var.name or 'decoder' in var.name]

        update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)

        with tf.control_dependencies(update_ops):
            D_opt_step = tf.train.AdamOptimizer(a.lr).minimize(D_loss, var_list = [D_vars])
            G_opt_step = tf.train.AdamOptimizer(a.lr).minimize(G_loss + img_loss*100 + feat_loss*100, var_list=[G_vars])
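
(wrapping both optimizer steps in a control dependency on tf.GraphKeys.UPDATE_OPS is the standard TF1 pattern to ensure any pending update ops - e.g. moving statistics of normalization layers - run with every training step)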