deepfakes / faceswap-playground

User dedicated repo for the faceswap project

How about training only A or B depending on loss comparison? #88

Closed. iperov closed this issue 6 years ago.

iperov commented 6 years ago
    # In the Trainer's __init__: start both losses high so the first call below
    # trains side A (the status line also needs "import time" at module level).
    self.loss_A = 9999.0
    self.loss_B = 9999.0

    def train_one_step(self, iter, viewer):
        # Train only the autoencoder whose loss is currently the worse of the two.
        if self.loss_A >= self.loss_B:
            epoch, warped_A, target_A = next(self.images_A)
            self.loss_A = self.model.autoencoder_A.train_on_batch(warped_A, target_A)
        else:
            epoch, warped_B, target_B = next(self.images_B)
            self.loss_B = self.model.autoencoder_B.train_on_batch(warped_B, target_B)

        print("[{0}] [#{1:05d}] loss_A: {2:.5f}, loss_B: {3:.5f}".format(
            time.strftime("%H:%M:%S"), iter, self.loss_A, self.loss_B),
            end='\r')

        if viewer is not None:
            epoch, warped_A, target_A = next(self.images_A)
            epoch, warped_B, target_B = next(self.images_B)
            viewer(self.show_sample(target_A[0:14], target_B[0:14]), "training")

[screenshot: training preview, 2018-03-06_22-18-09]

I think it's better, because the destination video has far fewer frames than the source celeb set, so training both wastes time, and the less-trained celeb side produces a blurry face.

deepfakesclub commented 6 years ago

Possibly an easy win. Good idea.

I wonder if the quality of A and B is equally important when going from A->B. Is A_loss = 0.02, B_loss = 0.01 identical in quality (however you measure it) to A_loss = 0.01, B_loss = 0.02?

I feel like B_loss is more important for A->B, but I have no evidence to support this guess.

iperov commented 6 years ago

Yeah, I think so too. B_loss is more important because it affects the sharpness of the replaced face in the result, whereas A_loss only affects how correctly the facial features are mapped.

modelsex commented 6 years ago

Come on, this is a GAN.

deepfakesclub commented 6 years ago

Well, it's at least some form of additional feedback on the network training.

So there could be some optimal weight, like self.loss_A >= K * self.loss_B, where K is a constant or maybe even a function of loss_B.

Without any further data, I would take a wild guess and set K=1.1 or 1.2 to ensure more time is spent on loss_B.
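
A minimal sketch of that weighted comparison, dropped into the same train_one_step as in the main post (K and the 1.2 value are just the guesses from this thread, not anything in the faceswap code):

    K = 1.2  # >1 biases the schedule toward training the B autoencoder
    if self.loss_A >= K * self.loss_B:
        epoch, warped_A, target_A = next(self.images_A)
        self.loss_A = self.model.autoencoder_A.train_on_batch(warped_A, target_A)
    else:
        epoch, warped_B, target_B = next(self.images_B)
        self.loss_B = self.model.autoencoder_B.train_on_batch(warped_B, target_B)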

But...

The only other caveat is if the training data sets have different quality. Maybe loss_B = 0.02 is the best you can get with a poor face B training set, while face A has great training data and can reach 0.01. For example, you have 10000 HD video frames of face A, and just 100 blurry selfies of face B. You wouldn't want to waste time overtraining B in that case. You might as well just max out face A training to get what you can. However, if the loss_B = 0.02 limit means the results look terrible even with loss_A = 0.001, there's no point bothering with any further training. So yeah, in that case, this is a good idea.

iperov commented 6 years ago

I got network overfitting with the code from the main post.

This fix works well:

    self.loss_A = 9999.0
    self.loss_B = 9999.0

    def train_one_step(self, iter, viewer):
        # Every 10th iteration train both sides, which avoids the overfitting seen
        # with the snippet in the main post; otherwise train only the worse side.
        if iter % 10 == 0:
            epoch, warped_A, target_A = next(self.images_A)
            epoch, warped_B, target_B = next(self.images_B)
            self.loss_A = self.model.autoencoder_A.train_on_batch(warped_A, target_A)
            self.loss_B = self.model.autoencoder_B.train_on_batch(warped_B, target_B)
        else:
            if self.loss_A >= self.loss_B:
                epoch, warped_A, target_A = next(self.images_A)
                self.loss_A = self.model.autoencoder_A.train_on_batch(warped_A, target_A)
            else:
                epoch, warped_B, target_B = next(self.images_B)
                self.loss_B = self.model.autoencoder_B.train_on_batch(warped_B, target_B)

        print("[{0}] [#{1:05d}] loss_A: {2:.5f}, loss_B: {3:.5f}".format(
            time.strftime("%H:%M:%S"), iter, self.loss_A, self.loss_B),
            end='\r')

        if viewer is not None:
            epoch, warped_A, target_A = next(self.images_A)
            epoch, warped_B, target_B = next(self.images_B)
            viewer(self.show_sample(target_A[0:14], target_B[0:14]), "training")

Spent the night training and B is now closer to A. [screenshot: 2018-03-07_09-42-19]

modelsex commented 6 years ago

@iperov Is it necessary to balance loss_B against loss_A? Would you please explain more? Is there an example showing the overfitting? How does it affect training time? Thanks.

iperov commented 6 years ago

@modelsex

For example, A has 600 similar photos from a video source and B has 1500 varied photos from internet sources. A->A trains much faster than B->B, and you can see in the preview that A->A becomes sharper than B->B. So why spend time training A->A once it is sharp enough, especially when our goal is the B->A conversion?

Overfitting: all of the predictions become black with red noise.

Clorr commented 6 years ago

I usually stop training decoder_A, and I even set encoder.trainable = False once the loss goes below 0.03. I'm not trying to achieve high-quality faceswaps, so this may not be good advice for video swaps, but I agree this saves time...
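
A rough sketch of what that freeze could look like inside the trainer, assuming self.model exposes the encoder, decoder_A and the two autoencoders as Keras models as in the snippets above (the attribute names, optimizer and loss arguments are illustrative, not taken from the faceswap code):

    if self.loss_A < 0.03:
        self.model.decoder_A.trainable = False  # stop updating decoder_A
        self.model.encoder.trainable = False    # optionally freeze the shared encoder too
        # Keras applies trainable changes only after the models are recompiled
        self.model.autoencoder_A.compile(optimizer='adam', loss='mean_absolute_error')
        self.model.autoencoder_B.compile(optimizer='adam', loss='mean_absolute_error')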

3xtr3m3d commented 6 years ago

Can this be added as an option?

ghost commented 6 years ago

Can both be added, please: iperov's training strategy and Clorr's freezing of the chosen encoder? I tried to do both manually but couldn't get either to work.

iperov commented 6 years ago

PR https://github.com/deepfakes/faceswap/pull/246

ghost commented 6 years ago

@iperov :)

NagashSzarekh commented 6 years ago

This could also be useful if you want to change out one of the data sets. For example, if you change dataset_A to a different set of pictures from a different video but keep dataset_B the same, and you are reusing the same model, you would want to concentrate on training the A side more than the B side.
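
A hypothetical way to get that behaviour with the same mechanism would be to bias the comparison toward A (nothing like this is in the PR; bias_A is an illustrative name and value):

    bias_A = 2.0  # >1 inflates loss_A, so the A autoencoder is picked for more steps
    if self.loss_A * bias_A >= self.loss_B:
        ...  # train side A, as in the snippets above
    else:
        ...  # train side B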