deepfakes / faceswap

Deepfakes Software For All
https://www.faceswap.dev
GNU General Public License v3.0

[Experiment] Retraining decoder to 256x256 #247

Closed. Clorr closed this issue 6 years ago.

Clorr commented 6 years ago

Hi guys,

I'm currently exploring some new ways of faceswapping. I wanted to share my last experiment on higher quality swaps which consists of modifying the decoder so that it outputs 256x256 images.

I assumed the following:

So, I added some layers to the decoder and decided to train the decoder B specifically, by getting rid of decoder A and by setting encoder.trainable to False.
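
A minimal sketch of that setup, with a plain UpSampling2D block standing in for the model's actual upscale helper (the names `Decoder256`, `face_in` and `autoencoder_B`, and the layer sizes, are illustrative assumptions, not the plugin's code):

    # Hedged sketch of the experiment described above: reuse the pretrained
    # encoder, freeze it, and train only a new, deeper decoder B that upscales
    # all the way to 256x256.
    from keras.layers import Conv2D, Input, LeakyReLU, UpSampling2D
    from keras.models import Model

    def upscale(filters):
        # stand-in for the Original model's upscale helper
        def block(x):
            x = Conv2D(filters, kernel_size=3, padding='same')(x)
            x = LeakyReLU(0.1)(x)
            return UpSampling2D()(x)      # doubles the spatial resolution
        return block

    def Decoder256():
        inp = Input(shape=(8, 8, 512))    # shape of the Original encoder's output
        x = inp
        x = upscale(512)(x)               # 8x8     -> 16x16
        x = upscale(256)(x)               # 16x16   -> 32x32
        x = upscale(128)(x)               # 32x32   -> 64x64
        x = upscale(64)(x)                # 64x64   -> 128x128
        x = upscale(32)(x)                # 128x128 -> 256x256
        x = Conv2D(3, kernel_size=5, padding='same', activation='sigmoid')(x)
        return Model(inp, x)

    # `encoder` below stands for the already-trained Original encoder, loaded elsewhere.
    # encoder.trainable = False           # freeze it: only decoder B gets gradient updates
    # face_in = Input(shape=(64, 64, 3))
    # autoencoder_B = Model(face_in, Decoder256()(encoder(face_in)))
    # autoencoder_B.compile(optimizer='adam', loss='mean_absolute_error')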

So far I'm getting this (loss of 0.029): _sample_training

babilio commented 6 years ago

Interesting! I think your assumptions make sense and this looks very promising. How would I go about testing this? Can I use an old model and just train again with -t OriginalRetrain? Then when converting, should I use -t OriginalRetrain or just -t Original?

Thanks!

Clorr commented 6 years ago

You can use an already trained model. The plugin will create a new decoder file that won't overwrite the existing one (the old decoder is not reloaded, at least for now, but the loss goes down much faster than for the initial training, at least at the beginning).

For the converter, I haven't yet managed to use the generated face at full resolution. I'll push an update when possible... (but yes, for the -t arg, you will have to use OriginalRetrain)
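
In other words, the retrained decoder gets its own weights file alongside the existing ones. A rough illustration of that idea (the filenames and helper functions here are made up, not the plugin's actual code):

    # Rough illustration only: the retrained 256x256 decoder is saved under its
    # own filename so the original decoder weights stay untouched.
    import os

    def load_weights(encoder, decoder_B, model_dir):
        encoder.load_weights(os.path.join(model_dir, 'encoder.h5'))   # reuse the trained encoder
        retrain_path = os.path.join(model_dir, 'decoder_B_256.h5')    # separate file for the new decoder
        if os.path.exists(retrain_path):
            decoder_B.load_weights(retrain_path)                      # resume a previous retrain
        # the original decoder B file is left untouched

    def save_weights(decoder_B, model_dir):
        decoder_B.save_weights(os.path.join(model_dir, 'decoder_B_256.h5'))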

Clorr commented 6 years ago

Note that my goal was to get more detail in the output, but I don't think the simple decoder architecture will give me what I want. However, we can still test this approach at a much higher resolution (but then we will have to extract higher-res faces).

Clorr commented 6 years ago

@babilio I pushed a converter, it is as experimental as the rest ;-)

babilio commented 6 years ago

I did a quick train and convert, and the convert turned out even better than the training previews. It was a little blurry, but I'm working with a set of only 250 images for B, so tentatively a bigger set or more training would improve it. My loss was 0.02.

iperov commented 6 years ago

I also tried various decoders to get a more detailed face. No chance. I think it's because the NN learns to reach the average of all the warped crap we feed into it.

Look at the gifs showing warped input and output:

even a bent nose comes out as a straight nose: 2018-03-08_17-55-44

eyes looking to the left come out looking straight ahead: 14

because everything is averaged. So why are we expecting sharp details in averaged faces?
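
A toy numpy illustration of that averaging effect (not from the thread; the jittered patches stand in for imperfectly aligned training faces):

    # Toy demo: a model that must map one input to one output minimizes pixel-wise
    # MSE by predicting the per-pixel *mean* of all the slightly different targets
    # it sees, so misaligned/warped training faces come out blurred and averaged.
    import numpy as np

    rng = np.random.default_rng(0)

    # 1000 "targets": the same 8x8 patch, each randomly shifted a little sideways
    base = rng.random((8, 8))
    targets = np.stack([np.roll(base, rng.integers(-2, 3), axis=1) for _ in range(1000)])

    mean_pred = targets.mean(axis=0)                     # the blurry "averaged" face

    mse_sharp = ((targets - targets[0]) ** 2).mean()     # predicting one sharp target
    mse_mean = ((targets - mean_pred) ** 2).mean()       # predicting the average
    print(f"MSE of a sharp prediction:      {mse_sharp:.4f}")
    print(f"MSE of the averaged prediction: {mse_mean:.4f}")  # always lower or equal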

Latest Daddario ^ I trained with this decoder:

    def Decoder(self):
        input_ = Input(shape=(8, 8, 512))   # encoder output of the Original model
        x = input_
        x = self.upscale(512)(x)            # 8x8     -> 16x16
        x = self.upscale(256)(x)            # 16x16   -> 32x32
        x = self.upscale(128)(x)            # 32x32   -> 64x64
        x = self.upscale(64)(x)             # 64x64   -> 128x128
        x = self.upscale(32)(x)             # 128x128 -> 256x256
        x = Conv2D(3, kernel_size=5, padding='same', activation='sigmoid')(x)
        # custom layers (not standard Keras), presumably downsampling the
        # high-res output back to the target size before the loss
        x = NearestNeighborDownsampler()(x)
        x = BicubicDownsampler()(x)
        return KerasModel(input_, x)

[Link Removed]

Clorr commented 6 years ago

As I said in #249, this is not the way we will get crisper faces, because at some point the loss function is the problem.

However, I was thinking higher-res images would help with some small features like eyes, which sometimes are not rendered very well (I'm not talking about gaze direction, but about the fact that in some places the eyes are just a horrible mix of pixels, and with higher resolution you get something more meaningful). I wanted to see the results; now I see the approach's limits, but I thought it could help some others.

Also the goal of the experiment was to try to freeze the encoder so we can try training deeper decoders and this part seems to work.

As a next step, I'm trying to read up on how to use the residual loss to train a side network (like the mask in GAN), because for me this project is more a way to learn deep learning than to do actual faceswaps.

Clorr commented 6 years ago

I updated the code with a DSSIM loss function and a res_block. Not very big improvements, but still better than before. This is the result: _sample_training
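
For reference, a minimal sketch of what those two pieces can look like (a generic DSSIM loss via tf.image.ssim and a standard residual block, not necessarily the exact code that was pushed):

    # Hedged sketch of a DSSIM loss and a residual block; standard formulations,
    # not necessarily identical to the code referenced above.
    import tensorflow as tf
    from keras.layers import Add, Conv2D, LeakyReLU

    def dssim_loss(y_true, y_pred):
        # structural dissimilarity: 0 for identical images, approaching 1 for unrelated ones
        return (1.0 - tf.image.ssim(y_true, y_pred, max_val=1.0)) / 2.0

    def res_block(x, filters):
        # two 3x3 convolutions with a skip connection (assumes x already has `filters` channels)
        shortcut = x
        x = Conv2D(filters, kernel_size=3, padding='same')(x)
        x = LeakyReLU(0.2)(x)
        x = Conv2D(filters, kernel_size=3, padding='same')(x)
        x = Add()([shortcut, x])
        return LeakyReLU(0.2)(x)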

kvrooman commented 6 years ago

The issue of the "averaged face" pose / eye-gaze direction / blur is due to the way the loss function is designed. Differences in alignment (double eyebrows, eye gaze, distorted or zoomed features, incorrect expression or pose) are lumped together with differences in texture (color, lighting, gamma, white balance) in a single loss function.

I was reading a paper with some very interesting ideas on loss functions and feedback from separate terms: 1. landmark error, 2. texture error of the frontal flattened image, 3. perceptual error from the final warped image (via similarity functions or the output of a separate CNN recognition model).

https://www.arxiv-vanity.com/papers/1701.04851/
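
Loosely following that idea, such a loss could be a weighted sum of the separate terms. A sketch with made-up weights (the landmark and feature tensors are placeholders, not an existing API, and this is not the paper's implementation):

    # Hedged sketch of a multi-term loss in the spirit of the paper linked above:
    # separate penalties for landmarks, texture and a perceptual/feature term.
    import tensorflow as tf

    def combined_loss(y_true, y_pred, lm_true, lm_pred, feat_true, feat_pred,
                      w_lm=1.0, w_tex=1.0, w_percep=0.1):
        landmark_err = tf.reduce_mean(tf.square(lm_true - lm_pred))        # 1. alignment / pose / gaze
        texture_err = tf.reduce_mean(tf.abs(y_true - y_pred))              # 2. colour / lighting / gamma
        perceptual_err = tf.reduce_mean(tf.square(feat_true - feat_pred))  # 3. features from a separate CNN
        return w_lm * landmark_err + w_tex * texture_err + w_percep * perceptual_err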

Clorr commented 6 years ago

Thanks for the link, it seems very interesting; I'll try to read it ASAP. Note that you can post this kind of info in the faceswap-model repo, which is more dedicated to gathering this kind of paper.

I totally agree that the loss is the problem now. The first layers are trained well; the problem is now in the very last layers, which don't get the right information to adjust correctly. I'm currently trying simple solutions because I don't think we need very complicated ones. In style-transfer GANs, people try to generate images for which they don't have a ground truth, so creating a discriminator is a must. But here we have one, so the model should converge to this very ground truth...

gessyoo commented 6 years ago

Can somebody give me a quick explanation of how to run the experimental retrain model, or post the code? I tried inserting the experimental code into a cloned Original model directory but got error messages.