deepfakes / faceswap

Deepfakes Software For All
https://www.faceswap.dev
GNU General Public License v3.0

Model successfully adapted for 128x128px faces, help needed with "random_transform" in training_data.py #157

Closed. subzerofun closed this issue 6 years ago.

subzerofun commented 6 years ago

I have read the rules and know this is not about a bug; I just need some help with a particular function, and I thought the exposure of faceswap/issues could solve this problem quickly. Since a lot of users have asked for this feature, maybe this information helps others too. So why the hell did I put it in the wrong place? Unfortunately, not a lot of people look at the model and playground repos... If the authors find that this issue is completely misplaced here, I will move it to deepfakes/faceswap-model.

Model (Model_Original.py) adapted for 128x128px input/output

For a better overview I've plotted all the layers of the models with the Keras function plot_model and then changed the relevant nodes: https://imgur.com/a/V0sBL
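The plots were generated with a call along these lines (just a minimal sketch; autoencoder_A here stands for whichever compiled model you want to plot):

from keras.utils import plot_model

# writes a layer-by-layer diagram of the model, including tensor shapes
plot_model(autoencoder_A, to_file='autoencoder_A.png', show_shapes=True)

The adapted model itself: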

IMAGE_SHAPE = (128, 128, 3)
ENCODER_DIM = 1024

class Model(ModelAE):
    def initModel(self):
        (...)

    def Encoder(self):
        input_ = Input(shape=IMAGE_SHAPE)
        x = input_
        x = self.conv(128)(x)   # 128, 128,   3 -> 64, 64,  128
        x = self.conv(256)(x)   #  64,  64, 128 -> 32, 32,  256
        x = self.conv(512)(x)   #  32,  32, 256 -> 16, 16,  512
        x = self.conv(1024)(x)  #  16,  16, 512 ->  8,  8, 1024
        x = Dense(ENCODER_DIM)(Flatten()(x))    # 8, 8, 1024 -> 65536 -> ENCODER_DIM  (64px model: 4, 4, 1024 -> 16384 -> ENCODER_DIM)
        x = Dense(8 * 8 * 1024)(x)     # ENCODER_DIM -> 65536
        x = Reshape((8, 8, 1024))(x)   # 65536 -> 8, 8, 1024
        x = self.upscale(512)(x)       # 8, 8, 1024 -> 8, 8, 2048 -> 16, 16, 512

        return KerasModel(input_, x)

    def Decoder(self):
        input_ = Input(shape=(16, 16, 512))     # 16, 16, 512
        x = input_ 
        x = self.upscale(256)(x)                # 16, 16, 512 -> 16, 16, 1024 ->  32,  32, 256
        x = self.upscale(128)(x)                # 32, 32, 256 -> 32, 32,  512 ->  64,  64, 128 
        x = self.upscale(64)(x)                 # 64, 64, 128 -> 64, 64,  256 -> 128, 128,  64
        x = Conv2D(3, kernel_size=5, padding='same', activation='sigmoid')(x)  # 128, 128,  64 -> 128, 128, 3
        return KerasModel(input_, x)
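For context, conv and upscale are the unchanged helper blocks from the original plugin; from memory they look roughly like this, so treat the exact arguments as approximate (PixelShuffler is the custom layer shipped in lib/PixelShuffler.py):

from keras.layers import Conv2D
from keras.layers.advanced_activations import LeakyReLU
from lib.PixelShuffler import PixelShuffler

# methods on the model class, shown out of context
def conv(self, filters):
    def block(x):
        # stride-2 convolution: halves the spatial size, outputs `filters` channels
        x = Conv2D(filters, kernel_size=5, strides=2, padding='same')(x)
        x = LeakyReLU(0.1)(x)
        return x
    return block

def upscale(self, filters):
    def block(x):
        # 4x the channels, then PixelShuffler trades them for 2x spatial size
        x = Conv2D(filters * 4, kernel_size=3, padding='same')(x)
        x = LeakyReLU(0.1)(x)
        x = PixelShuffler()(x)
        return x
    return block

That is why each conv step in the comments above halves the resolution and each upscale step doubles it.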

I would need the help of someone who has a better understanding of the OpenCV & numpy functions used here: https://github.com/deepfakes/faceswap/blob/20753a64b76a156aea17724348269d60dd525f87/lib/training_data.py#L63-L84

For now I can only train 128x128 images with this dirty and useless hack (it doesn't add more resolution or information to the network, just interpolated pixels):

warped_image = cv2.resize(warped_image, (128, 128))
target_image = cv2.resize(target_image, (128, 128)) 

Can someone help me out with changing the cv2 image transform functions? I've tried going through the OpenCV documentation for each step of the code, but so far I'm not able to get the numpy arrays into the right form. I know that the original 256x256 input image gets trimmed down and then transformed, so for high-res faces to work I would also need input images bigger than 256px (and new values for the trimming and transformation).

The only thing left would be the Convert scripts, but it doesn't look like they need much editing.


shaoanlu commented 6 years ago

https://github.com/deepfakes/faceswap/pull/151

Clorr commented 6 years ago

@subzerofun no problem with this issue. As long as you are trying to contribute, it is always welcome. Even better, you could have opened a pull request so that people can directly check out your code and try to debug it.

As Shaoanlu stated, you can have a look at the GAN 128 plugin, or directly at the faceswap-GAN repo.

subzerofun commented 6 years ago

@Clorr @shaoanlu Thanks, I did take a look at your modified random_warp function from #151. I copied it over, added the scale and zoom options to the TrainingDataGenerator, and now it works perfectly (rough sketch below). But I guess my model is neither a very efficient nor an elegant solution; at one point the last Dense layer of the Encoder produces a tensor of size 65536, where in the 64x64 model it would be 16384. Simply doubling everything of course slows down the training, and would probably cause problems for cards with little VRAM.
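For anyone following along, the adapted warp ends up doing roughly the following. This is only a sketch with the grid, crop and output sizes scaled up from the 64px random_warp in lib/training_data.py; the exact code in #151 may differ:

import cv2
import numpy
from lib.umeyama import umeyama  # helper already used by training_data.py

def random_warp_128(image):
    assert image.shape == (256, 256, 3)
    # 5x5 grid of control points over the central region of the 256px image
    range_ = numpy.linspace(128 - 80, 128 + 80, 5)
    mapx = numpy.broadcast_to(range_, (5, 5))
    mapy = mapx.T

    # jitter the control points to get a random smooth deformation
    mapx = mapx + numpy.random.normal(size=(5, 5), scale=5)
    mapy = mapy + numpy.random.normal(size=(5, 5), scale=5)

    # the 64px version used cv2.resize(..., (80, 80))[8:72, 8:72]; doubled here
    interp_mapx = cv2.resize(mapx, (160, 160))[16:144, 16:144].astype('float32')
    interp_mapy = cv2.resize(mapy, (160, 160))[16:144, 16:144].astype('float32')
    warped_image = cv2.remap(image, interp_mapx, interp_mapy, cv2.INTER_LINEAR)

    # target: affine-align the same control points onto a regular 128px grid
    src_points = numpy.stack([mapx.ravel(), mapy.ravel()], axis=-1)
    dst_points = numpy.mgrid[0:129:32, 0:129:32].T.reshape(-1, 2)
    mat = umeyama(src_points, dst_points, True)[0:2]
    target_image = cv2.warpAffine(image, mat, (128, 128))

    return warped_image, target_image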

I've copied the structure of your Encoder and Decoder models over, and 100 iterations now take half the time. But since the GAN models function differently, I guess I'm just poking around in the dark 😀. I will let the training run for a few hundred additional iterations and see if the output gets sharper. If not, I will wait until your new plugin is finished and try to understand the GAN structure better.

I would push a PR only if I felt confident enough that the code works optimally for others too. Since I lack the experience to write efficient code (I'm just experimenting and playing around), I shouldn't let my frankenstein code out into the wild 😅.

@Clorr Do you get better results in a shorter timeframe with the GAN approach? And is it a lot of work to modify the Convert script(s) for the GAN 128 plugin – do you only need to edit Convert_GAN.py? I guess the masking is a little more time-consuming ...

Clorr commented 6 years ago

Convert_GAN is made for the particular output of the GAN. If your work is based on the AutoEncoder, there is no need to go for this converter; you will have to stick to the other Convert_ plugins and modify the sizes. Just modifying the line face = cv2.resize(face, (64, 64)) to face = cv2.resize(face, (128, 128)) should get you started.