Training of a 512x512 model did not go well. #44

Closed: adeptflax closed this issue 3 years ago

adeptflax commented 3 years ago

This is the same model I was trying to train in the linked issue. The face swap model doesn't output very good results. How do I improve them? The images show face swap examples at epoch 5 and at epoch 10.

Segmentation module example output on epoch 5: image

Segmentation module example output on epoch 10: image

I replaced this line with the following:

bm = F.interpolate(blend_mask, size=128)
out = enc_target * (1 - bm) + enc_source * bm

I did that to avoid errors about the tensors having different sizes.
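For anyone hitting the same size mismatch, here is a minimal sketch of the blending step that takes the mask size from the feature map instead of hardcoding 128. It assumes PyTorch is installed. The function name `blend_features` and the tensor shapes are made up for illustration and are not taken from the repository.

```python
import torch
import torch.nn.functional as F


def blend_features(enc_source, enc_target, blend_mask):
    """Blend two feature maps with a mask resized to their spatial size.

    Hypothetical helper: mirrors the two lines above, but reads the
    target size from enc_target rather than hardcoding it.
    """
    # enc_target has shape (N, C, H, W); resize the mask to (H, W).
    bm = F.interpolate(blend_mask, size=enc_target.shape[2:])
    # Where the mask is 1 take the source features, where it is 0 keep the target.
    return enc_target * (1 - bm) + enc_source * bm


# Illustrative shapes only: 128x128 feature maps and a 64x64 mask.
enc_source = torch.ones(1, 4, 128, 128)
enc_target = torch.zeros(1, 4, 128, 128)
blend_mask = torch.ones(1, 1, 64, 64)

out = blend_features(enc_source, enc_target, blend_mask)
print(out.shape)
```

Reading the size from the tensor keeps the blend working if the input resolution or the encoder depth changes, but it only hides a mismatch between the mask scale and the feature map size; it does not explain why the two differ.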

Log messages from rerunning the code to capture the warnings printed at startup (I didn't keep the logs from the actual training run):

YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read for full details.
  config = yaml.load(f)
Use predefined train-test split.
Segmentation part initialized at random.
  0%|                                                    | 0/20 [00:00<?, ?it/s]/opt/conda/lib/python3.8/site-packages/torch/nn/ UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
/opt/conda/lib/python3.8/site-packages/torch/nn/ UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details.
/opt/conda/lib/python3.8/site-packages/torch/nn/ UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
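
The first warning above comes from calling yaml.load() without a Loader argument. A minimal sketch of the usual way to silence it, assuming PyYAML is installed; the config keys below are invented for illustration and are not the repository's actual config:

```python
import io

import yaml

# Hypothetical config text standing in for a real YAML file on disk.
config_text = """
train_params:
  num_epochs: 20
"""

# safe_load builds only plain Python objects (dict, list, str, int, ...)
# and does not need an explicit Loader argument.
with io.StringIO(config_text) as f:
    config = yaml.safe_load(f)

print(config["train_params"]["num_epochs"])
```

If a config relies on YAML tags that safe_load rejects, yaml.load(f, Loader=yaml.FullLoader) is the other common replacement. The remaining warnings are deprecation notices from PyTorch and are unlikely to be the cause of poor results by themselves.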

Code I used to get the face swap output:

#!/usr/bin/env python
# coding: utf-8

# In[2]:

import os
import random

import imageio
import numpy as np
import matplotlib.pyplot as plt
from skimage.transform import resize
from tqdm.notebook import tqdm
from PIL import Image
from pathlib import Path
# In[3]:

from part_swap import load_checkpoints
cpu = True
reconstruction_module, segmentation_module = load_checkpoints(config='config/vox-512-sem-10segments.yaml', 
                                               checkpoint='log/vox-512-sem-10segments 26-04-21 19:25:33/00000005-checkpoint.pth.tar',
                                               blend_scale=0.125, first_order_motion_model=True,cpu=cpu)

from part_swap import make_video, load_face_parser
face_parser = load_face_parser(cpu=cpu)

def swap(source_image, target_image):
    shape = source_image.shape
    # Resize the source image and the target frame to 512x512 and drop any alpha channel

    source_image = resize(source_image, (512, 512))[..., :3]
    target_video = [resize(target_image, (512, 512))[..., :3]]

    out = make_video(swap_index=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], source_image = source_image,
         target_video = target_video, use_source_segmentation=True, segmentation_module=segmentation_module,
         reconstruction_module=reconstruction_module, face_parser=face_parser, cpu=cpu)[0]

    return resize(out, (shape[0], shape[1]))

def get_concat_h(im1, im2):
    # Create a blank canvas wide enough to hold both images side by side
    dst = Image.new('RGB', (im1.width + im2.width, im1.height))
    dst.paste(im1, (0, 0))
    dst.paste(im2, (im1.width, 0))
    return dst

dir_ = str(Path.home()) + '/gdrive/images1024x1024/57000/'
ims = os.listdir(dir_)

g = 5
s= random.sample(ims, g * 2)
srcs = s[:g]
dess = s[g:]

for i in range(g):
    target_image = imageio.imread(dir_ + dess[i])
    source_image = imageio.imread(dir_ + srcs[i])

    out = swap(source_image, target_image)

    # Concatenate target, source and swapped result side by side, then save the strip
    dis = get_concat_h(get_concat_h(Image.fromarray(np.uint8(target_image)), Image.fromarray(np.uint8(source_image))), Image.fromarray(np.uint8(out * 255)))
    dis.save('dis/' + str(i) + '.jpg')

That's all the information I could think of sending.

AliaksandrSiarohin commented 3 years ago

Have you tried to do a face swap in supervised mode?

adeptflax commented 3 years ago

How would I do that?

AliaksandrSiarohin commented 3 years ago

For the reference we also provide fully-supervised segmentation. For fully-supervised add --supervised option. And run git clone face_parsing which is a fork of @zllrunning.

adeptflax commented 3 years ago

That's what I am doing in the code I posted. I'm using that face parser in the code.

AliaksandrSiarohin commented 3 years ago

The motion segmentation network is not needed then. You can use fomm.

adeptflax commented 3 years ago

What's fomm?

AliaksandrSiarohin commented 3 years ago

First order motion model.

adeptflax commented 3 years ago

This is the code for running and loading the model. I have first_order_motion_model=True on load_checkpoints().

from part_swap import load_checkpoints
cpu = True
reconstruction_module, segmentation_module = load_checkpoints(config='config/vox-512-sem-10segments.yaml', 
                                               checkpoint='log/vox-512-sem-10segments 26-04-21 19:25:33/00000005-checkpoint.pth.tar',
                                               blend_scale=0.125, first_order_motion_model=True,cpu=cpu)

from part_swap import make_video, load_face_parser
face_parser = load_face_parser(cpu=cpu)

def swap(source_image, target_image):
    shape = source_image.shape

    source_image = resize(source_image, (512, 512))[..., :3]
    target_video = [resize(target_image, (512, 512))[..., :3]]

    out = make_video(swap_index=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], source_image = source_image,
         target_video = target_video, use_source_segmentation=True, segmentation_module=segmentation_module,
         reconstruction_module=reconstruction_module, face_parser=face_parser, cpu=cpu)[0]

    return resize(out, (shape[0], shape[1]))

AliaksandrSiarohin commented 3 years ago

OK, but the checkpoint should also be the one from the first order model.

adeptflax commented 3 years ago

Oh, I didn't realize you could use the first order model directly to do it. I thought I had to train a face swap model on top of it.

adeptflax commented 3 years ago

I dunno, I'll use that.