AliaksandrSiarohin / motion-cosegmentation

Reference code for "Motion-supervised Co-Part Segmentation" paper

Training of a 512x512 model did not go well. #44

Closed adeptflax closed 3 years ago

adeptflax commented 3 years ago

This is the same model I was trying to train in issue https://github.com/AliaksandrSiarohin/motion-cosegmentation/issues/43. The face swap model doesn't produce very good results. How do I improve them? Face swap examples on images at epoch 5: https://imgur.com/a/lMhvZOf and at epoch 10: https://imgur.com/a/yCNyjx9.

Segmentation module example output on epoch 5: image

Segmentation module example output on epoch 10: image

I replaced this line: https://github.com/AliaksandrSiarohin/motion-cosegmentation/blob/571e26f04b8c40c5454a158b4b570e4ba034c856/part_swap.py#L88 with this:

# downsample the blend mask to the encoder feature resolution (hard-coded to 128)
bm = F.interpolate(blend_mask, size=128)
# blend the target and source encodings with the resized mask
out = enc_target * (1 - bm) + enc_source * bm

I did that to avoid errors about the tensors having different sizes.
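A version that avoids the hard-coded size, assuming enc_target is a 4D (N, C, H, W) feature tensor, might be to resize the mask to whatever spatial size the target encoding actually has (just a sketch of the idea, untested at 512x512):

import torch.nn.functional as F

# resize blend_mask to match the spatial size of enc_target instead of a fixed 128
bm = F.interpolate(blend_mask, size=enc_target.shape[2:], mode='bilinear', align_corners=False)
out = enc_target * (1 - bm) + enc_source * bm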

Below are the log messages from rerunning the code to capture the warnings printed at startup. I didn't keep the logs from when I actually trained the model.

train.py:85: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(f)
Use predefined train-test split.
Training...
Segmentation part initialized at random.
  0%|                                                    | 0/20 [00:00<?, ?it/s]/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:3454: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  warnings.warn(
/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:3828: UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details.
  warnings.warn(
/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1709: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")

Code I used to get the face swap output:

#!/usr/bin/env python
# coding: utf-8

# In[2]:

import imageio, os, random
import numpy as np
import matplotlib.pyplot as plt
from skimage.transform import resize
from tqdm.notebook import tqdm
from PIL import Image
from pathlib import Path

get_ipython().run_line_magic('matplotlib', 'inline')

# In[3]:

from part_swap import load_checkpoints
cpu = True
reconstruction_module, segmentation_module = load_checkpoints(config='config/vox-512-sem-10segments.yaml', 
                                               checkpoint='log/vox-512-sem-10segments 26-04-21 19:25:33/00000005-checkpoint.pth.tar',
                                               blend_scale=0.125, first_order_motion_model=True,cpu=cpu)

# In[4]:

from part_swap import make_video, load_face_parser
face_parser = load_face_parser(cpu=cpu)

# In[5]:

def swap(source_image, target_image):
    shape = source_image.shape
    # Resize the source image and target image to 512x512

    source_image = resize(source_image, (512, 512))[..., :3]
    target_video = [resize(target_image, (512, 512))[..., :3]]

    out = make_video(swap_index=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], source_image = source_image,
         target_video = target_video, use_source_segmentation=True, segmentation_module=segmentation_module,
         reconstruction_module=reconstruction_module, face_parser=face_parser, cpu=cpu)[0]

    return resize(out, (shape[0], shape[1]))

# In[6]:

def get_concat_h(im1, im2):
    dst = Image.new('RGB', (im1.width + im2.width, im1.height))
    dst.paste(im1, (0, 0))
    dst.paste(im2, (im1.width, 0))
    return dst

# In[8]:

get_ipython().run_line_magic('matplotlib', 'inline')

dir_ = str(Path.home()) + '/gdrive/images1024x1024/57000/'
ims = os.listdir(dir_)

# pick g random source images and g random target images
g = 5
s = random.sample(ims, g * 2)
srcs = s[:g]
dess = s[g:]
len(dess)

for i in range(g):
    target_image = imageio.imread(dir_ + dess[i])
    source_image = imageio.imread(dir_ + srcs[i])

    out = swap(source_image, target_image)

    dis = get_concat_h(
        get_concat_h(Image.fromarray(np.uint8(target_image)), Image.fromarray(np.uint8(source_image))),
        Image.fromarray(np.uint8(out * 255)))
    dis.save('dis/' + str(i) + '.jpg')
    plt.figure()
    plt.imshow(dis)

That's all the information I could think of sending.

AliaksandrSiarohin commented 3 years ago

Have you tried to do a face swap in supervised mode?

adeptflax commented 3 years ago

How would I do that?

AliaksandrSiarohin commented 3 years ago

For reference, we also provide fully-supervised segmentation. For the fully-supervised mode, add the --supervised option, and run git clone https://github.com/AliaksandrSiarohin/face-makeup.PyTorch face_parsing, which is a fork of @zllrunning's repo.

adeptflax commented 3 years ago

That's what I'm doing in the code I posted; I'm already using that face parser.

AliaksandrSiarohin commented 3 years ago

The motion segmentation network is not needed then. You can use fomm.

adeptflax commented 3 years ago

what's fomm?

AliaksandrSiarohin commented 3 years ago

First order motion model.

adeptflax commented 3 years ago

This is the code for loading and running the model. I have first_order_motion_model=True in load_checkpoints().

from part_swap import load_checkpoints
cpu = True
reconstruction_module, segmentation_module = load_checkpoints(config='config/vox-512-sem-10segments.yaml', 
                                               checkpoint='log/vox-512-sem-10segments 26-04-21 19:25:33/00000005-checkpoint.pth.tar',
                                               blend_scale=0.125, first_order_motion_model=True,cpu=cpu)

from part_swap import make_video, load_face_parser
face_parser = load_face_parser(cpu=cpu)

def swap(source_image, target_image):
    shape = source_image.shape

    source_image = resize(source_image, (512, 512))[..., :3]
    target_video = [resize(target_image, (512, 512))[..., :3]]

    out = make_video(swap_index=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], source_image = source_image,
         target_video = target_video, use_source_segmentation=True, segmentation_module=segmentation_module,
         reconstruction_module=reconstruction_module, face_parser=face_parser, cpu=cpu)[0]

    return resize(out, (shape[0], shape[1]))

AliaksandrSiarohin commented 3 years ago

OK, but the checkpoint should also be one from the first order model.
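In practice that means loading something like this (a sketch only; the config and checkpoint names below are placeholders for whichever pretrained first order motion model files you have, not the partially trained 512x512 checkpoint):

from part_swap import load_checkpoints, load_face_parser

cpu = True
# Placeholder paths (assumptions): point these at a first order motion model
# config and its pretrained weights rather than the co-segmentation checkpoint.
reconstruction_module, segmentation_module = load_checkpoints(
    config='config/vox-first-order.yaml',
    checkpoint='path/to/first-order-checkpoint.pth.tar',
    blend_scale=0.125, first_order_motion_model=True, cpu=cpu)

face_parser = load_face_parser(cpu=cpu)  # supervised segmentation via the face parser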

adeptflax commented 3 years ago

Oh, I didn't realize you could use the first order model directly to do it. I thought I had to train a face swap model on top of it.

adeptflax commented 3 years ago

I dunno, I'll use that.