AliaksandrSiarohin / monkey-net

Animating Arbitrary Objects via Deep Motion Transfer

Strange results using pretrained model #5

Closed by wanshun123 5 years ago

wanshun123 commented 5 years ago

I am running motion transfer with the following command, using the pretrained checkpoint in the readme (keeping everything in moving-gif.yaml the same):

python demo.py --config config/moving-gif.yaml --driving_video driver.gif --source_image source.png --checkpoint moving-gif-ckp.pth.tar

The driving gif is as follows:

driver

Source image:

source

This results in the following:

demo

AliaksandrSiarohin commented 5 years ago

The checkpoint is for the moving-gif dataset.

AliaksandrSiarohin commented 5 years ago

Try this one: https://yadi.sk/d/EX7N9fuIuE4FNg, though I'm not sure it is the correct one.

wanshun123 commented 5 years ago

Doesn't appear to be the right model:

Traceback (most recent call last):
  File "demo.py", line 52, in <module>
    Logger.load_cpk(opt.checkpoint, generator=generator, kp_detector=kp_detector)
  File "/home/paperspace/monkey/monkey/logger.py", line 54, in load_cpk
    generator.load_state_dict(checkpoint['generator'])
  File "/home/paperspace/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 719, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for MotionTransferGenerator:
        Missing key(s) in state_dict: "appearance_encoder.down_blocks.5.conv.weight", "appearance_encoder.down_blocks.5.conv.bias", "appearance_encoder.down_blocks.5.norm.weight", "appearance_encoder.down_blocks.5.norm.bias", "appearance_encoder.down_blocks.5.norm.running_mean", "appearance_encoder.down_blocks.5.norm.running_var", "video_decoder.up_blocks.5.conv.weight", "video_decoder.up_blocks.5.conv.bias", "video_decoder.up_blocks.5.norm.weight", "video_decoder.up_blocks.5.norm.bias", "video_decoder.up_blocks.5.norm.running_mean", "video_decoder.up_blocks.5.norm.running_var".
        size mismatch for appearance_encoder.down_blocks.4.conv.weight: copying a param of torch.Size([1024, 512, 1, 3, 3]) from checkpoint, where the shape is torch.Size([512, 512, 1, 3, 3]) in current model.
        size mismatch for appearance_encoder.down_blocks.4.conv.bias: copying a param of torch.Size([1024]) from checkpoint, where the shape is torch.Size([512]) in current model.
        size mismatch for appearance_encoder.down_blocks.4.norm.weight: copying a param of torch.Size([1024]) from checkpoint, where the shape is torch.Size([512]) in current model.
        size mismatch for appearance_encoder.down_blocks.4.norm.bias: copying a param of torch.Size([1024]) from checkpoint, where the shape is torch.Size([512]) in current model.
        size mismatch for appearance_encoder.down_blocks.4.norm.running_mean: copying a param of torch.Size([1024]) from checkpoint, where the shape is torch.Size([512]) in current model.
        size mismatch for appearance_encoder.down_blocks.4.norm.running_var: copying a param of torch.Size([1024]) from checkpoint, where the shape is torch.Size([512]) in current model.
        size mismatch for dense_motion_module.group_blocks.0.conv.weight: copying a param of torch.Size([66, 6, 1, 1, 1]) from checkpoint, where the shape is torch.Size([44, 4, 1, 1, 1]) in current model.
        size mismatch for dense_motion_module.group_blocks.0.conv.bias: copying a param of torch.Size([66]) from checkpoint, where the shape is torch.Size([44]) in current model.
        size mismatch for dense_motion_module.group_blocks.0.norm.weight: copying a param of torch.Size([66]) from checkpoint, where the shape is torch.Size([44]) in current model.
        size mismatch for dense_motion_module.group_blocks.0.norm.bias: copying a param of torch.Size([66]) from checkpoint, where the shape is torch.Size([44]) in current model.
        size mismatch for dense_motion_module.group_blocks.0.norm.running_mean: copying a param of torch.Size([66]) from checkpoint, where the shape is torch.Size([44]) in current model.
        size mismatch for dense_motion_module.group_blocks.0.norm.running_var: copying a param of torch.Size([66]) from checkpoint, where the shape is torch.Size([44]) in current model.
        size mismatch for dense_motion_module.group_blocks.1.conv.weight: copying a param of torch.Size([66, 6, 1, 1, 1]) from checkpoint, where the shape is torch.Size([44, 4, 1, 1, 1]) in current model.
        size mismatch for dense_motion_module.group_blocks.1.conv.bias: copying a param of torch.Size([66]) from checkpoint, where the shape is torch.Size([44]) in current model.
        size mismatch for dense_motion_module.group_blocks.1.norm.weight: copying a param of torch.Size([66]) from checkpoint, where the shape is torch.Size([44]) in current model.
        size mismatch for dense_motion_module.group_blocks.1.norm.bias: copying a param of torch.Size([66]) from checkpoint, where the shape is torch.Size([44]) in current model.
        size mismatch for dense_motion_module.group_blocks.1.norm.running_mean: copying a param of torch.Size([66]) from checkpoint, where the shape is torch.Size([44]) in current model.
        size mismatch for dense_motion_module.group_blocks.1.norm.running_var: copying a param of torch.Size([66]) from checkpoint, where the shape is torch.Size([44]) in current model.
        size mismatch for dense_motion_module.hourglass.encoder.down_blocks.0.conv.weight: copying a param of torch.Size([64, 66, 1, 3, 3]) from checkpoint, where the shape is torch.Size([64, 44, 1, 3, 3]) in current model.
        size mismatch for dense_motion_module.hourglass.encoder.down_blocks.4.conv.weight: copying a param of torch.Size([1024, 512, 1, 3, 3]) from checkpoint, where the shape is torch.Size([512, 512, 1, 3, 3]) in current model.
        size mismatch for dense_motion_module.hourglass.encoder.down_blocks.4.conv.bias: copying a param of torch.Size([1024]) from checkpoint, where the shape is torch.Size([512]) in current model.
        size mismatch for dense_motion_module.hourglass.encoder.down_blocks.4.norm.weight: copying a param of torch.Size([1024]) from checkpoint, where the shape is torch.Size([512]) in current model.
        size mismatch for dense_motion_module.hourglass.encoder.down_blocks.4.norm.bias: copying a param of torch.Size([1024]) from checkpoint, where the shape is torch.Size([512]) in current model.
        size mismatch for dense_motion_module.hourglass.encoder.down_blocks.4.norm.running_mean: copying a param of torch.Size([1024]) from checkpoint, where the shape is torch.Size([512]) in current model.
        size mismatch for dense_motion_module.hourglass.encoder.down_blocks.4.norm.running_var: copying a param of torch.Size([1024]) from checkpoint, where the shape is torch.Size([512]) in current model.
        size mismatch for dense_motion_module.hourglass.decoder.up_blocks.0.conv.weight: copying a param of torch.Size([512, 1024, 1, 3, 3]) from checkpoint, where the shape is torch.Size([512, 512, 1, 3, 3]) in current model.
        size mismatch for dense_motion_module.hourglass.decoder.conv.weight: copying a param of torch.Size([13, 98, 1, 3, 3]) from checkpoint, where the shape is torch.Size([13, 76, 1, 3, 3]) in current model.
        size mismatch for video_decoder.up_blocks.0.conv.weight: copying a param of torch.Size([1024, 1034, 1, 3, 3]) from checkpoint, where the shape is torch.Size([512, 522, 1, 3, 3]) in current model.
        size mismatch for video_decoder.up_blocks.0.conv.bias: copying a param of torch.Size([1024]) from checkpoint, where the shape is torch.Size([512]) in current model.
        size mismatch for video_decoder.up_blocks.0.norm.weight: copying a param of torch.Size([1024]) from checkpoint, where the shape is torch.Size([512]) in current model.
        size mismatch for video_decoder.up_blocks.0.norm.bias: copying a param of torch.Size([1024]) from checkpoint, where the shape is torch.Size([512]) in current model.
        size mismatch for video_decoder.up_blocks.0.norm.running_mean: copying a param of torch.Size([1024]) from checkpoint, where the shape is torch.Size([512]) in current model.
        size mismatch for video_decoder.up_blocks.0.norm.running_var: copying a param of torch.Size([1024]) from checkpoint, where the shape is torch.Size([512]) in current model.
        size mismatch for video_decoder.up_blocks.1.conv.weight: copying a param of torch.Size([512, 2058, 1, 3, 3]) from checkpoint, where the shape is torch.Size([256, 1034, 1, 3, 3]) in current model.
        size mismatch for video_decoder.up_blocks.1.conv.bias: copying a param of torch.Size([512]) from checkpoint, where the shape is torch.Size([256]) in current model.
        size mismatch for video_decoder.up_blocks.1.norm.weight: copying a param of torch.Size([512]) from checkpoint, where the shape is torch.Size([256]) in current model.
        size mismatch for video_decoder.up_blocks.1.norm.bias: copying a param of torch.Size([512]) from checkpoint, where the shape is torch.Size([256]) in current model.
        size mismatch for video_decoder.up_blocks.1.norm.running_mean: copying a param of torch.Size([512]) from checkpoint, where the shape is torch.Size([256]) in current model.
        size mismatch for video_decoder.up_blocks.1.norm.running_var: copying a param of torch.Size([512]) from checkpoint, where the shape is torch.Size([256]) in current model.
        size mismatch for video_decoder.up_blocks.2.conv.weight: copying a param of torch.Size([256, 1034, 1, 3, 3]) from checkpoint, where the shape is torch.Size([128, 522, 1, 3, 3]) in current model.
        size mismatch for video_decoder.up_blocks.2.conv.bias: copying a param of torch.Size([256]) from checkpoint, where the shape is torch.Size([128]) in current model.
        size mismatch for video_decoder.up_blocks.2.norm.weight: copying a param of torch.Size([256]) from checkpoint, where the shape is torch.Size([128]) in current model.
        size mismatch for video_decoder.up_blocks.2.norm.bias: copying a param of torch.Size([256]) from checkpoint, where the shape is torch.Size([128]) in current model.
        size mismatch for video_decoder.up_blocks.2.norm.running_mean: copying a param of torch.Size([256]) from checkpoint, where the shape is torch.Size([128]) in current model.
        size mismatch for video_decoder.up_blocks.2.norm.running_var: copying a param of torch.Size([256]) from checkpoint, where the shape is torch.Size([128]) in current model.
        size mismatch for video_decoder.up_blocks.3.conv.weight: copying a param of torch.Size([128, 522, 1, 3, 3]) from checkpoint, where the shape is torch.Size([64, 266, 1, 3, 3]) in current model.
        size mismatch for video_decoder.up_blocks.3.conv.bias: copying a param of torch.Size([128]) from checkpoint, where the shape is torch.Size([64]) in current model.
        size mismatch for video_decoder.up_blocks.3.norm.weight: copying a param of torch.Size([128]) from checkpoint, where the shape is torch.Size([64]) in current model.
        size mismatch for video_decoder.up_blocks.3.norm.bias: copying a param of torch.Size([128]) from checkpoint, where the shape is torch.Size([64]) in current model.
        size mismatch for video_decoder.up_blocks.3.norm.running_mean: copying a param of torch.Size([128]) from checkpoint, where the shape is torch.Size([64]) in current model.
        size mismatch for video_decoder.up_blocks.3.norm.running_var: copying a param of torch.Size([128]) from checkpoint, where the shape is torch.Size([64]) in current model.
        size mismatch for video_decoder.up_blocks.4.conv.weight: copying a param of torch.Size([64, 266, 1, 3, 3]) from checkpoint, where the shape is torch.Size([32, 138, 1, 3, 3]) in current model.
        size mismatch for video_decoder.up_blocks.4.conv.bias: copying a param of torch.Size([64]) from checkpoint, where the shape is torch.Size([32]) in current model.
        size mismatch for video_decoder.up_blocks.4.norm.weight: copying a param of torch.Size([64]) from checkpoint, where the shape is torch.Size([32]) in current model.
        size mismatch for video_decoder.up_blocks.4.norm.bias: copying a param of torch.Size([64]) from checkpoint, where the shape is torch.Size([32]) in current model.
        size mismatch for video_decoder.up_blocks.4.norm.running_mean: copying a param of torch.Size([64]) from checkpoint, where the shape is torch.Size([32]) in current model.
        size mismatch for video_decoder.up_blocks.4.norm.running_var: copying a param of torch.Size([64]) from checkpoint, where the shape is torch.Size([32]) in current model.

I may try training from scratch on nemo later.

AliaksandrSiarohin commented 5 years ago

OK, but the likely cause here is that you need to specify the matching config: --config config/nemo.yaml. Also make sure that your images are of size 64x64, and specify this size when running the script.
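
For anyone hitting the same error: a quick way to confirm a config/checkpoint mismatch is to compare parameter shapes before calling load_state_dict. The sketch below is only an illustration, not part of the repository; diff_state_shapes is a hypothetical helper, and it works on plain {name: shape} dictionaries so it runs without PyTorch. With PyTorch, such a dictionary could be built with {name: tuple(t.shape) for name, t in module.state_dict().items()} for the model, and in the same way from checkpoint['generator'] for the saved weights.

```python
def diff_state_shapes(checkpoint_shapes, model_shapes):
    """Compare two {parameter_name: shape} mappings and report differences.

    Returns (missing, unexpected, mismatched):
      missing    - names the model expects but the checkpoint lacks
      unexpected - names in the checkpoint that the model does not have
      mismatched - {name: (checkpoint_shape, model_shape)} for shared names
    """
    missing = sorted(set(model_shapes) - set(checkpoint_shapes))
    unexpected = sorted(set(checkpoint_shapes) - set(model_shapes))
    mismatched = {
        name: (tuple(checkpoint_shapes[name]), tuple(model_shapes[name]))
        for name in sorted(set(checkpoint_shapes) & set(model_shapes))
        if tuple(checkpoint_shapes[name]) != tuple(model_shapes[name])
    }
    return missing, unexpected, mismatched


# A small subset of the names and shapes reported in the traceback above.
checkpoint_shapes = {
    "appearance_encoder.down_blocks.4.conv.bias": (1024,),
    "dense_motion_module.group_blocks.0.conv.bias": (66,),
}
model_shapes = {
    "appearance_encoder.down_blocks.4.conv.bias": (512,),
    # The traceback lists this key as missing but gives no shape;
    # (512,) is a placeholder used only for illustration.
    "appearance_encoder.down_blocks.5.conv.bias": (512,),
    "dense_motion_module.group_blocks.0.conv.bias": (44,),
}

missing, unexpected, mismatched = diff_state_shapes(checkpoint_shapes, model_shapes)
print("missing from checkpoint:", missing)
for name, (ckpt_shape, model_shape) in mismatched.items():
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
```

Many mismatches spread across every module, as in the traceback, usually point to a model built from a different config rather than a corrupted file.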

wanshun123 commented 5 years ago

My previous post was incorrect: the model posted here is the right one for nemo. I can run it as follows (using 64x64 images):

python demo.py --config config/nemo.yaml --driving_video driver2.gif --source_image source2.png --checkpoint nemo-ckp.pth.tar --image_shape 64,64
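
Since the nemo checkpoint expects 64x64 inputs, it can help to verify the source image and driving GIF before running the demo. The following is a minimal sketch using only the standard library; image_size and check_inputs are hypothetical helpers, not part of the repository. It reads the width and height from the PNG IHDR chunk or the GIF logical screen descriptor.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def image_size(data):
    """Return (width, height) parsed from the first bytes of a PNG or GIF file."""
    # PNG: 8-byte signature, 4-byte chunk length, b"IHDR", then
    # big-endian 32-bit width and height.
    if data[:8] == PNG_SIGNATURE and data[12:16] == b"IHDR":
        return struct.unpack(">II", data[16:24])
    # GIF: 6-byte signature, then little-endian 16-bit width and height.
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return struct.unpack("<HH", data[6:10])
    raise ValueError("not a PNG or GIF header")


def check_inputs(paths, expected=(64, 64)):
    """Return {path: (width, height)} for files whose size differs from expected."""
    wrong = {}
    for path in paths:
        with open(path, "rb") as f:
            size = image_size(f.read(24))  # 24 bytes cover both headers
        if size != expected:
            wrong[path] = size
    return wrong
```

For the command above, this could be called as check_inputs(["source2.png", "driver2.gif"]); an empty result means both files already match the expected size.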