AliaksandrSiarohin / first-order-model

This repository contains the source code for the paper First Order Motion Model for Image Animation
https://aliaksandrsiarohin.github.io/first-order-model-website/
MIT License

Does not support high-resolution images #20

Open: ghost opened this issue 4 years ago

ghost commented 4 years ago

Is there a way to support high-resolution images?

bigboss97 commented 3 years ago

@william-nz No, the only thing I've done is download the file and run it with the 512 modifications above. A 512x512 video was generated, and that's all. I'm not very convinced by my result, so I posted it here in the hope that people can judge it for themselves. I've probably still done something wrong 😆 When I get the time, I'll do more experiments with it.

zhaoruiqiff commented 2 years ago

@AliaksandrSiarohin @5agado I have run some tests using the method detailed in point 2.

Generally the result looks like this:

[GIF: ezgif-1-3f05db10770d]

It would be good to get your thoughts on whether this is an issue of using a checkpoint trained on 256x256 images, or whether I am doing something wrong...

Many thanks for your excellent work.

Hi @LopsidedJoaw, the video you showed looks very high-resolution. Which super-resolution method did you use to get this result? Thanks!

mdv3101 commented 2 years ago

Hi @AliaksandrSiarohin, I am training a model on a 512x512 dataset from scratch. After 15 epochs the loss is decreasing, but the keypoints seem to be confined to a small region.

[image]

Any idea why this is happening? I haven't made any changes to the code; I have only modified the config file:

dataset_params:
  root_dir: 512_dataset/
  frame_shape: [512, 512, 3]
  id_sampling: False
  pairs_list: data/vox256.csv
  augmentation_params:
    flip_param:
      horizontal_flip: True
      time_flip: True
    jitter_param:
      brightness: 0.1
      contrast: 0.1
      saturation: 0.1
      hue: 0.1

model_params:
  common_params:
    num_kp: 10
    num_channels: 3
    estimate_jacobian: True
  kp_detector_params:
     temperature: 0.1
     block_expansion: 32
     max_features: 1024
     scale_factor: 0.25
     num_blocks: 5
  generator_params:
    block_expansion: 64
    max_features: 512
    num_down_blocks: 2
    num_bottleneck_blocks: 6
    estimate_occlusion_map: True
    dense_motion_params:
      block_expansion: 64
      max_features: 1024
      num_blocks: 5
      scale_factor: 0.25
  discriminator_params:
    scales: [1]
    block_expansion: 32
    max_features: 512
    num_blocks: 4
    sn: True

train_params:
  num_epochs: 100
  num_repeats: 75
  epoch_milestones: [60,90]
  lr_generator: 2.0e-4
  lr_discriminator: 2.0e-4
  lr_kp_detector: 2.0e-4
  batch_size: 4
  scales: [1, 0.5, 0.25, 0.125]
  checkpoint_freq: 5
  transform_params:
    sigma_affine: 0.05
    sigma_tps: 0.005
    points_tps: 5
  loss_weights:
    generator_gan: 0
    discriminator_gan: 1
    feature_matching: [10, 10, 10, 10]
    perceptual: [10, 10, 10, 10, 10]
    equivariance_value: 10
    equivariance_jacobian: 10

reconstruction_params:
  num_videos: 1000
  format: '.mp4'

animate_params:
  num_pairs: 50
  format: '.mp4'
  normalization_params:
    adapt_movement_scale: True
    use_relative_movement: True
    use_relative_jacobian: True

visualizer_params:
  kp_size: 5
  draw_border: True
  colormap: 'gist_rainbow'
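
For reference, the scale-related parameters in this config determine the resolutions the networks actually work at. The sketch below uses a hypothetical helper, `effective_resolutions` (not part of the repository), with values copied by hand from the YAML above. It assumes, as in the repository's modules, that the keypoint detector and the dense motion network first downsample the frame by their `scale_factor`, that each of the `num_blocks` hourglass encoder blocks and each of the generator's `num_down_blocks` halves the spatial size, and that `train_params.scales` defines the image pyramid used by the perceptual loss.

```python
# Hypothetical helper: derive the working resolutions implied by a
# first-order-model config. Values are copied by hand from the YAML above.
CONFIG = {
    "frame_size": 512,                          # dataset_params.frame_shape[0]
    "kp_scale_factor": 0.25,                    # kp_detector_params.scale_factor
    "kp_num_blocks": 5,                         # kp_detector_params.num_blocks
    "dense_motion_scale_factor": 0.25,          # dense_motion_params.scale_factor
    "num_down_blocks": 2,                       # generator_params.num_down_blocks
    "pyramid_scales": [1, 0.5, 0.25, 0.125],    # train_params.scales
}


def effective_resolutions(cfg):
    """Return the side length (in pixels) seen by each component."""
    size = cfg["frame_size"]
    kp_in = int(size * cfg["kp_scale_factor"])
    return {
        # Input to the keypoint detector's hourglass after downsampling
        "kp_detector_input": kp_in,
        # Assumes every hourglass encoder block halves the spatial size
        "kp_hourglass_bottleneck": kp_in // 2 ** cfg["kp_num_blocks"],
        # Resolution at which the dense motion network predicts motion
        "dense_motion_input": int(size * cfg["dense_motion_scale_factor"]),
        # Assumes every generator down block halves the spatial size
        "generator_bottleneck": size // 2 ** cfg["num_down_blocks"],
        # Image sizes in the perceptual-loss pyramid
        "perceptual_pyramid": [int(size * s) for s in cfg["pyramid_scales"]],
    }


if __name__ == "__main__":
    for name, value in effective_resolutions(CONFIG).items():
        print(f"{name}: {value}")
    # Same parameters applied to 256x256 frames, for comparison
    print(effective_resolutions(dict(CONFIG, frame_size=256)))
```

Passing `frame_size=256` with the same parameters shows how much larger each intermediate map becomes when only `frame_shape` is changed to 512.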

Here are the losses for epochs 0 to 16:

00000000) perceptual - 121.42917; equivariance_value - 0.71458; equivariance_jacobian - 0.75562
00000001) perceptual - 109.27000; equivariance_value - 0.35340; equivariance_jacobian - 0.65690
00000002) perceptual - 100.28600; equivariance_value - 0.16266; equivariance_jacobian - 0.56337
00000003) perceptual - 96.12051; equivariance_value - 0.14541; equivariance_jacobian - 0.51318
00000004) perceptual - 93.17576; equivariance_value - 0.14200; equivariance_jacobian - 0.48087
00000005) perceptual - 90.71331; equivariance_value - 0.15415; equivariance_jacobian - 0.47770
00000006) perceptual - 88.90341; equivariance_value - 0.22227; equivariance_jacobian - 0.49095
00000007) perceptual - 86.39249; equivariance_value - 0.21560; equivariance_jacobian - 0.47799
00000008) perceptual - 84.61519; equivariance_value - 0.20801; equivariance_jacobian - 0.46283
00000009) perceptual - 84.08470; equivariance_value - 0.21185; equivariance_jacobian - 0.46702
00000010) perceptual - 82.73890; equivariance_value - 0.20613; equivariance_jacobian - 0.45508
00000011) perceptual - 81.45905; equivariance_value - 0.19839; equivariance_jacobian - 0.44276
00000012) perceptual - 81.00780; equivariance_value - 0.20207; equivariance_jacobian - 0.44244
00000013) perceptual - 80.08536; equivariance_value - 0.19849; equivariance_jacobian - 0.43349
00000014) perceptual - 79.34811; equivariance_value - 0.19838; equivariance_jacobian - 0.42291
00000015) perceptual - 78.98586; equivariance_value - 0.19916; equivariance_jacobian - 0.41774
00000016) perceptual - 78.48245; equivariance_value - 0.19998; equivariance_jacobian - 0.41450
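
To compare runs like this one, the logged losses can be parsed into numbers. This is a minimal sketch, assuming every loss line follows the `epoch) name - value; name - value` pattern shown above; `parse_log` and `relative_change` are hypothetical helper names, not functions from the repository.

```python
import re

# Matches a zero-padded epoch number followed by ") " and the loss list
LINE_RE = re.compile(r"^(\d+)\)\s+(.*)$")


def parse_log(text):
    """Parse lines like '00000003) perceptual - 96.12051; equivariance_value - 0.14541'
    into {epoch: {loss_name: value}}."""
    history = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a loss line
        epoch = int(match.group(1))
        losses = {}
        for item in match.group(2).split(";"):
            # rsplit keeps negative values such as "x - -0.1" intact
            name, value = item.rsplit(" - ", 1)
            losses[name.strip()] = float(value)
        history[epoch] = losses
    return history


def relative_change(history, name, start, end):
    """Fractional change of one loss between two epochs (negative means it decreased)."""
    first, last = history[start][name], history[end][name]
    return (last - first) / first
```

For example, `relative_change(parse_log(log_text), "perceptual", 0, 16)` gives the fractional drop of the perceptual loss over the logged epochs, where `log_text` holds the lines above.
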

sadluck commented 1 year ago

@Animan8000 @william-nz Did you guys ever manage to get past the "_pickle.UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified." error? I'm getting the same thing.

Animan8000 commented 1 year ago

Nope

Inferencer commented 1 year ago

Another user seems to have fixed it for themselves. I have not tried it myself yet, so I have not run into the error: https://github.com/adeptflax/motion-models/issues/2#issuecomment-1079919603

ulucsahin commented 9 months ago

With the 512x512 model shared above, the code runs smoothly with the suggested changes. First of all, thank you for sharing it. However, the results I get are no different from upscaled 256x256 results. The animation is not bad, but the output video is blurry, as if I had upscaled a 256x256 output to 512x512. Is this expected?
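
One rough way to quantify "looks like an upscaled 256x256 result" is to downsample a 512x512 frame by 2, upsample it back, and measure how much changes: if almost nothing changes, the frame carries little detail beyond what 256x256 can represent. This is a generic image-processing heuristic, not something from the repository; `downup_psnr` is a hypothetical helper, and it assumes frames are NumPy arrays with values in [0, 1] (divide uint8 frames by 255).

```python
import numpy as np


def downup_psnr(frame, factor=2):
    """PSNR between a frame and a copy downsampled by `factor` (box average)
    and upsampled back (pixel repetition). Higher values mean less detail
    is lost at the lower resolution. Assumes pixel values in [0, 1]."""
    frame = np.asarray(frame, dtype=np.float64)
    h, w = frame.shape[:2]
    if h % factor or w % factor:
        raise ValueError("frame height and width must be divisible by factor")
    # Box-average downsample: group pixels into factor x factor blocks
    small = frame.reshape(h // factor, factor, w // factor, factor, -1).mean(axis=(1, 3))
    # Nearest-neighbour upsample back to the original size
    restored = small.repeat(factor, axis=0).repeat(factor, axis=1).reshape(frame.shape)
    mse = np.mean((frame - restored) ** 2)
    if mse == 0:
        return float("inf")
    return 10 * np.log10(1.0 / mse)
```

The absolute number depends on content, so it is most useful as a relative comparison, for example between a frame from the 512x512 model's output and a frame from a genuinely high-resolution driving video.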