Open ghost opened 4 years ago
@william-nz No, the only thing I've done is download the file and run it with the above (512) modifications. I saw that a 512x512 video was generated; that's all. I'm not entirely convinced by my result, so I posted it here in the hope that people can judge it for themselves. I may still have done something wrong 😆 When I get the time I'll run more experiments with it.
@AliaksandrSiarohin @5agado I have run some tests using the method detailed in point 2.
Generally the result looks like this:
It would be good to get your thoughts on whether this is an issue of using a checkpoint trained on 256x256 images, or if I am doing something wrong...
Many thanks for your excellent work.
Hi @LopsidedJoaw, The video you showed looks very high resolution, which super resolution method did you use to get the result? Thanks!
Hi @AliaksandrSiarohin , I am training a model on a 512x512 dataset from scratch. After 15 epochs the loss is decreasing, but the keypoints seem to be confined to a small region.
Any idea why this is happening? I haven't made any changes to the code; I have only modified the config file:
```yaml
dataset_params:
  root_dir: 512_dataset/
  frame_shape: [512, 512, 3]
  id_sampling: False
  pairs_list: data/vox256.csv
  augmentation_params:
    flip_param:
      horizontal_flip: True
      time_flip: True
    jitter_param:
      brightness: 0.1
      contrast: 0.1
      saturation: 0.1
      hue: 0.1

model_params:
  common_params:
    num_kp: 10
    num_channels: 3
    estimate_jacobian: True
  kp_detector_params:
    temperature: 0.1
    block_expansion: 32
    max_features: 1024
    scale_factor: 0.25
    num_blocks: 5
  generator_params:
    block_expansion: 64
    max_features: 512
    num_down_blocks: 2
    num_bottleneck_blocks: 6
    estimate_occlusion_map: True
    dense_motion_params:
      block_expansion: 64
      max_features: 1024
      num_blocks: 5
      scale_factor: 0.25
  discriminator_params:
    scales: [1]
    block_expansion: 32
    max_features: 512
    num_blocks: 4
    sn: True

train_params:
  num_epochs: 100
  num_repeats: 75
  epoch_milestones: [60, 90]
  lr_generator: 2.0e-4
  lr_discriminator: 2.0e-4
  lr_kp_detector: 2.0e-4
  batch_size: 4
  scales: [1, 0.5, 0.25, 0.125]
  checkpoint_freq: 5
  transform_params:
    sigma_affine: 0.05
    sigma_tps: 0.005
    points_tps: 5
  loss_weights:
    generator_gan: 0
    discriminator_gan: 1
    feature_matching: [10, 10, 10, 10]
    perceptual: [10, 10, 10, 10, 10]
    equivariance_value: 10
    equivariance_jacobian: 10

reconstruction_params:
  num_videos: 1000
  format: '.mp4'

animate_params:
  num_pairs: 50
  format: '.mp4'
  normalization_params:
    adapt_movement_scale: True
    use_relative_movement: True
    use_relative_jacobian: True

visualizer_params:
  kp_size: 5
  draw_border: True
  colormap: 'gist_rainbow'
```
Here is the loss log up to epoch 16:
00000000) perceptual - 121.42917; equivariance_value - 0.71458; equivariance_jacobian - 0.75562
00000001) perceptual - 109.27000; equivariance_value - 0.35340; equivariance_jacobian - 0.65690
00000002) perceptual - 100.28600; equivariance_value - 0.16266; equivariance_jacobian - 0.56337
00000003) perceptual - 96.12051; equivariance_value - 0.14541; equivariance_jacobian - 0.51318
00000004) perceptual - 93.17576; equivariance_value - 0.14200; equivariance_jacobian - 0.48087
00000005) perceptual - 90.71331; equivariance_value - 0.15415; equivariance_jacobian - 0.47770
00000006) perceptual - 88.90341; equivariance_value - 0.22227; equivariance_jacobian - 0.49095
00000007) perceptual - 86.39249; equivariance_value - 0.21560; equivariance_jacobian - 0.47799
00000008) perceptual - 84.61519; equivariance_value - 0.20801; equivariance_jacobian - 0.46283
00000009) perceptual - 84.08470; equivariance_value - 0.21185; equivariance_jacobian - 0.46702
00000010) perceptual - 82.73890; equivariance_value - 0.20613; equivariance_jacobian - 0.45508
00000011) perceptual - 81.45905; equivariance_value - 0.19839; equivariance_jacobian - 0.44276
00000012) perceptual - 81.00780; equivariance_value - 0.20207; equivariance_jacobian - 0.44244
00000013) perceptual - 80.08536; equivariance_value - 0.19849; equivariance_jacobian - 0.43349
00000014) perceptual - 79.34811; equivariance_value - 0.19838; equivariance_jacobian - 0.42291
00000015) perceptual - 78.98586; equivariance_value - 0.19916; equivariance_jacobian - 0.41774
00000016) perceptual - 78.48245; equivariance_value - 0.19998; equivariance_jacobian - 0.41450
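One thing worth checking (a guess on my part, not confirmed by the authors): both `kp_detector_params.scale_factor` and `dense_motion_params.scale_factor` are still 0.25, so at 512x512 those modules receive 128x128 inputs instead of the 64x64 they effectively see in the original 256x256 config. A commonly suggested tweak when scaling the resolution up is to halve these values, e.g.:

```yaml
# Hypothetical adjustment for 512x512 training: halve scale_factor so the
# keypoint detector and dense-motion network still operate on 64x64 inputs,
# matching the regime of the original 256x256 config.
model_params:
  kp_detector_params:
    scale_factor: 0.125   # was 0.25
  generator_params:
    dense_motion_params:
      scale_factor: 0.125   # was 0.25
```

This keeps the receptive field of the anti-aliased downsampling consistent with what the architecture was tuned for; whether it fixes the collapsed keypoints in your run would need a test.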
@Animan8000 @william-nz Did you guys ever manage to get over the "_pickle.UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified." error? I'm getting the same thing.
> @Animan8000 @william-nz Did you guys ever manage to get over the "_pickle.UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified." error? I'm getting the same thing.
Nope
> @Animan8000 @william-nz Did you guys ever manage to get over the "_pickle.UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified." error? I'm getting the same thing.
Another user seems to have fixed it for themselves; I have not tried it myself yet, so I have not run into the error. https://github.com/adeptflax/motion-models/issues/2#issuecomment-1079919603
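For context, this `UnpicklingError` is what the stdlib `pickle` module raises when a file containing persistent-id records (which `torch.save` emits for tensor storages) is loaded without a `persistent_load` hook; the usual fix is to open checkpoints with `torch.load(path, map_location="cpu")` rather than `pickle.load`. A minimal stdlib sketch of the failure mode (the `Saver`/`Loader` names and the payload are illustrative, not from the repo):

```python
import io
import pickle

# torch.save stores tensor data as "persistent" records. Loading such a file
# with plain pickle (no persistent_load hook) raises exactly this error,
# which is why torch.load works where pickle.load does not.

STORE = {"weights": [1.0, 2.0, 3.0]}  # stands in for a tensor storage

class Saver(pickle.Pickler):
    def persistent_id(self, obj):
        # Emit a persistent reference instead of inlining the payload,
        # analogous to what torch.save does for tensor storages.
        return "weights" if obj is STORE["weights"] else None

class Loader(pickle.Unpickler):
    def persistent_load(self, pid):
        # torch.load installs a hook like this to resolve the references.
        return STORE[pid]

buf = io.BytesIO()
Saver(buf).dump({"kp_detector": STORE["weights"]})

try:
    pickle.loads(buf.getvalue())  # no hook: the UnpicklingError from this thread
    plain_error = None
except pickle.UnpicklingError as exc:
    plain_error = exc

buf.seek(0)
ckpt = Loader(buf).load()  # hook installed: loads fine
```

So if the checkpoint itself is intact, loading it through `torch.load` (as the repo's own loading code does) should avoid this error entirely.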
The 512x512 model shared above runs smoothly with the suggested changes. First of all, thank you for sharing it. However, the results I get are no different from upscaled 256x256 results: the animation is not bad, but the output video is blurry, as if I had upscaled a 256x256 output to 512x512. Is this expected?
Is there a way to support high resolution?