AliaksandrSiarohin / first-order-model

This repository contains the source code for the paper "First Order Motion Model for Image Animation"
https://aliaksandrsiarohin.github.io/first-order-model-website/
MIT License

What are the parameters to train 100k iterations as stated in the supplementary material? #137

Open kenmbkr opened 4 years ago

kenmbkr commented 4 years ago

I adapted the script from this issue and found 3179 unique video_id values for training. If I use the training parameters from vox-adv-256.yaml, there will be 1788k iterations, almost 18 times the 100k stated in the supplementary material. Could you kindly clarify the num_epochs and num_repeats needed to get 100k iterations?

Parameters from vox-adv-256.yaml:
num_epochs: 150
num_repeats: 75

Parameters from the supplementary material:
batch_size: 20

Length of dataset = 3179
Length of repeated dataset = 3179 * num_repeats = 3179 * 75 = 238425
Iterations per epoch = 238425 / batch_size = 238425 / 20 ~= 11921
Total iterations = 11921 * num_epochs = 11921 * 150 = 1788150 ~= 1788k
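As a quick sanity check, the same arithmetic as a function (a sketch of my own; the repo has no such helper):

```python
def total_iterations(dataset_len, num_repeats, batch_size, num_epochs):
    """Iterations = floor(dataset_len * num_repeats / batch_size) * num_epochs."""
    iters_per_epoch = dataset_len * num_repeats // batch_size
    return iters_per_epoch * num_epochs

print(total_iterations(3179, 75, 20, 150))  # 1788150, i.e. ~1788k
```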

The adapted counting script:

```python
import pandas as pd

df = pd.read_csv("vox-metadata.csv")

# Split the "left-top-right-bottom" bbox string into four numeric columns.
df[["bbox1", "bbox2", "bbox3", "bbox4"]] = (
    df["bbox"].str.split("-", expand=True).apply(pd.to_numeric)
)

# Crop width/height in pixels and clip length in frames.
df["w"] = df["bbox3"] - df["bbox1"]
df["h"] = df["bbox4"] - df["bbox2"]
df["len"] = df["end"] - df["start"]

# Keep clips that are at least 256x256 pixels and at least 64 frames long.
df_train_test = df[(df["w"] > 255) & (df["h"] > 255) & (df["len"] > 63)]
df_train = df_train_test[df_train_test["partition"] == "train"]
df_test = df_train_test[df_train_test["partition"] == "test"]

print(df_train.shape, df_test.shape)
print(df_train["video_id"].nunique(), df_test["video_id"].nunique())
```
AliaksandrSiarohin commented 4 years ago

For VoxCeleb I used unique persons, not unique video ids. VoxCeleb naming starts with id000..., so unique persons will be used. I guess there are about 400 unique persons. Also, I used fewer repeats (25 or 35) because I was short on time. Later I trained for more iterations and released those checkpoints, because they work better. Also, vox-adv is trained with a discriminator, while in the paper I only report results without it, i.e. vox-256.yaml.

kenmbkr commented 4 years ago

> For VoxCeleb I used unique persons, not unique video ids. VoxCeleb naming starts with id000..., so unique persons will be used. I guess there are about 400 unique persons.

How do I configure training with unique persons? If I turn on id_sampling, this line gives unique video ids (~3k videos), not person ids. If I turn off id_sampling, this line gives all videos (~19k videos) in the train folder.

EDIT: I used https://github.com/AliaksandrSiarohin/video-preprocessing to produce the dataset and the video naming is in this format: 0bA1AJCGEOo#003431#003598.mp4.

AliaksandrSiarohin commented 4 years ago

Thank you for noticing. This is a bug. I added the person_id field to the metadata and changed the load_video.py script; tell me if it works for you now.
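For anyone checking their own data, a minimal sketch for counting unique person ids, assuming the updated naming puts the person id before the first '#' (the filename and path below are illustrative, not the repo's exact output):

```python
import os
from glob import glob

# Count unique person ids; assumes filenames like
# "id00017#0bA1AJCGEOo#003431#003598.mp4" (hypothetical naming), where the
# part before the first '#' is the person id.
videos = glob("vox/train/*.mp4")
person_ids = {os.path.basename(v).split("#")[0] for v in videos}
print(len(person_ids))
```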

kenmbkr commented 4 years ago

Thank you for the clarifications. Under the new naming scheme, this line gives 420 unique ids. If I use 50 for num_repeats, I get roughly 100k iterations. The detailed math is as follows for reference:

person_id: 420
num_repeats: 50
batch_size: 20
num_epochs: 100

Length of repeated dataset = person_id * num_repeats = 420 * 50 = 21,000
Iterations per epoch = 21,000 / batch_size = 21,000 / 20 = 1,050
Total iterations = 1,050 * num_epochs = 1,050 * 100 = 105,000 ~= 100k
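Or, using the total_iterations helper sketched earlier in this thread (my naming, not the repo's):

```python
print(total_iterations(420, 50, 20, 100))  # 105000 ~= 100k
```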

MentalGear commented 4 years ago

Hi @kenmbkr, what were your experiences with training your own model? Did you try a higher-quality (>256 px) dataset for higher-resolution output?

kenmbkr commented 4 years ago

@MentalGear The keypoint detector is the hardest part to train, because the model may sometimes resort to shortcut learning, aligning the detected keypoints in a straight line to satisfy the equivariance constraint, like in this example. I trained only at 256; I didn't try higher-quality data.
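For context, a rough paraphrase of the equivariance term (simplified interface; a sketch, not the repo's exact code): keypoints of the original frame should match the un-warped keypoints of a randomly warped frame, and a detector that collapses its keypoints onto a line can keep this loss low without localizing anything useful.

```python
import torch

def equivariance_value_loss(kp_extractor, frame, transform):
    # Warp the frame with a random geometric transform (e.g. thin-plate spline).
    transformed_frame = transform.transform_frame(frame)
    kp = kp_extractor(frame)["value"]                        # (B, K, 2)
    kp_transformed = kp_extractor(transformed_frame)["value"]
    # Map the transformed-frame keypoints back to original coordinates and
    # penalize disagreement with the keypoints of the unwarped frame.
    recovered = transform.warp_coordinates(kp_transformed)
    return torch.abs(kp - recovered).mean()
```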

@AliaksandrSiarohin I believe the Taichi dataset has the same problem and needs a person_id field in the metadata. If we use 3049 video chunks for training as in the paper, the math does not add up to 100k iterations. Could you kindly update the metadata for Taichi as well?

AliaksandrSiarohin commented 4 years ago

Taichi is OK, because the provided checkpoint was trained with more repeats than what is reported in the paper.