kenmbkr opened this issue 4 years ago
For vox celeb I used unique persons, not unique video-id. VoxCeleb naming starts with id000..., so unique persons will be used; I guess there are about 400 unique persons. Also I used fewer repeats (25 or 35) because I was short on time. Later I will train for more iterations and release these checkpoints, because they work better. Also vox-adv is trained with a discriminator, while in the paper I only report results without it, e.g. vox-256.yaml.
> For vox celeb I used unique persons not unique video-id. Vox celeb naming starts with id000...., so unique persons will be used. I guess there is about 400 unique persons.
How to configure training with unique persons? If I turn on id_sampling, this line gives unique video-id (~3k videos), not person id. If I turn off id_sampling, this line gives all videos (~19k videos) in the train folder.
EDIT: I used https://github.com/AliaksandrSiarohin/video-preprocessing to produce the dataset, and the video naming is in this format: 0bA1AJCGEOo#003431#003598.mp4.
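A small sketch of the distinction, assuming the naming convention produced by the video-preprocessing repo (the helper `first_field` is mine, not from the repo). A properly named chunk looks like `id00012#0bA1AJCGEOo#003431#003598.mp4`; the chunks described above lack the `id000...` person prefix, so grouping by the first `#`-field yields the YouTube video id rather than the person id:

```python
def first_field(name):
    # id_sampling-style grouping: everything before the first "#".
    # With the person prefix this is "id00012" (a person id);
    # without it, it is "0bA1AJCGEOo" (a YouTube video id).
    return name.split('#')[0]

# Buggy naming: first field is the video id, not a person id.
print(first_field("0bA1AJCGEOo#003431#003598.mp4"))       # 0bA1AJCGEOo

# Expected naming: first field is the person id.
names = [
    "id00012#0bA1AJCGEOo#003431#003598.mp4",
    "id00012#5MkXgwdrmJw#001093#001192.mp4",
    "id00015#0bA1AJCGEOo#000500#000600.mp4",
]
print(sorted({first_field(n) for n in names}))            # ['id00012', 'id00015']
```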
Thank you for noticing. This is a bug. I added the person_id field in the metadata and changed the load_video.py script; tell me if it works for you now.
Thank you for the clarifications. Under the new naming scheme, this line gives 420 unique person_id. If I use 50 for num_repeats, I will be getting ~100k iterations. The detailed math is as follows for reference:

person_id: 420
num_repeats: 50
batch_size: 20
num_epochs: 100

Length of repeated dataset = person_id * num_repeats = 420 * 50 = 21,000
Iterations per epoch = 21,000 / batch_size = 1,050
Total iterations = 1,050 * num_epochs = 105,000 ~= 100k
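The arithmetic above can be checked with a small sketch (variable names are mine, not from the repo's config):

```python
def total_iterations(num_ids, num_repeats, batch_size, num_epochs):
    repeated_len = num_ids * num_repeats           # length of repeated dataset
    iters_per_epoch = repeated_len // batch_size   # iterations per epoch
    return iters_per_epoch * num_epochs

print(total_iterations(420, 50, 20, 100))  # 105000, i.e. ~100k
```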
Hi @kenmbkr, what were your experiences with training your own model? Did you try training on a higher-quality dataset (>256px output)?
@MentalGear The keypoint detector is the hardest to train, because the model may sometimes do shortcut learning: it aligns the detected keypoints in a straight line to satisfy the equivariance constraint, like in this example. I trained only on 256; I didn't try higher-quality ones.
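One possible diagnostic for this failure mode (a sketch of mine, not something from the repo) is to measure how close the detected keypoints are to lying on a single line, e.g. via the fraction of variance captured by their principal axis:

```python
import numpy as np

def collinearity_score(kp):
    """Fraction of variance along the principal axis of the keypoints.
    kp: (K, 2) array of keypoint coordinates. A score close to 1.0 means
    the keypoints have (nearly) collapsed onto a straight line."""
    centered = kp - kp.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)  # two singular values
    return (s[0] ** 2) / (s ** 2).sum()

spread = np.array([[0.1, 0.2], [0.8, 0.3], [0.4, 0.9], [0.6, 0.1]])
line = np.array([[0.1, 0.1], [0.3, 0.3], [0.5, 0.5], [0.7, 0.7]])
print(collinearity_score(spread))  # well below 1
print(collinearity_score(line))    # ~1.0
```

Tracking this score during training could flag a run where the detector starts satisfying equivariance degenerately.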
@AliaksandrSiarohin I believe the Taichi dataset has the same problem and needs the person_id in the metadata. If we use 3049 video chunks for training as in the paper, the math does not add up to 100k iterations. Could you kindly update the metadata for Taichi as well?
Taichi is ok, because the provided checkpoint has more repeats than what is reported in the paper.
I adapted the script from this issue and found 3179 unique video_id for training. If I use the training parameters from vox-adv-256.yaml, there will be 1788k iterations, which is almost 18 times the magnitude stated in the supplementary material. Could you kindly clarify the num_epochs and num_repeats to get 100k iterations?

Parameters from vox-adv-256.yaml:
num_epochs: 150
num_repeats: 75

Parameters from supplementary material:
batch_size: 20

Length of dataset = 3179
Length of repeated dataset = 3179 * num_repeats = 3179 * 75 = 238,425
Iterations per epoch = 238,425 / batch_size = 238,425 / 20 ~= 11,921
Total iterations = 11,921 * num_epochs = 11,921 * 150 = 1,788,150 ~= 1788k
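The same arithmetic, inlined as a sketch (this only reproduces the figures above; it makes no claim about which config the authors actually used):

```python
# vox-adv-256.yaml parameters with 3179 training video_id:
repeated = 3179 * 75        # 238,425 chunks after repetition
per_epoch = repeated // 20  # 11,921 iterations per epoch at batch size 20
total = per_epoch * 150     # 1,788,150 total iterations -> ~1788k, not ~100k
print(total)

# For ~100k iterations at the same batch size and epoch count,
# num_repeats would have to be roughly:
print(100_000 * 20 / (3179 * 150))  # ~4.2
```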