ai4r / Gesture-Generation-from-Trimodal-Context

Speech Gesture Generation from the Trimodal Context of Text, Audio, and Speaker Identity (SIGGRAPH Asia 2020)

result on test set #12

Closed. catherine-qian closed this issue 3 years ago.

catherine-qian commented 3 years ago

Dear authors,

I have a question regarding the usage of the test set. As far as I understand, 'train.py' uses (1) the train set to train the model and (2) the validation set to select the best model. Then 'synthesis.py' is used for (1) the quantitative results, e.g. the FGD reported in the paper, and (2) the human evaluation (user study).

If so, in line 267 of 'synthesis.py' we have val_data_path = 'data/ted_dataset/lmdb_val'. Shouldn't val_data_path be 'data/ted_dataset/lmdb_test' instead? (See the sketch below.)
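To make the question concrete, the one-line change I am asking about would look like this (a hypothetical edit on my side; the commented-out alternative is my suggestion, not code from the repository):

```python
# synthesis.py, around line 267: quantitative results currently read the validation set
val_data_path = 'data/ted_dataset/lmdb_val'

# my question: should this point to the test set instead?
# val_data_path = 'data/ted_dataset/lmdb_test'
```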

youngwoo-yoon commented 3 years ago

No, it was intended to use the validation set. We did all the numerical experiments on the validation set, and the test set is only used for the qualitative results and user study.
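In other words, the splits are used as follows (a summary of the protocol above, not code from the repository):

```python
# Summary of how each dataset split is used in this project.
SPLIT_USAGE = {
    'data/ted_dataset/lmdb_train': 'train the model (train.py)',
    'data/ted_dataset/lmdb_val':   'select the best model; all numerical experiments (e.g. FGD)',
    'data/ted_dataset/lmdb_test':  'qualitative results and the user study',
}
```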

catherine-qian commented 3 years ago

Thanks for the reply.

> We did all the numerical experiments on the validation set.

So the numbers reported in the paper, e.g. Table 2, were obtained on the validation set?

> and the test set is only used for the qualitative results and user study.

And the test set is only used for generating the sample results, e.g. Fig. 7?

catherine-qian commented 3 years ago

Thanks again for your help!

Just to verify: (1) the model was trained and evaluated given the 4-frame seed pose, which equals the ground-truth direction vectors (without a 1-frame shift? see the sketch below), and (2) for 'synthesis.py' with 'from_db_clip', is the seed pose of the first clip also the ground truth?
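To make point (1) concrete, here is a hypothetical sketch of the two readings of 'with/without a 1-frame shift' (the frames are labeled placeholders, not real direction vectors):

```python
n_pre_poses = 4                                  # 4-frame seed pose
clip_frames = [f'frame_{i}' for i in range(10)]  # stand-in for target_dir_vec

# (a) no shift: the seed equals the clip's first 4 ground-truth frames
seed_no_shift = clip_frames[0:n_pre_poses]       # frame_0 .. frame_3

# (b) 1-frame shift: the seed would start one frame later
seed_shifted = clip_frames[1:n_pre_poses + 1]    # frame_1 .. frame_4
```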

youngwoo-yoon commented 3 years ago

> So the numbers reported in the paper, e.g. Table 2, were obtained on the validation set?

Yes.

> And the test set is only used for generating the sample results, e.g. Fig. 7?

Yes, Fig. 7 and the other figures.

> (1) the model was trained and evaluated given the 4-frame seed pose, which equals the ground-truth direction vectors (without a 1-frame shift?)

Yes.

> (2) for 'synthesis.py' with 'from_db_clip', is the seed pose of the first clip also the ground truth?

Yes. As you can see in synthesis.py, the seed_seq parameter is used:

```python
out_dir_vec = generate_gestures(args, generator, lang_model, clip_audio, clip_words, vid=vid_idx,
                                seed_seq=target_dir_vec[0:args.n_pre_poses], fade_out=False)
```
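For illustration, that slice simply takes the clip's first n_pre_poses ground-truth frames as the seed, with no frame shift (the array shape below is a placeholder, not the dataset's real dimensions):

```python
import numpy as np

n_pre_poses = 4
target_dir_vec = np.zeros((40, 27))        # placeholder clip: 40 frames, 27-dim direction vectors

seed_seq = target_dir_vec[0:n_pre_poses]   # the first 4 ground-truth frames, unshifted
print(seed_seq.shape)                      # (4, 27)
```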