Thanks for your contribution.
I have one query about the text padding.
In the paper (ACM Transactions on Graphics), you mention that you add padding tokens so that the word sequence has the same length as the gesture sequence (34 frames).
Could you explain how you decide where to add the padding tokens?
Also, across the 34 frames, is a word embedding repeated over several frames if the spoken word lasts longer than 1/15 sec?
I first create extended_word_indices filled with the PAD token and then replace some positions with word tokens.
There is no special handling for long words, but I believe repeating tokens is worth trying.
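
For illustration, the scheme described above could be sketched roughly as follows. This is not the repository's code. The PAD index of 0, the (token, start, end) input format, the 15 fps rate implied by the 1/15 sec frame duration, and the rule of placing each word at the frame containing its start time are all assumptions. The hypothetical repeat flag shows the untested variant in which a token is repeated over the word's duration.

```python
PAD_TOKEN = 0  # assumed vocabulary index of the padding token


def extend_word_indices(words, n_frames=34, fps=15, repeat=False):
    """Build a frame-aligned token sequence of length n_frames.

    words: list of (token_index, start_sec, end_sec), with times measured
    from the start of the clip (an assumed input format).
    """
    # Start from a sequence that contains only padding tokens.
    extended_word_indices = [PAD_TOKEN] * n_frames

    for token, start, end in words:
        first = int(start * fps)  # frame that contains the word onset
        if not 0 <= first < n_frames:
            continue  # word starts outside this clip

        # Without repeat, the token occupies a single frame.
        # With repeat, it fills every frame up to the word's end frame.
        last = min(int(end * fps), n_frames - 1) if repeat else first
        for frame in range(first, max(first, last) + 1):
            extended_word_indices[frame] = token

    return extended_word_indices


if __name__ == "__main__":
    words = [(5, 0.0, 0.25), (7, 0.5, 0.75)]
    print(extend_word_indices(words, n_frames=10))
    print(extend_word_indices(words, n_frames=10, repeat=True))
```

With repeat left off, every word contributes exactly one token and all other frames stay as padding, which matches the "no special handling for long words" behaviour described above.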