ai4r / Gesture-Generation-from-Trimodal-Context

Speech Gesture Generation from the Trimodal Context of Text, Audio, and Speaker Identity (SIGGRAPH Asia 2020)

text padding #9

Closed catherine-qian closed 3 years ago

catherine-qian commented 3 years ago

Dear authors,

Thanks for your contribution. I have a question about the text padding. In the paper (ACM Transactions on Graphics), you mention that you add padding tokens so that the word sequence has the same length as the gesture sequence (34 frames).

  1. How do you decide where to add the padding tokens?
  2. For the 34 frames, is a word's embedding repeated across several frames if the spoken word lasts longer than 1/15 sec?

youngwoo-yoon commented 3 years ago

Please see here: https://github.com/ai4r/Gesture-Generation-from-Trimodal-Context/blob/6a02bcaefc678e9e6170ca904dd9d372dd6151ef/scripts/data_loader/lmdb_data_loader.py#L134

I first created extended_word_indices filled with the PAD token and then replaced some positions with word tokens. There is no special handling for long words, but I believe repeating tokens is worth trying.
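
For readers who want to experiment, here is a minimal sketch of the idea described above. It is not the code at the linked line, and several details are assumptions made for illustration: the function name `extend_word_seq`, the padding index `0`, the `(token_index, start, end)` word format, placing each word at the frame that contains its start time, and the hypothetical `repeat` flag that tries the token-repetition variant.

```python
import math

PAD_INDEX = 0  # assumption: index of the padding token in the vocabulary


def extend_word_seq(words, clip_start, clip_end, n_frames=34, repeat=False):
    """Spread word tokens over a fixed number of frames, padding the rest.

    words: list of (token_index, start_time, end_time), times in seconds
           (hypothetical format chosen for this sketch).
    repeat: if True, copy a token into every frame the word spans.
    """
    frame_duration = (clip_end - clip_start) / n_frames

    # Start with a sequence that contains only padding tokens.
    extended = [PAD_INDEX] * n_frames

    for token_index, start, end in words:
        # Frame that contains the word's start time (assumed placement rule).
        first = max(0, math.floor((start - clip_start) / frame_duration))
        if first >= n_frames:
            continue  # the word starts after the clip ends

        last = first
        if repeat:
            # Last frame that the word still overlaps, clamped to the clip.
            last_overlap = math.ceil((end - clip_start) / frame_duration) - 1
            last = min(n_frames - 1, max(first, last_overlap))

        for i in range(first, last + 1):
            extended[i] = token_index

    return extended
```

With `repeat=False`, each word occupies a single frame and every other frame keeps the padding token, which matches the "no special handling for long words" behaviour. With `repeat=True`, a word that lasts several frames fills all of them, which is the variant suggested as worth trying.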