amirbar / speech2gesture

code for training the models from the paper "Learning Individual Styles of Conversational Gestures"

How can you run your network on arbitrary audio durations at test time? #8

Closed WinstonDeng closed 4 years ago

WinstonDeng commented 4 years ago

"During training, we take as input spectrograms corresponding to about 4 seconds of audio and predict 64 pose vectors, which correspond to about 4 seconds at a 15Hz frame-rate. At test time we can run our network on arbitrary audio durations"(Section 4.3).

What are the details of the testing implementation?

amirbar commented 4 years ago

How can you run your network on arbitrary audio durations at test time?

To do this, we input the audio as is (plus padding) and then resize a hidden layer so that its time dimension matches the required pose-sequence length.

For more details, please refer to the test script in our code, which supports arbitrary audio lengths: https://github.com/amirbar/speech2gesture/blob/master/audio_to_multiple_pose_gan/predict_audio.py
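
The resizing step above can be sketched as follows. This is a minimal NumPy illustration of the idea (not the repo's actual implementation, which resizes a hidden layer inside the network graph): a feature map at the audio's native time resolution is linearly interpolated along the time axis to the target number of pose frames, e.g. `duration_sec * 15` at a 15Hz frame rate. The function name `resize_time_axis` and the tensor shapes are hypothetical.

```python
import numpy as np

def resize_time_axis(features, target_len):
    """Linearly interpolate a (time, channels) feature map to target_len frames.

    Sketch of resizing a hidden layer's time dimension so the decoder can
    emit an arbitrary number of pose vectors.
    """
    src_len = features.shape[0]
    # Fractional source positions corresponding to each target frame.
    positions = np.linspace(0, src_len - 1, target_len)
    lo = np.floor(positions).astype(int)
    hi = np.minimum(lo + 1, src_len - 1)
    frac = (positions - lo)[:, None]
    # Linear blend between the two nearest source frames.
    return features[lo] * (1 - frac) + features[hi] * frac

# Example: 10 s of audio at some spectrogram frame rate -> 150 poses at 15 Hz.
duration_sec = 10.0
hidden = np.random.randn(431, 256)          # hypothetical (time, channels) hidden layer
target_poses = int(round(duration_sec * 15))
resized = resize_time_axis(hidden, target_poses)
# resized.shape == (150, 256)
```

When the target length equals the source length the interpolation is the identity, so the training-time behavior (4 s in, 64 poses out) is a special case of the same code path.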

WinstonDeng commented 4 years ago

Thanks for your reply!