Mael-zys / T2M-GPT

(CVPR 2023) Pytorch implementation of “T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations”
https://mael-zys.github.io/T2M-GPT/
Apache License 2.0

Why is the first-stage dataset only 64 frames? #40

Closed · exitudio closed this issue 1 year ago

exitudio commented 1 year ago

Thank you for your amazing work. I wonder why the first-stage training (VAE) uses only 64 frames rather than the whole sequence.

Jiro-zhang commented 1 year ago

Hi~ Thanks for your interest in our work. First, using the full motion length would require additional strategies for parallel training (e.g., padding), and padding every motion to the maximum length is time-consuming during training. Furthermore, CNNs generalize well across different sequence lengths, so a 64-frame window is enough for motion reconstruction.
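A minimal sketch of that idea (not the repository's actual dataloader; the function name and tensor shapes are assumptions for illustration): sampling a fixed 64-frame window from each motion lets a batch be stacked without any padding, while the fully convolutional encoder/decoder itself remains length-agnostic at inference time.

```python
import torch

def random_crop(motion: torch.Tensor, window: int = 64) -> torch.Tensor:
    """Randomly crop a fixed-length window from one motion sequence.

    motion: (T, D) tensor of T frames with D pose features.
    Assumes T >= window (shorter clips would need filtering or padding).
    """
    T = motion.shape[0]
    start = torch.randint(0, T - window + 1, (1,)).item()
    return motion[start:start + window]

# Fixed-length crops stack into a (B, 64, D) batch with no padding,
# so the first-stage model trains in parallel; at test time the
# convolutional network can still process sequences of other lengths.
```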

exitudio commented 1 year ago

Thank you for the insightful explanation.