easylearningscores / PastNet

MM'2024

How does the VQ-VAE reduce the training cost of the proposed method? #1

Open bigfeetsmalltone opened 1 year ago

bigfeetsmalltone commented 1 year ago

Interesting work! However, in the DST module, the encoded feature maps with shape [T, C, \hat{H}, \hat{W}] are quantized into feature maps with shape [T, D, \hat{H}, \hat{W}]. This is confusing, since the spatial resolution of the two feature maps is the same, so there appears to be no reduction in computational cost. I hope you can clarify this issue.

easylearningscores commented 1 year ago

Apologies for my delayed response, and thank you for carefully reading our paper. Even though the spatial resolution of the feature maps does not change, reducing the channel dimensionality from C to D greatly decreases the computational overhead. Furthermore, we map the continuous input features onto a finite, discrete set of encodings, associating each spatial location with the nearest embedding vector from a codebook. Since the codebook size is fixed, each location can be represented by just the integer index of its codebook entry, i.e. with only a few bits. This significantly reduces both the amount of information that must be stored and the computational resources required to process it.
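To make the storage argument concrete, here is a minimal NumPy sketch of the nearest-codebook lookup described above. All sizes (T, C, D, codebook size K) and the channel projection are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Hypothetical sizes (illustrative only): T frames, C encoder channels,
# a codebook of K embeddings of dimension D < C.
T, C, H, W = 4, 64, 16, 16
K, D = 512, 8

rng = np.random.default_rng(0)
feats = rng.normal(size=(T, C, H, W))        # encoder output [T, C, H, W]

# Assumed channel projection from C down to D before quantization.
proj = rng.normal(size=(C, D)) / np.sqrt(C)
z = np.einsum('tchw,cd->tdhw', feats, proj)  # [T, D, H, W]

codebook = rng.normal(size=(K, D))           # finite, discrete codebook

# Nearest-embedding lookup: each spatial vector is replaced by its
# closest codebook entry, so only the integer index must be stored.
flat = z.transpose(0, 2, 3, 1).reshape(-1, D)              # [T*H*W, D]
d2 = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
idx = d2.argmin(axis=1)                                    # integer codes
zq = codebook[idx].reshape(T, H, W, D).transpose(0, 3, 1, 2)

# Storage comparison: 32-bit floats per channel vs. log2(K) bits per
# spatial location for the discrete codes.
dense_bits = T * C * H * W * 32
quant_bits = T * H * W * int(np.ceil(np.log2(K)))
print(dense_bits // quant_bits)  # compression factor
```

Note how the quantized map `zq` keeps the [T, D, \hat{H}, \hat{W}] shape, while the representation that actually needs to be stored is just `idx`, one small integer per spatial location.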