JinluZhang1126 / MixSTE

Official implementation of the CVPR 2022 paper "MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video"

About batch size (1024) #4

Closed. shawnpark07 closed this issue 1 year ago.

shawnpark07 commented 2 years ago

Hello, thanks for your impressive work :)

While working through your paper, one point left me uncertain.

As you know, the H36M training set has a total of 1,559,752 frames, which would be grouped into 6,716 sequences (243 frames per sequence). But with a batch size of 1,024 as mentioned in the paper, there would be only 6~7 batches per epoch.

Did I understand that right? If so, I would expect training to be very unstable due to the small number of updates per epoch. Looking forward to your answer. Thanks!
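
For reference, a minimal sketch of the arithmetic behind this reading, using the numbers quoted above and assuming "1,024" counts whole sequences:

```python
# Reading "batch size 1024" as 1024 sequences per batch:
sequences = 6_716        # 243-frame sequences, as quoted above
batch_size = 1_024       # sequences per batch under this reading

batches_per_epoch = sequences / batch_size
print(batches_per_epoch) # ~6.6, i.e. the 6~7 batches per epoch mentioned above
```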

JinluZhang1126 commented 2 years ago

Hi, 1024 is the number of frames in each batch. We apply stride sampling to split those 1024 frames into 5 periods, so the overall number of iterations per epoch is about 1,300 in our code.
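
Under that reading, a minimal sketch of the arithmetic (ignoring the stride sampling, so the exact count of ~1,300 iterations will differ):

```python
# Reading "batch size 1024" as 1024 frames per batch:
frames_per_batch = 1_024
seq_len = 243            # frames per input sequence
sequences = 6_716        # sequence count quoted above

sequences_per_batch = frames_per_batch // seq_len      # 4
iterations_per_epoch = sequences // sequences_per_batch
print(sequences_per_batch, iterations_per_epoch)       # 4, 1679 (before stride sampling)
```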

flyyyyer commented 1 year ago

Your paper says the batch size is 1024, but the actual batch size in training is 1024 // 243 = 4. It seems the batch size was reduced to achieve more iterations.

JinluZhang1126 commented 1 year ago

> Your paper says the batch size is 1024, but the actual batch size in training is 1024 // 243 = 4. It seems the batch size was reduced to achieve more iterations.

Yes, the batch size is actually 4, and the 1024 refers to the number of input frames per batch, which may be confusing.
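
Concretely, one training batch under these numbers would look like the following (a sketch with illustrative names; 17 joints is the usual H36M keypoint count, and 4 × 243 = 972 ≈ 1024 frames):

```python
import torch

# One batch under the reading above: 4 sequences of 243 frames,
# each frame holding 17 2D keypoints (x, y).
batch_size, seq_len, num_joints = 4, 243, 17
inputs_2d = torch.randn(batch_size, seq_len, num_joints, 2)

print(inputs_2d.shape)      # torch.Size([4, 243, 17, 2])
print(batch_size * seq_len) # 972 frames per batch, roughly the 1024 quoted
```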

Aerikervalid commented 1 year ago

If I increase the batch size, shouldn't training become more stable? But in several of my experiments a larger batch size actually performed worse. Why do the authors think this is?
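
One possible factor (not from the paper, just a common heuristic known as the linear scaling rule) is that the learning rate usually needs to grow with the batch size; keeping it fixed while enlarging the batch often hurts convergence. A sketch with hypothetical values:

```python
# Linear scaling rule (a heuristic, not the authors' recipe): scale the
# learning rate proportionally when the batch size grows.
base_lr = 4e-5   # hypothetical learning rate tuned for the base batch
base_batch = 4   # sequences per batch in the original setup
new_batch = 16   # enlarged batch size

new_lr = base_lr * new_batch / base_batch
print(new_lr)    # 1.6e-04
```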