JinluZhang1126 / MixSTE

Official implementation of the CVPR 2022 paper "MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video"
Questions about computational complexity #21

Open AJDA1992 opened 1 year ago

AJDA1992 commented 1 year ago

First, thank you for releasing the code for this excellent work. I have a couple of observations and questions that I hope you can respond to.

It appears that the reported 645 M FLOPs is given on a per-frame basis, which would imply that for the 243-frame model a forward pass costs 645 × 243 M FLOPs (about 156.7 GFLOPs). Is this correct?
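To make the arithmetic explicit, here is a minimal sketch of the calculation I have in mind. It assumes the 645 M figure really is per frame, which is exactly the point I am asking about; the function name `forward_pass_flops_m` is just a hypothetical helper for illustration and is not part of the MixSTE code.

```python
# Assumption: 645 M FLOPs is a per-frame cost (unconfirmed, this is the question).
FLOPS_PER_FRAME_M = 645   # reported figure, in millions of FLOPs
NUM_FRAMES = 243          # sequence length of the 243-frame model


def forward_pass_flops_m(flops_per_frame_m: int, num_frames: int) -> int:
    """Total forward-pass cost in millions of FLOPs, assuming a per-frame cost."""
    return flops_per_frame_m * num_frames


total_m = forward_pass_flops_m(FLOPS_PER_FRAME_M, NUM_FRAMES)
print(f"{total_m} M FLOPs = {total_m / 1000:.1f} GFLOPs")
```

If the 645 M figure is instead the cost of the whole 243-frame sequence, this multiplication does not apply.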

I am also finding that when training a T=27 model on two RTX 3090s (48 GB of VRAM in total), the largest batch size I can use is 200; anything larger causes out-of-memory errors. Can you please tell us which GPUs you trained on to support a batch size of 1024?
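For a sense of scale, here is a rough back-of-the-envelope extrapolation from my own numbers. It assumes memory use grows linearly with batch size, which is only an approximation: weights and optimizer state do not grow with the batch, so this likely overestimates. The helper `estimate_vram_gb` is hypothetical and the 24 GB per card is simply the RTX 3090's capacity.

```python
import math

# Observed on my setup: batch size 200 fills 2 x RTX 3090 (48 GB total) at T=27.
KNOWN_BATCH = 200
KNOWN_VRAM_GB = 48
TARGET_BATCH = 1024
VRAM_PER_3090_GB = 24


def estimate_vram_gb(target_batch: int, known_batch: int, known_vram_gb: float) -> float:
    """Crude estimate assuming VRAM scales linearly with batch size."""
    return known_vram_gb * target_batch / known_batch


needed_gb = estimate_vram_gb(TARGET_BATCH, KNOWN_BATCH, KNOWN_VRAM_GB)
cards = math.ceil(needed_gb / VRAM_PER_3090_GB)
print(f"~{needed_gb:.0f} GB of VRAM, i.e. about {cards} RTX 3090-class cards")
```

Under that crude assumption, a batch size of 1024 is far beyond a two-card consumer setup, which is why I am asking about the hardware.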

I am wondering whether the small gains in accuracy are outweighed by the extreme computational overhead, which seems to make training almost intractable for the majority of users.