Vegetebird / StridedTransformer-Pose3D

[TMM 2022] Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation
MIT License

Evaluation on frames at the beginning or the end of a video #1

Closed Albertchen98 closed 2 years ago

Albertchen98 commented 2 years ago

How did you deal with the situation when the target frame is the first one of the video? Then there aren't any preceding frames to make the target frame the 'center frame'. Or do you just ignore it and start the evaluation from the 13th frame when the input sequence length is 27?

Vegetebird commented 2 years ago

Following VideoPose3D and ST-GCN, in the data preprocessing, if the target frame is the first one of the video, we pad with the edge values of the array. You can refer to https://github.com/Vegetebird/StridedTransformer-Pose3D/blob/163f0cb4869ef51d7545b4c3d04ce3491a2b67e2/common/generator.py#L103.
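A minimal sketch of this kind of edge padding (not the repo's exact code; the shape `(T, 17, 2)` and helper name are just for illustration): with a receptive field of 27, the center frame needs 13 neighbors on each side, so the boundary frames are replicated with `np.pad`.

```python
import numpy as np

def pad_sequence(keypoints, pad=13):
    # keypoints: (T, J, C) array of 2D poses, e.g. (T, 17, 2).
    # mode='edge' repeats the first/last frame, so every target frame,
    # including frame 0, can sit at the center of a full 27-frame window.
    return np.pad(keypoints, ((pad, pad), (0, 0), (0, 0)), mode='edge')

kps = np.random.randn(100, 17, 2)
padded = pad_sequence(kps)   # shape (126, 17, 2)
window0 = padded[0:27]       # full 27-frame window centered on original frame 0
```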

Albertchen98 commented 2 years ago

Thank you very much for the quick reply~

I found another interesting thing: the max-pooling only extracts a value every stride_num[i] steps because the kernel size is set to 1, so there isn't any implicit 'max-pooling' operation performed. Have you ever tried a kernel size of 3, and did you see any performance discrepancy? https://github.com/Vegetebird/StridedTransformer-Pose3D/blob/163f0cb4869ef51d7545b4c3d04ce3491a2b67e2/model/block/strided_transformer_encoder.py#L65
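A quick illustration of the point (my own snippet, not code from the repo): `nn.MaxPool1d(kernel_size=1, stride=s)` is pure strided subsampling, while `kernel_size=3` with the same stride takes a local max over each window first.

```python
import torch
import torch.nn as nn

x = torch.arange(9, dtype=torch.float32).view(1, 1, 9)  # (N, C, T)

subsample = nn.MaxPool1d(kernel_size=1, stride=3)  # no real pooling, just picks every 3rd value
pool3 = nn.MaxPool1d(kernel_size=3, stride=3)      # actual max over each 3-frame window

print(subsample(x))  # tensor([[[0., 3., 6.]]])
print(pool3(x))      # tensor([[[2., 5., 8.]]])
```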

Vegetebird commented 2 years ago

I forget whether I have tried it; maybe you can try it~