FraLuca / STSGCN

Repository for "Space-Time-Separable Graph Convolutional Network for Pose Forecasting" (ICCV 2021)
MIT License

I failed to get the prediction results in the paper #12

Closed. 705062791 closed this issue 2 years ago.

705062791 commented 2 years ago

I failed to reproduce the prediction results reported in the paper, even though I used the provided code and pretrained model. I suspect a possible reason is that I had to write my own test code, since none is included in the public code.

My test method is as follows: I calculate the average 3D joint error for each action at each time step, as shown in the paper. For instance, in 'walking' I calculate the mean 3D error at 40ms, 160ms, 320ms and 400ms. The prediction errors are significantly worse than the results published in the paper.

To address the above issue, I think the best way is to publish the test code used for the paper.

FraLuca commented 2 years ago

Hi,

Thanks for your interest! The README contains all the instructions to run the test easily. In particular, an example could be:

```
python main_h36_3d.py --input_n 10 --output_n 25 --skip_rate 1 --joints_to_consider 22 --mode test --model_path ./checkpoints/CKPT_3D_H36M
```

I want to stress the --mode test setting, which switches the model to test mode. Let me know if this solves it.

Luca

705062791 commented 2 years ago

Hi, thank you for your reply. I did read the README, and I also ran the code with the command you mentioned.

However, the test code only prints the average 3D error for each action, whereas I expect the frame-by-frame 3D error reported in Table 1 of your paper.

So, I modified the test function to calculate the error frame by frame. Comparing my result with the paper:

|           | 40ms    | 160ms   | 320ms   | 400ms   |
|-----------|---------|---------|---------|---------|
| My result | 16.2550 | 28.1007 | 54.4291 | 65.8412 |
| Paper     | 10.1    | 17.1    | 33.1    | 38.3    |

I am not sure whether my test code is wrong, so I hope you can provide the test code that calculates the frame-by-frame error rather than the per-action mean error.
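For reference, this is roughly what my modified evaluation does (a minimal sketch with hypothetical names, assuming predictions and ground truth are numpy arrays of shape (N, T, J, 3) in millimetres, at 25 fps so one frame corresponds to 40 ms):

```python
import numpy as np

def mpjpe_per_frame(pred, gt, frame_idx):
    """Mean 3D joint error at specific prediction frames.

    pred, gt:  (N, T, J, 3) arrays, predicted and ground-truth 3D joints in mm
    frame_idx: 0-based frame indices, e.g. [0, 3, 7, 9] for 40/160/320/400 ms at 25 fps
    """
    err = np.linalg.norm(pred - gt, axis=-1)   # (N, T, J) per-joint Euclidean error
    per_frame = err.mean(axis=(0, 2))          # average over sequences and joints
    return per_frame[frame_idx]
```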

Looking forward to your reply!

Tiezheng Ma

FraLuca commented 2 years ago

In Table 1 (short-term) and Table 2 (long-term) we show the MPJPE for each action. This metric considers the error at all frames up to the one considered and averages them. For example, if we consider the prediction at 80 msec (2 frames), we compute the error at frame 1 and frame 2 and then average them. This gives the results in the tables you are referring to. Moreover, we adopt this test function/metric coherently with the standard procedure in the literature (so as to be comparable with previous works). In particular, you can refer to the baseline model of Wei Mao, History Repeats Itself.
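Schematically, the reported metric corresponds to something like the following (a minimal sketch, not the exact repository code; pred and gt are assumed to be arrays of shape (N, T, J, 3)):

```python
import numpy as np

def mpjpe_avg_up_to(pred, gt, horizon):
    """MPJPE averaged over all predicted frames up to `horizon`.

    For 80 ms at 25 fps, horizon = 2: the errors at frame 1 and frame 2
    are computed and then averaged together.
    """
    err = np.linalg.norm(pred[:, :horizon] - gt[:, :horizon], axis=-1)  # (N, horizon, J)
    return err.mean()  # average over sequences, frames 1..horizon, and joints
```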

Luca

research-lover commented 2 years ago

Hi,

I think previous methods evaluate at specific frames, not the average over all previous frames.

Please refer to the evaluation code of Residual.sup and Learning Trajectory Dependencies.

Thanks

Research Lover

FraLuca commented 2 years ago

Hi Research Lover,

Thank you for pointing this out. We'll double-check and, if needed, take action. We'll let you know soon.

Thanks, Luca

Dean-UQ commented 2 years ago

Hi, it is true that this work did not follow the common evaluation setup used in other related works. Hence, it is unfair to compare its performance with other SOTA methods.

FraLuca commented 2 years ago

Hi, We found the problem in the evaluation metric. We will provide the correct tables and results soon, here and in the paper. Thanks for your help.

Maradowei commented 2 years ago

Hi, I don't understand Eq. (6) and Eq. (7): why does k run from 1 to T+K rather than from T to T+K, and the same for the denominator? I mean, why are the input frames also taken into account in the loss? Good work/paper, by the way.

whf9527 commented 2 years ago

How should I get the AMASS data from this website? [screenshot attached]

FraLuca commented 2 years ago

Sorry for only replying now.

The problem arises because no prior human pose forecasting work has explicitly written out the test MPJPE metric. [Mao et al., 2020; Mao et al., 2019] specify the MPJPE as the learning loss and then refer to the (same) MPJPE for testing, which is, however, different.

In [Mao et al., 2020], Eq. (6), they define MPJPE as

$$MPJPE = \frac{1}{J(M+T)}\sum_{t=1}^{M+T} \sum_{j=1}^{J} ||\hat{\textbf{p}}_{t,j} - \textbf{p}_{t,j}||^2,$$

which sums up all errors at all frames up to the prediction T.

Also in [Ionescu et al., 2014], Eq. (8), they define the MPJPE as:

$$MPJPE(t) = \frac{1}{J} \sum_{j=1}^{J} ||\hat{\textbf{p}}_{t,j} - \textbf{p}_{t,j}||^2,$$

and they state: "For a set of frames the error is the average over the MPJPEs of all frames."

We have therefore interpreted the test MPJPE to be:

$$MPJPE = \frac{1}{J\,T}\sum_{t=M+1}^{M+T} \sum_{j=1}^{J} ||\hat{\textbf{p}}_{t,j} - \textbf{p}_{t,j}||^2,$$

which is implemented in our testing code. Note: coding has been done in good faith, and in good faith we have open-sourced the project here.

As noted in this thread, the code provided by [Mao et al., 2020] actually considers only the target temporal horizon, not the average up to that time.
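To make the difference concrete, here is a schematic comparison of the two conventions (hypothetical helper names, not code taken from either repository; pred and gt are assumed to be (N, T, J, 3) arrays):

```python
import numpy as np

def mpjpe_at_horizon(pred, gt, horizon):
    """Error evaluated only at the target frame (the convention of prior works' test code)."""
    err = np.linalg.norm(pred[:, horizon - 1] - gt[:, horizon - 1], axis=-1)  # (N, J)
    return err.mean()

def mpjpe_averaged(pred, gt, horizon):
    """Error averaged over all frames up to the target horizon (our earlier interpretation)."""
    err = np.linalg.norm(pred[:, :horizon] - gt[:, :horizon], axis=-1)  # (N, horizon, J)
    return err.mean()
```

Since the per-frame error typically grows with the horizon, the averaged variant yields lower numbers, which explains the gap reported above.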

Running the test code of [Mao et al., 2020], the short-term (400ms) and long-term (1000ms) errors of STS-GCN on the Human3.6M dataset are:

[results table attached as an image]

The main page of this repository will report this performance and specify the test MPJPE metric, to avoid future discrepancies.

FraLuca commented 2 years ago

Let us close this GitHub issue, since the previous post clarified the discrepancy between the two versions of the error metric. We wish to thank those who pointed out the issue: it is always a good thing when we can clarify a mistake, and we are happy that having open-sourced the project made this possible.