Pre-trained models do not reproduce paper results

una-dinosauria commented 7 years ago

Hi!

I'm using the pre-trained models available at https://drive.google.com/open?id=0B7lfjqylzqmMZlI3TUNUUEFQMXc and running generateMotionForecast.py. This produces motion predictions for different activities and models, but I've found that these do not correspond to what is reported in the paper. For reference, here's the figure from the paper that I'm talking about:

[image: selection_007]

But, for example, using the lstm3lr_walking model, the checkpoint.pikforecast_error file contains the following values:

```
T=0 2.87922000885, 0.053318887949
T=1 3.31045722961, 0.0613047629595
T=2 3.72076749802, 0.0689031034708
T=3 4.20972061157, 0.0779577866197
T=4 4.62205123901, 0.0855935439467
T=5 4.9056763649, 0.0908458605409
T=6 5.15456962585, 0.0954549908638
T=7 5.68943977356, 0.105359993875
T=8 6.14819526672, 0.113855466247
T=9 6.47697734833, 0.119944028556
T=10 6.86927509308, 0.127208799124
T=11 7.25948381424, 0.134434878826
T=12 7.56049823761, 0.140009224415
T=13 7.60584354401, 0.140848949552
T=14 7.81918954849, 0.144799813628
T=15 7.99432945251, 0.148043140769
T=16 8.21197509766, 0.152073606849
T=17 8.22490978241, 0.152313143015
T=18 8.21773910522, 0.152180358768
T=19 8.20940303802, 0.152025982738
T=20 8.21308326721, 0.152094140649
T=21 8.08870410919, 0.14979082346
T=22 7.9909658432, 0.147980853915
T=23 7.93785572052, 0.146997332573
T=24 8.08372688293, 0.149698644876
T=25 8.17058372498, 0.151307106018
T=26 8.29908180237, 0.153686702251
T=27 8.29321861267, 0.15357811749
T=28 8.33865356445, 0.154419511557
T=29 8.29992961884, 0.153702393174
T=30 8.31999206543, 0.154073923826
T=31 8.37398910522, 0.155073866248
T=32 8.47292232513, 0.156905964017
T=33 8.59246826172, 0.159119784832
T=34 8.65988731384, 0.160368278623
T=35 8.66351318359, 0.160435423255
T=36 8.65542507172, 0.160285651684
T=37 8.70272254944, 0.161161527038
T=38 8.90265083313, 0.16486389935
T=39 9.08981990814, 0.168329998851
T=40 9.22410964966, 0.170816838741
T=41 9.25332164764, 0.171357810497
T=42 9.3009595871, 0.172239989042
T=43 9.29813861847, 0.172187745571
T=44 9.26357460022, 0.171547681093
T=45 9.19590568542, 0.170294553041
T=46 9.15723419189, 0.169578418136
T=47 9.24366569519, 0.171178996563
T=48 9.30495262146, 0.172313943505
T=49 9.25953674316, 0.171472907066
T=50 9.24114990234, 0.171132400632
T=51 9.26937294006, 0.171655058861
T=52 9.3104429245, 0.172415614128
T=53 9.19757270813, 0.170325413346
T=54 9.04441356659, 0.167489141226
T=55 8.96823406219, 0.166078403592
T=56 9.00592136383, 0.166776314378
T=57 9.09947776794, 0.168508842587
T=58 9.06608009338, 0.167890369892
T=59 9.1175775528, 0.168844029307
T=60 9.23169708252, 0.170957356691
T=61 9.25059127808, 0.171307250857
T=62 9.23868370056, 0.171086728573
T=63 9.21300506592, 0.170611202717
T=64 9.20988750458, 0.170553475618
T=65 9.30304908752, 0.172278687358
T=66 9.30745029449, 0.17236019671
T=67 9.29339599609, 0.172099933028
T=68 9.21964550018, 0.170734182
T=69 9.22905826569, 0.170908480883
T=70 9.11111068726, 0.168724268675
T=71 9.0918712616, 0.168367981911
T=72 8.92658901215, 0.165307208896
T=73 8.91659736633, 0.165122166276
T=74 8.82111263275, 0.163353934884
T=75 8.90966320038, 0.16499376297
T=76 9.02032756805, 0.167043104768
T=77 9.09782981873, 0.168478325009
T=78 9.22392463684, 0.170813426375
T=79 9.33905029297, 0.172945380211
T=80 9.31301212311, 0.172463193536
T=81 9.44260978699, 0.174863144755
T=82 9.45653438568, 0.17512100935
T=83 9.52670955658, 0.176420554519
T=84 9.64883327484, 0.178682103753
T=85 9.83387374878, 0.182108774781
T=86 9.95151329041, 0.184287279844
T=87 9.91870689392, 0.183679759502
T=88 9.91715335846, 0.18365098536
T=89 10.0150337219, 0.18546359241
T=90 9.95522022247, 0.184355929494
T=91 9.70408630371, 0.179705306888
T=92 9.56737327576, 0.1771735847
T=93 9.58298301697, 0.177462652326
T=94 9.52612495422, 0.176409721375
T=95 9.55842971802, 0.177007958293
T=96 9.53139877319, 0.176507383585
T=97 9.50600910187, 0.176037207246
T=98 9.59951972961, 0.177768886089
T=99 9.80951976776, 0.181657776237
```

where the left and right columns correspond to skel_err and err_per_dof as computed in forecastTrajectories.py#L124

skel_err = np.mean(np.sqrt(np.sum(np.square((forecasted_motion - trY_forecasting)),axis=2)),axis=1)
err_per_dof = skel_err / trY_forecasting.shape[2]

I find one value to be much worse, and the other to be about 1 order of magnitude better. Do you have any pointers as to what I could be doing wrong?
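To make the two columns concrete, here is a minimal sketch of what the two lines above compute, written with plain loops instead of NumPy so that it runs standalone. It assumes the arrays are laid out as (time, sequence, dof), which the per-`T` listing suggests but which is not stated here; the function name and the toy data are made up for illustration.

```python
import math


def forecast_errors(forecast, truth):
    """Plain-loop mirror of the skel_err / err_per_dof lines.

    Both inputs are nested lists indexed as [time][sequence][dof]
    (assumed layout). Returns two lists with one value per time step.
    """
    skel_err = []
    for pred_t, true_t in zip(forecast, truth):
        # L2 norm over the dof axis (axis=2): one value per sequence.
        norms = [
            math.sqrt(sum((p - q) ** 2 for p, q in zip(pred_seq, true_seq)))
            for pred_seq, true_seq in zip(pred_t, true_t)
        ]
        # Mean over the sequence axis (axis=1).
        skel_err.append(sum(norms) / len(norms))
    n_dof = len(truth[0][0])
    err_per_dof = [e / n_dof for e in skel_err]
    return skel_err, err_per_dof


# Toy data: 2 time steps, 2 sequences, 4 dofs (hypothetical values).
truth = [[[0.0] * 4, [0.0] * 4], [[0.0] * 4, [0.0] * 4]]
forecast = [[[1.0] * 4, [1.0] * 4], [[3.0, 4.0, 0.0, 0.0], [0.0] * 4]]
skel_err, err_per_dof = forecast_errors(forecast, truth)
print(skel_err)     # one error per time step
print(err_per_dof)  # the same errors divided by the number of dofs
```

Since `err_per_dof` is just `skel_err` divided by a constant, the two columns always differ by the same factor. In the listing above that factor is about 54 (2.87922000885 / 0.053318887949).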

una-dinosauria commented 7 years ago

Hey! Sorry for bothering again. I've also made some movies with these models and they definitely do not correspond to what is shown in the official video of the paper -- maybe you didn't upload the final final models?

asheshjain399 commented 7 years ago

The final models are here: https://drive.google.com/open?id=0B7lfjqylzqmMZlI3TUNUUEFQMXc (same link as above). The numbers in Table 1 are Euler angle errors, not exponential map errors. I think you are outputting exponential map errors.

The models are trained on the exponential map representation of the joints; the output is then converted to the Euler angle representation for visualization and quantitative comparison.
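For readers unfamiliar with the conversion, the following is a rough Python sketch of going from an exponential map (axis-angle) vector to a rotation matrix and then to Euler angles. It is not the code used for the paper. The function names are made up, and the ZYX Euler order is an assumption chosen for illustration; the Matlab scripts in this repository may use a different order, which would give different angle values.

```python
import math


def expmap_to_rotmat(r):
    """Rodrigues' formula: axis-angle vector r (3 floats) -> 3x3 rotation matrix."""
    theta = math.sqrt(sum(v * v for v in r))
    if theta < 1e-12:
        # Zero rotation: return the identity.
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    x, y, z = (v / theta for v in r)
    c, s = math.cos(theta), math.sin(theta)
    k = 1.0 - c
    return [
        [c + x * x * k, x * y * k - z * s, x * z * k + y * s],
        [y * x * k + z * s, c + y * y * k, y * z * k - x * s],
        [z * x * k - y * s, z * y * k + x * s, c + z * z * k],
    ]


def rotmat_to_euler_zyx(R):
    """Rotation matrix -> (yaw, pitch, roll), assuming R = Rz(yaw) Ry(pitch) Rx(roll)."""
    pitch = -math.asin(max(-1.0, min(1.0, R[2][0])))
    if abs(R[2][0]) > 1.0 - 1e-9:
        # Gimbal lock: yaw and roll are coupled, so fix roll to zero.
        yaw = math.atan2(-R[0][1], R[1][1])
        roll = 0.0
    else:
        yaw = math.atan2(R[1][0], R[0][0])
        roll = math.atan2(R[2][1], R[2][2])
    return yaw, pitch, roll


# A quarter turn about the z axis, expressed as an exponential map.
yaw, pitch, roll = rotmat_to_euler_zyx(expmap_to_rotmat([0.0, 0.0, math.pi / 2]))
print(yaw, pitch, roll)
```

The point of the sketch is only that an error measured on exponential map vectors and an error measured on the resulting Euler angles are different quantities, so the two cannot be compared directly.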

asheshjain399 commented 7 years ago

You should look into the Utils directory. It has some Matlab scripts that do the conversion for you (sorry, Utils is not documented yet).

una-dinosauria commented 7 years ago

Thanks a lot. I looked into the utils directory and found this motionGenerationError.m file that computes error with the conversion expmap->rotmat->euler. When I run this on the generated motion of pre-trained models, I get the following errors:

erd walking        [0.93 1.18 1.59 1.97 2.24 ]
lstm3lr walking    [0.77 1.00 1.29 1.74 1.84 ]
srnn walking       [0.81 0.94 1.16 1.48 1.78 ]
erd eating         [1.27 1.45 1.66 1.95 2.02 ]
lstm3lr eating     [0.89 1.09 1.35 1.66 1.97 ]
srnn eating        [0.97 1.14 1.35 1.62 2.09 ]
erd smoking        [1.66 1.95 2.35 2.63 3.61 ]
lstm3lr smoking    [1.34 1.65 2.04 2.30 2.59 ]
srnn smoking       [1.45 1.68 1.94 2.24 2.64 ]
erd discussion     [2.27 2.47 2.68 2.92 3.16 ]
lstm3lr discussion [1.88 2.12 2.25 2.33 2.45 ]
srnn discussion    [1.22 1.49 1.83 2.07 2.24 ]

These results are a bit better than those reported in Table 1 :) -- Do you have an idea of what could be causing the discrepancy? I've noticed that the code ignores the global rotation and translation (e.g. motionGenerationError.m#L35 sets them to zero); I experimented with setting only the global rotation to zero and I get slightly worse results, but still better than those in Table 1. However, if I comment out that line completely (i.e., I include both global rotation and global translation), I get the following results:

erd walking        [4.69 11.71 33.38 62.81 106.05 ]
lstm3lr walking    [4.30 10.24 29.06 51.01 83.57 ]
srnn walking       [6.94 15.12 32.33 57.26 89.95 ]
erd eating         [4.87 10.07 17.09 27.64 38.08 ]
lstm3lr eating     [6.24 12.47 22.04 40.35 82.21 ]
srnn eating        [5.05 9.32 14.75 23.29 35.91 ]
erd smoking        [4.20 7.77 15.42 31.38 51.39 ]
lstm3lr smoking    [3.75 7.26 14.21 21.88 31.83 ]
srnn smoking       [4.44 8.15 14.34 22.08 31.76 ]
erd discussion     [5.95 13.99 33.42 59.98 111.36 ]
lstm3lr discussion [9.55 20.88 42.32 62.46 82.69 ]
srnn discussion    [9.40 19.81 39.03 59.83 99.46 ]

Which are definitely much worse.
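To illustrate why keeping the global channels can blow up the error, here is a toy Python sketch. It assumes, without confirmation from the script itself, that the global translation and global rotation occupy the first six entries of each pose vector. The function names and the numbers are hypothetical, and the toy skips the expmap->rotmat->euler conversion entirely.

```python
import math


def zero_global_channels(frame, n_global=6):
    """Return a copy of one pose vector with the first n_global entries set to 0.

    Assumption: global translation and rotation are the first six entries.
    """
    return [0.0] * n_global + list(frame[n_global:])


def l2_error(a, b):
    """Euclidean distance between two pose vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical 8-dimensional poses: 6 global channels + 2 joint angles.
predicted = [10.0, 20.0, 30.0, 0.1, 0.2, 0.3, 0.5, -0.4]
ground_truth = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5, -0.1]

full_err = l2_error(predicted, ground_truth)
masked_err = l2_error(zero_global_channels(predicted),
                      zero_global_channels(ground_truth))
print(full_err, masked_err)
```

In this toy case the drift in the global channels dominates the full error, while the masked error only reflects the joint angles.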

Thanks again for getting back to me; we seem to be getting closer to reproducing the results in the paper.

una-dinosauria commented 7 years ago

As a side note, I'm assuming that everything is at 25fps, right? Since you have 8x less data than what can be downloaded from human3.6m, and that is sampled at 200fps. Hence, in the error vector I'm using the indices [2,4,8,14,25] which correspond to [80, 160, 320, 560 and 1000] milliseconds.
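The index-to-time mapping I'm using can be written down as a small Python sketch. The helper names are made up, and the indices are treated as 1-based frame counts (as in Matlab); the frame rate is the assumption being asked about here.

```python
def frame_to_ms(frame_index, fps):
    """Time in milliseconds reached after frame_index frames at the given rate."""
    return frame_index * 1000.0 / fps


def ms_to_frame(ms, fps):
    """Frame index (1-based count) closest to the given time at the given rate."""
    return int(round(ms * fps / 1000.0))


indices = [2, 4, 8, 14, 25]
times_ms = [frame_to_ms(i, fps=25) for i in indices]
print(times_ms)

# The same time horizons would need different indices at another frame rate.
print([ms_to_frame(t, fps=100) for t in times_ms])
```

So whether these indices really correspond to 80 to 1000 milliseconds depends entirely on the frame rate of the released data.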

asheshjain399 commented 7 years ago

The errors reported in Table 1 only include the Euler angles, and do not include the global translation and rotation errors. This is similar to Fragkiadaki et al. ICCV'15.

We used mocap data at 100 Hz (downsampled by 2), not at 25 Hz. The reason you see less data is that we don't use all the data from Human3.6M. The details of the sequences we used can be found in the experiment section of the paper. Just to reiterate, our experiment settings are (to the best of our effort) very similar to those of Fragkiadaki et al.

Seleucia commented 7 years ago

Hello @asheshjain399, thank you very much for releasing the code and pre-trained models. I'm trying to reproduce your results but I couldn't manage it.

I see that you are normalizing the data, so the prediction is also normalized. Are you computing the error over normalized data, or are you unnormalizing your prediction with the data statistics?

I used the motionGenerationError.m file to compute the error, but it seems to expect a 99-dimensional prediction vector, whereas the code produces a 54-dimensional vector. I modified motionGenerationError.m to handle this, but I'm not sure whether that is the correct way.

Another thing: I see that you are computing a direct 2D L2 loss between the angles. Shouldn't it be a 3D loss between joints? With this, your error will be considerably less.

pvmilk commented 6 years ago

@una-dinosauria I am trying to reproduce your result by modifying motionGenerationError.m to calculate error from _forecast_Nn and _ground_truth_forecast_Nn, n in [0, 23].

lstm3lr walking [0.77 1.00 1.29 1.74 1.84 ]

Note: I believe that this is the same value that appears in your paper (cvpr2017; on human motion prediction using RNN).

However, I got numbers that are very different from your result and from the result in the SRNN paper.

lstm3lr walking [7.7294, 8.7923, 8.7971, 9.2380, 9.1237]

The only modifications I made to motionGenerationError.m are:

Considering all data (24, instead of 7) | motionGenerationError.m#L18
Considering only 54 features (instead of 100) | motionGenerationError.m#L31 and motionGenerationError.m#L47
Filename to read from | motionGenerationError.m#L20 and motionGenerationError.m#L24 and motionGenerationError.m#L40 and motionGenerationError.m#L43

Other differences could be:

The code is running on octave, rather than matlab
- (only warning) warning: RNNexp/structural_rnn/CRFProblems/H3.6m/mhmublv/Motion/RotMat2Euler.m: possible Matlab-style short-circuit operator at line 34, column 16
The prediction was running using Theano 0.9.0

Am I missing something here, e.g. unnormalizing the data, or considering only n in [0,7]?

@Seleucia So did you manage to solve your issue of reproducing the results from the SRNN paper using a pre-trained model?

Thank you.

una-dinosauria commented 6 years ago

I did manage to get the numbers that I reported, and I remember them being reasonably close to what the SRNN paper reports. Have you made movies for your predictions? If I remember correctly, the movie for discussion was exactly what is shown in the official SRNN movie, but I never managed to get the other ones.

I'm currently away at a conference but if you make your branch public I can look at the code once I get back to the lab (and make a diff with my code to see if there's something noticeably different).

Seleucia commented 6 years ago

Yes, I managed to get exactly the same numbers given in the SRNN paper with the pre-trained models. I think the confusion here is related to the subsampling. @una-dinosauria is right that the data given here is at 25 fps, not 100 fps.

pvmilk commented 6 years ago

@una-dinosauria No, I haven't made movies for the predictions yet. Let me try a couple of things on my own. If it is still not working, I will ask you for the favor of a diff.

@Seleucia Did you make any changes to the source code other than the one to motionGenerationError.m mentioned earlier (99->54)? Can you also elaborate on the subsampling issue?

Thank you.

Seleucia commented 6 years ago

I did not make any changes except the one I mentioned here. The subsampling issue was related to the selected frames: the SRNN paper reports the frames [8, 16, 32, 56, 100], not the ones @una-dinosauria used, [2,4,8,14,25]. The SRNN paper assumes the data was subsampled by 2, whereas @una-dinosauria's paper assumes it was subsampled by 8. I think @una-dinosauria is right and the timing given in the SRNN paper is wrong.

pvmilk commented 6 years ago

@una-dinosauria I tried to duplicate your result as mentioned, but without success. Could you have a look into it when you have time?

Thank you.

Here is what I did and my result:

1.) I use the srnn branch of both RNNexp (@3ba986b) and NeuralModels (@fb02335).

2.) The changes I made are only to make the program run, and they can be found in patch_srnn.txt

3.) I download the data and pre-trained model, then forecast the motion with

$ python generateMotionForecast.py lstm3lr `datapath`/pre-trained/lstm3lr_walking/

4.) I calculate the error using the Matlab script

$ octave

octave:1> merr = motionGenerationError('`datapath`/pre-trained/lstm3lr_walking/');

(I actually use Octave here; I also needed to download the H3.6m visualization code version 1.1 and extract it under the RNNexp/structural_rnn/CRFProblems/H3.6m/h36devkit folder.)

lstm3lr walking

merr([2,4,8,14,25]) = 3.0110 3.9911 4.8584 6.8636 7.1127

Below are the error values for all 100 predicted frames.

```
merr =
2.5716 3.0110 3.5399 3.9911 4.3237 4.6516 4.5892 4.8584 5.4216 5.6991
6.5395 7.0668 7.1334 6.8636 7.0044 7.0685 7.9258 7.9086 7.5250 7.8908
7.2020 7.1370 7.3037 7.1200 7.1127 7.0545 7.2075 7.1480 7.2220 7.1470
6.9809 7.0547 7.2562 7.2793 7.2724 7.2901 7.2155 6.9784 7.2433 7.0532
7.4498 7.2638 7.2666 7.6310 7.5034 7.2594 7.4710 7.3735 7.4623 7.0478
6.8761 6.9157 6.8739 6.9698 6.6872 6.9685 7.0161 6.8627 6.8614 6.8071
6.7301 6.9461 6.6581 6.6281 6.8499 7.2705 7.5901 7.7002 7.4472 7.4562
7.5396 7.4184 7.1077 6.9915 6.7552 6.6909 6.5945 6.6490 7.0078 7.3325
7.2949 7.3203 7.5912 7.4449 7.7315 7.7443 7.5951 7.7246 7.5485 7.5036
7.3329 7.3004 7.2497 7.2188 7.4018 7.7028 7.6853 8.0556 8.3116 8.2240
```

pvmilk commented 6 years ago

@una-dinosauria I think I have figured it out. The output of the prediction from generateMotionForecast.py needs to be unnormalized before calculating the error with motionGenerationError.m.

There is an unnormalization method provided in unNormalizeData.py, but you need to modify the source code yourself to apply it.
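As a rough idea of what unnormalizing means here, the following Python sketch inverts a per-dimension (x - mean) / std normalization and puts the predicted dimensions back into a full-length pose vector. This is not the code from unNormalizeData.py: the function name, the argument names, and the choice to fill unpredicted dimensions with their mean are all assumptions made for illustration.

```python
def unnormalize(normalized, mean, std, dims_used):
    """Map a normalized, reduced vector back to a full, unnormalized pose vector.

    normalized : values for the dimensions listed in dims_used (assumed layout)
    mean, std  : per-dimension statistics of the full pose vector
    dims_used  : indices of the full vector that the model predicts
    Dimensions the model does not predict are filled with their mean (assumption).
    """
    full = list(mean)
    for value, d in zip(normalized, dims_used):
        full[d] = value * std[d] + mean[d]
    return full


# Toy statistics for a 4-dimensional pose where only dims 0 and 2 are predicted.
mean = [1.0, 0.0, 2.0, 5.0]
std = [2.0, 1.0, 0.5, 0.0]
dims_used = [0, 2]
restored = unnormalize([0.5, -2.0], mean, std, dims_used)
print(restored)
```

In the real setting the reduced vector would have 54 entries and the full vector 99, matching the dimension mismatch discussed earlier in this thread.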

For those who are following the thread, I will provide the patch once I have cleaned up my code.

Thank you.

pvmilk commented 6 years ago

As promised, please use the following patch, srnn_patch.txt, in place of the one in step 2.) I provided above.

With this, you should be able to reproduce the same or similar results as Structural-RNN for the lstm3lr and erd cases.

lstm3lr walking

merr([8,16,32,56,100]) =   1.1697 1.4747 1.6444 1.7967 2.1886

erd walking

merr([8,16,32,56,100]) =  1.3010 1.5636 1.8428 2.005 2.3858

Please note that if I use merr([2,4,8,14,25]), the result is slightly better than the one reported in the baseline paper (cvpr2017; on human motion prediction using RNN).

lstm3lr walking

merr([2,4,8,14,25]) =   0.67755 0.88913 1.16974 1.41097 1.59932

erd walking

merr([2,4,8,14,25]) =   0.85603 1.04604 1.30096 1.52555 1.71511

UPDATE (9 August 2017): For those who are also trying to duplicate the results for other actions (eating, smoking, discussion), you may need to look into the 'actions' parameter in 'RNNexp/structural_rnn/CRFProblems/H3.6m/processdata.py'

MAtthewGHuser commented 3 years ago

Hi, I am also trying to reproduce the results, and I use motionGenerationError.m to convert the data expmap->rotmat->euler. But the result I got looks like this:

srnn walking       [4.57 5.12 5.95 6.04 7.43 ]    (skel_err)
                   [0.10 0.11 0.12 0.13 0.15 ]    (err_per_dof)

It's different from your result:

srnn walking       [0.81 0.94 1.16 1.48 1.78 ]

And it is also different from the results in the paper.

So could I have a look at your reproduction code? I would appreciate it very much.

asheshjain399 / RNNexp

Pre-trained models do not reproduce paper results #6