araffin / robotics-rl-srl

S-RL Toolbox: Reinforcement Learning (RL) and State Representation Learning (SRL) for Robotics
https://s-rl-toolbox.readthedocs.io
MIT License

[bug report] CSV table calculated by ./replay/gather_results.py is wrong (reward underestimated) #51

Open ncble opened 5 years ago

ncble commented 5 years ago

Describe the bug In the current master version, the script ./replay/gather_results.py is used to gather experiment results and produce a CSV table. Recently, when I tried to reproduce the results of your paper Decoupling feature extraction from policy learning ..., I found that my reward results were much better than yours (especially Table 3 on page 7). That table compares the mean reward obtained with RL (using PPO) in the robotic arm (random target) environment (aka KukaButton with random target). Digging into the code, I found the bug:

https://github.com/araffin/robotics-rl-srl/blob/1ab1bd366825f98f0282d05e32a3de0cbf7f0f9a/replay/gather_results.py#L136-L140

@kalifou has already confirmed this problem.

Explanation It's a rounding problem. Line 138 actually performs floor division rather than float division (at least for KukaButton, and possibly for the other environments too), because run_acc is an array of dtype int64.

Code example The problem actually comes from NumPy's casting behaviour when assigning into an integer array. The following code reproduces the phenomenon:

import numpy as np
A = np.arange(10, dtype=np.int64)
print(A[:]/10) # np.array([0.0, 0.1, ..., 0.9])
A[:] = A[:] / 10  # in-place assignment casts the float result back to int64 (values truncated toward zero)
print(A) # np.array([0, 0, ..., 0])

Interestingly, A = A / 10 works as expected (true division returns a new float64 array), but A[:] = A[:] / 10 does not: the in-place assignment casts the float results back into the existing int64 array.
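For reference, here is a minimal sketch of how the division could be made safe. The variable names run_acc and n_episodes are mine, chosen for illustration; they are not necessarily the names used in gather_results.py:

import numpy as np

# Hypothetical accumulated rewards per run, stored as integers
run_acc = np.array([15, 27, 42], dtype=np.int64)
n_episodes = 10

# Option 1: explicitly cast to float before dividing
run_mean = run_acc.astype(np.float64) / n_episodes

# Option 2: plain true division already returns a new float64 array,
# as long as the result is NOT written back into the int64 array in place
run_mean_alt = run_acc / n_episodes

print(run_mean)      # [1.5 2.7 4.2]
print(run_mean_alt)  # [1.5 2.7 4.2]

Either way, the key point is to never assign the float result back into an int64 array with in-place slicing.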

Solution

Use my code instead: ./replay/postprocessing_logs.py (temporary name). It can directly produce the LaTeX table and handles most situations (heterogeneous data: different numbers of experiments, different run lengths, a scalable set of "checkpoints" (timesteps), different SRL models).
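To illustrate the kind of output I mean (this is not the actual postprocessing_logs.py, just a hypothetical sketch with made-up SRL model names and reward values), a LaTeX table can be generated from aggregated rewards with pandas:

import numpy as np
import pandas as pd

# Hypothetical data: SRL model -> mean reward per random seed (values are invented)
results = {
    "ground_truth": [4.1, 4.3, 3.9],
    "raw_pixels": [2.5, 2.8, 2.2],
    "srl_combination": [3.7, 3.5, 3.9],
}

rows = []
for model, rewards in results.items():
    rewards = np.asarray(rewards, dtype=np.float64)  # keep everything in float
    rows.append({
        "SRL model": model,
        "mean reward": rewards.mean(),
        "std": rewards.std(),
        "runs": len(rewards),
    })

df = pd.DataFrame(rows)
print(df.to_latex(index=False, float_format="%.2f"))

The real script additionally has to cope with runs of different lengths and different sets of checkpoints, which is why it is more involved than this sketch.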

The following picture is a demo of my code:


Question

Are there similar problems elsewhere in the toolbox?