Describe the bug
In the current master version, the script ./replay/gather_results.py is used to gather experiment results and produce a CSV table. Recently, when I tried to reproduce the results of your paper "Decoupling feature extraction from policy learning ...", I found that my reward results were much better than yours (especially Table 3 on page 7). This table compares the mean reward performance of RL (using PPO) in the robotic arm (random target) environment (aka KukaButton with a random target). I dug into the code and found the bug:

https://github.com/araffin/robotics-rl-srl/blob/1ab1bd366825f98f0282d05e32a3de0cbf7f0f9a/replay/gather_results.py#L136-L140

@kalifou has already confirmed this problem.
Explanation
It's a rounding problem. Line 138 actually performs a "floor" rather than a float division (at least for KukaButton, and maybe for the other environments too), since run_acc is an array of dtype int64.
Code example
The problem actually comes from numpy. The following code reproduces this phenomenon:
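A minimal sketch of the reproduction (the array values are illustrative):

```python
import numpy as np

A = np.array([1, 5, 10, 15], dtype=np.int64)

# Rebinding creates a new float64 array, so the decimals are kept:
B = A / 10
print(B)  # [0.1 0.5 1.  1.5]

# In-place assignment writes the float results back into the existing
# int64 buffer, so they are silently truncated ("floored" for positive values):
A[:] = A[:] / 10
print(A)  # [0 0 1 1]

# A possible fix: cast the array to float before dividing in place:
A = np.array([1, 5, 10, 15], dtype=np.int64).astype(np.float64)
A[:] = A[:] / 10
print(A)  # [0.1 0.5 1.  1.5]
```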
One funny thing is that `A = A / 10` works (no "floor"), but `A[:] = A[:] / 10` does not.
Solution
Use my code instead: ./replay/postprocessing_logs.py (temporary name). It can directly produce the LaTeX table and handles most situations (heterogeneous data: different numbers of experiments, different run lengths, configurable "checkpoints" (timesteps), different SRL models).
The following is a demo of my code:

- When there is only one experiment: no confidence interval.
- When there are several experiments: a 95% confidence interval is estimated (see the sketch after this list).
- When the RL training of an SRL model was stopped accidentally, a "-" is inserted.
- The SRL models don't need to be specified; the folders are searched automatically.
- The "checkpoints" [1e6, 2e6, 3e6, 4e6, 5e6] can be changed by the user (put M for million, K for thousand).
- The result is saved to a .tex file (LaTeX table).
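A minimal sketch of how the confidence interval and the M/K checkpoint suffixes could be handled (the function names are illustrative, not the actual postprocessing_logs.py API):

```python
import numpy as np
from scipy import stats

def parse_timestep(s):
    """Parse a checkpoint string such as '1M' or '500K' into an int."""
    suffixes = {"M": 10**6, "K": 10**3}
    if s and s[-1].upper() in suffixes:
        return int(float(s[:-1]) * suffixes[s[-1].upper()])
    return int(float(s))

def format_cell(rewards):
    """Format one table cell: mean reward, a 95% confidence interval when
    several experiments are available, and '-' when the run is missing."""
    if len(rewards) == 0:
        return "-"  # RL training was stopped accidentally
    rewards = np.asarray(rewards, dtype=np.float64)  # avoid the int64 pitfall
    mean = rewards.mean()
    if rewards.size < 2:
        return "{:.1f}".format(mean)  # only one experiment: no interval
    # Student-t interval, since the number of runs per cell is usually small
    half_width = stats.sem(rewards) * stats.t.ppf(0.975, df=rewards.size - 1)
    return "{:.1f} $\\pm$ {:.1f}".format(mean, half_width)

print(parse_timestep("2M"))          # 2000000
print(format_cell([120, 135, 128]))  # 127.7 $\pm$ 18.6
```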
Question
Are there similar problems elsewhere in the toolbox?