Number Training Steps for Baselines

Hi,

I am currently trying to reproduce the baseline results for the Wipe environment that are shown as plots in the robosuite whitepaper. In the whitepaper you write:

All agents were trained for 500 epochs with 500 steps per episode

From this I conclude that the overall number of training steps is 500 * 500 = 250 000. However, in your file:

https://github.com/ARISE-Initiative/robosuite-benchmark/blob/master/runs/Wipe-Panda-OSC-POSE-SEED83/Wipe_Panda_OSC_POSE_SEED83_2020_09_21_23_14_04_0000--s-0/variant.json

the config sets "num_expl_steps_per_train_loop": 2500 and "num_epochs": 2000. Therefore, 2500 steps are collected per epoch for 2000 epochs, amounting to 2500 * 2000 = 5 000 000 training steps.
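For anyone checking other runs, here is a minimal sketch that recomputes the planned budget from a downloaded variant.json. It assumes only that the keys `num_epochs` and `num_expl_steps_per_train_loop` each occur exactly once somewhere in the file; the exact nesting is not assumed, and the `algorithm_kwargs` wrapper in the example dict is purely illustrative. The helper names `find_key` and `planned_env_steps` are hypothetical.

```python
import json
from typing import Any, Iterator


def find_key(obj: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key`, searching nested dicts and lists."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_key(item, key)


def planned_env_steps(variant: dict) -> int:
    """Planned exploration steps = num_epochs * num_expl_steps_per_train_loop."""
    # Unpacking raises ValueError if a key is missing or appears more than once.
    (num_epochs,) = find_key(variant, "num_epochs")
    (steps_per_epoch,) = find_key(variant, "num_expl_steps_per_train_loop")
    return num_epochs * steps_per_epoch


def load_variant(path: str) -> dict:
    """Read a variant.json file from disk."""
    with open(path) as f:
        return json.load(f)


# The two values quoted above (nesting chosen for illustration only).
example = {"algorithm_kwargs": {"num_epochs": 2000, "num_expl_steps_per_train_loop": 2500}}
print(planned_env_steps(example))  # 5000000
```

On a real run directory this would be `planned_env_steps(load_variant("variant.json"))`.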

Then looking in the file with the training statistics:

https://github.com/ARISE-Initiative/robosuite-benchmark/blob/master/runs/Wipe-Panda-OSC-POSE-SEED83/Wipe_Panda_OSC_POSE_SEED83_2020_09_21_23_14_04_0000--s-0/progress.csv

According to this file, training ran for 720 epochs, amounting to 1 825 800 training steps (with 2500 steps per epoch, as can be seen from the growing replay buffer).
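The same numbers can be read out of progress.csv programmatically. The sketch below assumes rlkit-style column names (`Epoch`, `exploration/num steps total`, `replay_buffer/size`); these are assumptions, so pass the actual header names if they differ. It also assumes one row per epoch, so the step growth between the last two rows gives the steps collected per epoch. `summarize_progress` is a hypothetical helper.

```python
import csv
import io
from typing import Dict

# Assumed rlkit-style column names; override via the function arguments if needed.
EPOCH_COL = "Epoch"
STEPS_COL = "exploration/num steps total"
BUFFER_COL = "replay_buffer/size"


def _as_int(value: str) -> int:
    """Values may be logged as floats (e.g. '2500.0'), so go through float first."""
    return int(float(value))


def summarize_progress(csv_text: str,
                       epoch_col: str = EPOCH_COL,
                       steps_col: str = STEPS_COL,
                       buffer_col: str = BUFFER_COL) -> Dict[str, int]:
    """Summarize the last logged epoch of a progress.csv (one row per epoch assumed)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if len(rows) < 2:
        raise ValueError("need at least two logged epochs")
    last, prev = rows[-1], rows[-2]
    return {
        "epochs_logged": len(rows),
        "last_epoch": _as_int(last[epoch_col]),
        "steps_total": _as_int(last[steps_col]),
        "buffer_size": _as_int(last[buffer_col]),
        # Step growth between the last two rows = env steps collected in one epoch.
        "steps_per_epoch": _as_int(last[steps_col]) - _as_int(prev[steps_col]),
    }
```

Usage: `with open("progress.csv") as f: print(summarize_progress(f.read()))`.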

I assume that for the whitepaper you ran 500 epochs, but with 2500 steps per epoch, which gives 500 * 2500 = 1 250 000 training steps overall. Can you confirm which of these numbers is the correct one to use?
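To make the comparison explicit, this short sketch just multiplies out the three readings of the budget discussed in this issue (epochs times environment steps per epoch). The labels and the `total_steps` helper are illustrative only.

```python
from typing import Dict, Tuple

# (epochs, env steps per epoch) for each reading of the training budget above.
READINGS: Dict[str, Tuple[int, int]] = {
    "whitepaper, read as 500 epochs x 500 steps": (500, 500),
    "variant.json config, 2000 epochs x 2500 steps": (2000, 2500),
    "hypothesis, 500 epochs x 2500 steps": (500, 2500),
}


def total_steps(epochs: int, steps_per_epoch: int) -> int:
    """Total environment steps = epochs * steps collected per epoch."""
    return epochs * steps_per_epoch


for name, (epochs, per_epoch) in READINGS.items():
    print(f"{name}: {total_steps(epochs, per_epoch):,} steps")
# whitepaper, read as 500 epochs x 500 steps: 250,000 steps
# variant.json config, 2000 epochs x 2500 steps: 5,000,000 steps
# hypothesis, 500 epochs x 2500 steps: 1,250,000 steps
```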

ARISE-Initiative / robosuite-benchmark

Number Training Steps for Baselines #23