Denys88 / rl_games

RL implementations
MIT License
864 stars, 146 forks

Wandb does not seem to record time or step correctly #208

Open DanielTakeshi opened 1 year ago

DanielTakeshi commented 1 year ago

I am running PPO with wandb integration, but the statistics seem not to be recorded as intended.

I am testing this with Isaac Gym environments but I am unsure if this issue is specific to Isaac Gym.

Steps to reproduce: after installing IsaacGymEnvs according to its instructions, run a command like this in the isaacgymenvs/ directory:

python train.py task=Ant headless=True wandb_activate=True wandb_entity=danieltakeshi wandb_project=isaac-gym

Replace danieltakeshi with your own wandb username and isaac-gym with your project name.

After I run this, the reward goes up (good), but I also see this on wandb:

[Screenshot from 2022-10-27: wandb charts of the logged rewards]

The code records the reward as a function of iter, step, and time. The logging happens in rl_games here:

https://github.com/Denys88/rl_games/blob/d8645b2678c0d8a6e98a6e3f2b17f0ecfbff71ad/rl_games/common/a2c_common.py#L947-L955

The code logs the statistics against three different quantities (epoch, step, and time) through self.writer, which is a tensorboardX.SummaryWriter. On wandb, however, the charts seem to use only "iter" (the same as epoch_num here) as the x-axis, so performance is never shown as a function of step or time. Is there a way to address this?
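To make the logging pattern concrete, here is a minimal sketch (not the rl_games code) in which the same mean reward is written under three tags, each with a different global_step. RecordingWriter is a hypothetical in-memory stand-in that mimics the add_scalar(tag, scalar_value, global_step) argument order of tensorboardX.SummaryWriter; frames_per_epoch, seconds_per_epoch, and the reward values are made up for illustration.

```python
from collections import defaultdict


class RecordingWriter:
    """Hypothetical stand-in for tensorboardX.SummaryWriter.

    Instead of writing event files, it keeps (global_step, value) pairs per tag.
    """

    def __init__(self):
        self.series = defaultdict(list)

    def add_scalar(self, tag, scalar_value, global_step=None):
        # Same argument order as SummaryWriter.add_scalar(tag, scalar_value, global_step)
        self.series[tag].append((global_step, scalar_value))


def log_rewards(writer, mean_reward, epoch_num, frame, total_time):
    # One value, three tags: only the x-value (global_step) differs.
    writer.add_scalar("rewards/iter", mean_reward, epoch_num)   # x = training epoch
    writer.add_scalar("rewards/step", mean_reward, frame)       # x = environment frames
    writer.add_scalar("rewards/time", mean_reward, total_time)  # x = elapsed seconds


writer = RecordingWriter()
frames_per_epoch = 16384   # hypothetical number of frames collected per epoch
seconds_per_epoch = 2.5    # hypothetical wall-clock seconds per epoch

for epoch_num in range(1, 4):
    mean_reward = 10.0 * epoch_num  # placeholder reward, not real data
    log_rewards(
        writer,
        mean_reward,
        epoch_num,
        frame=epoch_num * frames_per_epoch,
        total_time=epoch_num * seconds_per_epoch,
    )

for tag, points in writer.series.items():
    print(tag, points)
```

Each tag carries its own x-values (epochs, frames, seconds), so a chart that plots every tag against one shared step counter can show at most one of them on the intended axis, which is consistent with the behavior described above.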

(Also posting on the Isaac Gym repo https://github.com/NVIDIA-Omniverse/IsaacGymEnvs/issues/87)

Denys88 commented 1 year ago

@DanielTakeshi I am sorry I missed your issue. @vwxyzjn could you take a look if you have free time?

vwxyzjn commented 1 year ago

Try changing the x-axis to global_step with the button on the top right.

DanielTakeshi commented 1 year ago

Sorry for my delayed response as well, @Denys88 and @vwxyzjn.

It looks like we can adjust the x-values here:

[Screenshot from 2022-12-28, 10:40 AM: wandb x-axis settings]

So I think the intended usage is to set the x-axis of rewards/time to Wall Time and the x-axis of rewards/step to global_step by hand? (Somewhat confusingly, rewards/iter looks fine with the default Step axis, even though the code makes clear that iter refers to an epoch.)

[Screenshot from 2022-12-28, 10:42 AM]

It would be nice if there were a way to set all three plots to the appropriate x-axis automatically from the start. I'm not sure whether that functionality exists.
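The manual fix discussed in this thread amounts to one x-axis choice per tag suffix. The sketch below writes that choice down as a lookup table; X_AXIS_FOR_SUFFIX and intended_x_axis are hypothetical names, not part of rl_games or wandb, and the axis labels are simply the ones picked by hand in the comments here.

```python
# Hypothetical mapping from the rl_games tag suffix to the wandb x-axis option
# chosen manually in this thread. Not an rl_games or wandb API.
X_AXIS_FOR_SUFFIX = {
    "iter": "Step",         # epoch_num; lined up with wandb's default axis
    "step": "global_step",  # environment frames
    "time": "Wall Time",    # elapsed time
}


def intended_x_axis(tag: str) -> str:
    """Return the x-axis label for a tag such as 'rewards/step'."""
    suffix = tag.rsplit("/", 1)[-1]
    try:
        return X_AXIS_FOR_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"no x-axis rule for tag {tag!r}") from None


for tag in ["rewards/iter", "rewards/step", "rewards/time"]:
    print(f"{tag} -> {intended_x_axis(tag)}")
```

Keeping the rule keyed on the suffix means any additional metric logged with the same /iter, /step, /time convention would get the same axis choice without extra entries.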

If this is the intended usage, feel free to close this issue report. Thanks!