Toni-SM / skrl

Modular reinforcement learning library (on PyTorch and JAX) with support for NVIDIA Isaac Gym, Omniverse Isaac Gym and Isaac Lab
https://skrl.readthedocs.io/
MIT License

Mean rewards are not calculated properly #157

Open nikolaradulov opened 2 weeks ago

nikolaradulov commented 2 weeks ago

Description

The mean reward is computed by appending the mean of all stored cumulative rewards to the self.tracking_data dictionary: self.tracking_data["Reward / Total reward (mean)"].append(np.mean(track_rewards)). Then, every time the data is meant to be written, the mean of all the values stored in self.tracking_data["Reward / Total reward (mean)"] is written via self.writer.add_scalar(k, np.mean(v), timestep), and the tracking data is cleared. The issue is that a value is appended on every step on which self._track_rewards (the cumulative-reward storage) is non-empty, while self._track_rewards itself is never cleared. As a result, all the cumulative rewards accumulated since the last write are averaged, appended to the tracking data, and then averaged again on write.
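A minimal sketch of the two-stage averaging described above (simplified; the function names and the TensorBoard-style writer argument are placeholders, not skrl's exact API):

```python
import collections
import numpy as np

_track_rewards = collections.deque(maxlen=100)  # cumulative rewards of finished episodes
tracking_data = collections.defaultdict(list)

def track_step():
    # called every step: once _track_rewards is non-empty, a mean over the
    # SAME stored episodes is appended again on every subsequent step
    if len(_track_rewards):
        tracking_data["Reward / Total reward (mean)"].append(np.mean(_track_rewards))

def write_step(writer, timestep):
    # called every write interval: the already-averaged values are averaged again
    for k, v in tracking_data.items():
        writer.add_scalar(k, np.mean(v), timestep)
    tracking_data.clear()  # tracking data is cleared, but _track_rewards is not
```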

E.g., say each episode is 3 steps, only 1 env instance is running, and writing is done every 9 steps:

(Below, tracking_data abbreviates self.tracking_data["Reward / Total reward (mean)"].)

- step 1: self._track_rewards = [], tracking_data = []
- step 2: self._track_rewards = [], tracking_data = []
- step 3: episode finishes with cumulative reward -30: self._track_rewards = [-30], tracking_data = [-30]
- step 4: self._track_rewards = [-30], tracking_data = [-30, -30]
- step 5: self._track_rewards = [-30], tracking_data = [-30, -30, -30]
- step 6: episode finishes with cumulative reward -4: self._track_rewards = [-30, -4], tracking_data = [-30, -30, -30, -17]
- step 7: self._track_rewards = [-30, -4], tracking_data = [-30, -30, -30, -17, -17]
- step 8: self._track_rewards = [-30, -4], tracking_data = [-30, -30, -30, -17, -17, -17]
- step 9: episode finishes with cumulative reward -10: self._track_rewards = [-30, -4, -10], tracking_data = [-30, -30, -30, -17, -17, -17, -14.67]

At the end of step 9, the mean cumulative reward of the past 3 episodes is -14.(6). The value actually written to TensorBoard is np.mean(tracking_data) ≈ -22.24. VERY DIFFERENT.
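The example can be reproduced with a short, self-contained simulation (same step numbers and rewards as above):

```python
import numpy as np

# step at which an episode ends -> that episode's cumulative reward
episode_end_rewards = {3: -30, 6: -4, 9: -10}
_track_rewards, tracking_data = [], []

for step in range(1, 10):
    if step in episode_end_rewards:
        _track_rewards.append(episode_end_rewards[step])
    if _track_rewards:  # a mean is appended on every step once any episode has finished
        tracking_data.append(np.mean(_track_rewards))

print(round(np.mean(_track_rewards), 2))  # -14.67: true mean of the 3 episodes
print(round(np.mean(tracking_data), 2))   # -22.24: double-averaged value that gets written
```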

SOLUTION: call self._track_rewards.clear() every time data is appended to self.tracking_data["Reward / Total reward (mean)"], so each finished episode is counted only once.
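Applied to the sketch above (not a drop-in patch for skrl's actual method), the proposed fix would look like:

```python
import numpy as np

def track_step_fixed():
    # append the mean of the episodes finished since the last append...
    if len(_track_rewards):
        tracking_data["Reward / Total reward (mean)"].append(np.mean(_track_rewards))
        # ...then clear the storage so those episodes are not averaged again
        _track_rewards.clear()
```

With this change, each finished episode contributes to exactly one appended value, so the write-time mean becomes a per-episode mean over the write interval.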

What skrl version are you using?

1.0.0

What ML framework/library version are you using?

PyTorch

Additional system information

No response