Open wookayin opened 2 years ago
I'm submitting this as a PR separate from #231 to ease the review process, but it'd be great if each of the PRs could be rebased and merged without creating a merge commit. Thanks!
@qstanczyk, thank you for updating the PR after such a long time! I understand DM may not have enough resources available, but as a general request, it'd be greatly appreciated if the turnaround time for community contributions could be reduced further.
PPO agents have `reward_mean` and `rewards_std` metrics logged, but SAC agents do not. Note that the SAC implementation is not flexible enough for custom metrics to be configured or extended (because `update_step` is not a method), so it would be reasonable to add them directly into the `update_step` function.
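
To illustrate the shape of the change, here is a minimal, standard-library-only sketch rather than the actual learner code. `Transition`, `make_update_step`, the placeholder loss values, and the integer `state` are all hypothetical stand-ins; only the metric names `reward_mean` and `rewards_std` and the fact that `update_step` is a plain function (not a method) come from the description above.

```python
import statistics
from typing import Dict, List, NamedTuple, Tuple


class Transition(NamedTuple):
    """Hypothetical stand-in for a sampled batch; only rewards matter here."""
    rewards: List[float]


def make_update_step():
    """Builds update_step as a local function, so a subclass cannot override it."""

    def update_step(state: int, batch: Transition) -> Tuple[int, Dict[str, float]]:
        # Placeholder values standing in for the real losses (assumption).
        metrics = {"critic_loss": 0.0, "actor_loss": 0.0}

        # The addition: batch reward statistics, using the PPO metric names.
        metrics["reward_mean"] = statistics.fmean(batch.rewards)
        # Population standard deviation (divides by N, not N - 1).
        metrics["rewards_std"] = statistics.pstdev(batch.rewards)

        # Hypothetical state update: just count the steps taken.
        return state + 1, metrics

    return update_step


update_step = make_update_step()
new_state, metrics = update_step(0, Transition(rewards=[1.0, 2.0, 3.0, 4.0]))
print(metrics)
```

Because `update_step` is created inside another function, there is no hook for injecting extra metrics from outside, which is why the statistics are computed inline next to the existing entries.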