PathmindAI / nativerl

Train reinforcement learning agents using AnyLogic or Python-based simulations
Apache License 2.0

WIP - Refactor reward terms params #485

Closed: slinlee closed this 2 years ago

slinlee commented 2 years ago

WIP description: refactoring the params for reward terms. We now always use reward terms; the only remaining difference is whether to apply auto-norm.

The big change is here, and the rest follows from removing some params: https://github.com/SkymindIO/nativerl/commit/5af9d522554fd16dc0c6a7acc3af24d3931d1955#diff-6a83e83402a63a80857e3c3a49bb5120f23e4eac02fc6523dc80fe612dae8754R127-R133

alphas is optional here. The number of reward terms is determined by the env itself, by querying reward_terms(). use_auto_norm is kept separate and can only be true if the number of terms is greater than 1.
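The parameter rules in the previous paragraph can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: ExampleEnv, RewardConfig, resolve_reward_config, and weighted_reward are hypothetical names. Defaulting each alpha to 1.0 when alphas is omitted is an assumption, as is raising an error (rather than silently turning auto-norm off) when use_auto_norm is requested with only one reward term.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


class ExampleEnv:
    """Hypothetical stand-in for a nativerl env; only reward_terms() matters here."""

    def __init__(self, terms: Sequence[float]):
        self._terms = list(terms)

    def reward_terms(self) -> List[float]:
        # The env, not the caller, decides how many reward terms exist.
        return list(self._terms)


@dataclass
class RewardConfig:
    """Hypothetical container for the resolved reward parameters."""
    alphas: List[float]
    use_auto_norm: bool


def resolve_reward_config(
    env: ExampleEnv,
    alphas: Optional[Sequence[float]] = None,
    use_auto_norm: bool = False,
) -> RewardConfig:
    # The number of terms comes from querying the env itself.
    num_terms = len(env.reward_terms())

    if alphas is None:
        # Assumption: an omitted alphas means equal weight for every term.
        resolved = [1.0] * num_terms
    else:
        resolved = [float(a) for a in alphas]
        if len(resolved) != num_terms:
            raise ValueError(f"expected {num_terms} alphas, got {len(resolved)}")

    # use_auto_norm is a separate flag, only valid with more than one term.
    # Design choice for this sketch: reject the invalid combination loudly.
    if use_auto_norm and num_terms <= 1:
        raise ValueError("use_auto_norm requires more than one reward term")

    return RewardConfig(alphas=resolved, use_auto_norm=use_auto_norm)


def weighted_reward(terms: Sequence[float], alphas: Sequence[float]) -> float:
    # Combine the per-term rewards into one scalar using the alphas as weights.
    return sum(a * t for a, t in zip(alphas, terms))
```

For example, resolve_reward_config(ExampleEnv([1.0, 2.0, 3.0])) yields alphas of [1.0, 1.0, 1.0] with auto-norm off, while passing use_auto_norm=True for a single-term env raises ValueError. Failing early keeps a misconfigured run from starting, at the cost of requiring callers to set the flag correctly.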

This requires a corresponding change on the webapp side: https://github.com/SkymindIO/pathmind-webapp/pull/3679

Todo:

slinlee commented 2 years ago

I'm going to split this into two PRs: one for refactoring use_reward_terms and use_auto_norm, and a separate one that makes alphas optional.