Closed slinlee closed 2 years ago
WIP description: refactoring the params for reward terms. We now always use reward terms; the only remaining choice is whether to apply auto-norm.
The big change is here, and the rest follows from removing some params: https://github.com/SkymindIO/nativerl/commit/5af9d522554fd16dc0c6a7acc3af24d3931d1955#diff-6a83e83402a63a80857e3c3a49bb5120f23e4eac02fc6523dc80fe612dae8754R127-R133
`alphas` is optional here. The number of reward terms is determined by the env itself, by querying `reward_terms()`. `use_auto_norm` is kept separate and can only be true if the number of terms is > 1.
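To make the intended rules concrete, here is a minimal sketch of how the three pieces could fit together. It is not the code from the linked commit: `ToyEnv` and `resolve_reward_config` are hypothetical names, defaulting missing `alphas` to 1.0 per term is an assumption, and raising an error (rather than silently disabling auto-norm) is also an assumption.

```python
from typing import List, Optional, Sequence, Tuple


class ToyEnv:
    """Hypothetical stand-in for an env that exposes reward_terms()."""

    def reward_terms(self) -> List[float]:
        # Three reward terms, just for illustration.
        return [1.0, -0.5, 2.0]


def resolve_reward_config(
    env,
    alphas: Optional[Sequence[float]] = None,
    use_auto_norm: bool = False,
) -> Tuple[List[float], bool]:
    """Validate alphas / use_auto_norm against the env's reward terms."""
    # The env, not a user-supplied param, decides how many terms there are.
    num_terms = len(env.reward_terms())

    if alphas is None:
        # Assumption: equal weight per term when alphas is omitted.
        alphas = [1.0] * num_terms
    elif len(alphas) != num_terms:
        raise ValueError(
            f"expected {num_terms} alphas, got {len(alphas)}"
        )

    # Auto-norm only makes sense when there is more than one term.
    if use_auto_norm and num_terms <= 1:
        raise ValueError("use_auto_norm requires more than one reward term")

    return list(alphas), use_auto_norm
```

Calling `resolve_reward_config(ToyEnv(), use_auto_norm=True)` would return one weight per term with auto-norm enabled, while an env with a single reward term would be rejected when `use_auto_norm=True`.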
This needs a matching change on the webapp side: https://github.com/SkymindIO/pathmind-webapp/pull/3679
Todo:
I'm going to split this into two PRs: one for refactoring `use_reward_terms` and `use_auto_norm`, and a separate one where I make `alphas` optional.