We originally said we would apply auto-norm whenever reward terms are present and alpha weights are passed in.
That has changed: we can now pass in reward terms (even single ones) without applying reward normalization. use_auto_norm is its own param now.
Let's clear these up: https://github.com/SkymindIO/nativerl/blob/6b7ca8936c4f0ea2865d5149bd54da233faa5815/nativerl/python/pathmind_training/environments.py#L301