PathmindAI / nativerl

Train reinforcement learning agents using AnyLogic or Python-based simulations
Apache License 2.0

Clear up where we use `use_reward_terms` and `use_auto_norm` #483

Closed: slinlee closed this issue 2 years ago

slinlee commented 2 years ago

We originally said we would apply auto-norm whenever there are reward terms and alpha weights are passed in.

That has changed: we can now pass in reward terms (even single ones) without applying reward normalization. `use_auto_norm` is its own param now.

Let's clear these up: https://github.com/SkymindIO/nativerl/blob/6b7ca8936c4f0ea2865d5149bd54da233faa5815/nativerl/python/pathmind_training/environments.py#L301
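For context, a minimal sketch of what the decoupled config could look like. The helper `build_env_config` and its signature are illustrative, not the shipped API; only the key names `use_reward_terms`, `use_auto_norm`, `num_reward_terms`, and `reward_balance_period` come from the excerpts below:

```python
def build_env_config(num_reward_terms, reward_balance_period, use_auto_norm=False):
    """Hypothetical helper: the two flags no longer imply one another."""
    return {
        # True whenever reward terms are supplied, with or without alphas.
        "use_reward_terms": num_reward_terms > 0,
        # Normalization is an explicit, independent opt-in,
        # no longer derived from `alphas is not None`.
        "use_auto_norm": use_auto_norm,
        "num_reward_terms": num_reward_terms,
        "reward_balance_period": reward_balance_period,
    }
```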


```
nativerl/python/run.py:
  128      env_config = {
  129:         "use_reward_terms": alphas is not None,
  130          "reward_balance_period": reward_balance_period,

  137  
  138:     if env_config["use_reward_terms"]:
  139          assert (

  190          callbacks = get_callbacks(
  191:             debug_metrics, env_config["use_reward_terms"], is_gym, checkpoint_frequency
  192          )

nativerl/python/pathmind_training/callbacks.py:
  22  
  23: def get_callbacks(debug_metrics, use_reward_terms, is_gym, checkpoint_frequency):
  24      class Callbacks(DefaultCallbacks):

  50  
  51:                 if use_reward_terms:
  52                      term_contributions = (

nativerl/python/pathmind_training/environments.py:
  119  
  120:             self.use_reward_terms = env_config["use_reward_terms"]
  121              self.num_reward_terms = env_config["num_reward_terms"]

  261                      obs_dict[str(i)] = obs
  262:                     if self.use_reward_terms:
  263                          reward_array = np.array(self.nativeEnv.getRewardTerms(i))

  278  
  279:                 if self.use_reward_terms and done_dict["__all__"]:
  280                      self.term_contributions += sum(

  300  
  301:                 if self.use_reward_terms:
  302                      reward_array = np.array(self.nativeEnv.getRewardTerms())

  336          def updateBetas(self, betas):
  337:             if self.use_reward_terms:
  338                  self.betas = betas

  340          def getRewardTermContributions(self):
  341:             if self.use_reward_terms:
  342                  return self.term_contributions
```
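On the environment side, the gating might then look roughly like this. It is a sketch assembled from the attributes visible in the excerpt above (`use_reward_terms`, `num_reward_terms`, `betas`); the `use_auto_norm` branch and the beta weighting are assumptions about the cleanup, not current behavior:

```python
import numpy as np

class RewardTermHandling:
    """Sketch: reward-term collection and auto-norm gated separately."""

    def __init__(self, env_config):
        self.use_reward_terms = env_config["use_reward_terms"]
        self.use_auto_norm = env_config.get("use_auto_norm", False)
        self.num_reward_terms = env_config.get("num_reward_terms", 0)
        # Betas default to identity weights until updateBetas is called.
        self.betas = np.ones(self.num_reward_terms)

    def updateBetas(self, betas):
        # Tied to reward terms, as in the environments.py excerpt.
        if self.use_reward_terms:
            self.betas = betas

    def combine_reward(self, raw_terms):
        """Collapse per-term rewards into a scalar reward.

        Reward terms work without normalization; betas apply only when
        use_auto_norm is on (placeholder weighting, not the real scheme).
        """
        reward_array = np.array(raw_terms)
        if self.use_auto_norm:
            reward_array = self.betas * reward_array
        return float(reward_array.sum())
```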