DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

[Question] Error after loading model with FinRL #364

Closed Mahesha999 closed 3 years ago

Mahesha999 commented 3 years ago

Question

I was trying out the FinRL library, which uses stable-baselines3 under the hood. I encountered the error below on a model.load() (PPO) call. I have asked this question on FinRL here, but didn't get any response there, possibly because it's more related to stable-baselines3 (at least that's what I can deduce from the stack trace below).

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-94-53a21a75c23f> in <module>
----> 1 model.learn(total_timesteps = 20000, 
      2             eval_env = env_trade,
      3             eval_freq = 250,
      4             log_interval = 1,
      5             tb_log_name = '1_18_lastrun',

~/.local/share/virtualenvs/myvenv/lib/python3.8/site-packages/stable_baselines3/ppo/ppo.py in learn(self, total_timesteps, callback, log_interval, eval_env, eval_freq, n_eval_episodes, tb_log_name, eval_log_path, reset_num_timesteps)
    278     ) -> "PPO":
    279 
--> 280         return super(PPO, self).learn(
    281             total_timesteps=total_timesteps,
    282             callback=callback,

~/.local/share/virtualenvs/myvenv/lib/python3.8/site-packages/stable_baselines3/common/on_policy_algorithm.py in learn(self, total_timesteps, callback, log_interval, eval_env, eval_freq, n_eval_episodes, tb_log_name, eval_log_path, reset_num_timesteps)
    245                 logger.dump(step=self.num_timesteps)
    246 
--> 247             self.train()
    248 
    249         callback.on_training_end()

~/.local/share/virtualenvs/myvenv/lib/python3.8/site-packages/stable_baselines3/ppo/ppo.py in train(self)
    189                     self.policy.reset_noise(self.batch_size)
    190 
--> 191                 values, log_prob, entropy = self.policy.evaluate_actions(rollout_data.observations, actions)
    192                 values = values.flatten()
    193                 # Normalize advantage

~/.local/share/virtualenvs/myvenv/lib/python3.8/site-packages/stable_baselines3/common/policies.py in evaluate_actions(self, obs, actions)
    619         """
    620         latent_pi, latent_vf, latent_sde = self._get_latent(obs)
--> 621         distribution = self._get_action_dist_from_latent(latent_pi, latent_sde)
    622         log_prob = distribution.log_prob(actions)
    623         values = self.value_net(latent_vf)

~/.local/share/virtualenvs/myvenv/lib/python3.8/site-packages/stable_baselines3/common/policies.py in _get_action_dist_from_latent(self, latent_pi, latent_sde)
    581 
    582         if isinstance(self.action_dist, DiagGaussianDistribution):
--> 583             return self.action_dist.proba_distribution(mean_actions, self.log_std)
    584         elif isinstance(self.action_dist, CategoricalDistribution):
    585             # Here mean_actions are the logits before the softmax

~/.local/share/virtualenvs/myvenv/lib/python3.8/site-packages/stable_baselines3/common/distributions.py in proba_distribution(self, mean_actions, log_std)
    150         """
    151         action_std = th.ones_like(mean_actions) * log_std.exp()
--> 152         self.distribution = Normal(mean_actions, action_std)
    153         return self
    154 

~/.local/share/virtualenvs/myvenv/lib/python3.8/site-packages/torch/distributions/normal.py in __init__(self, loc, scale, validate_args)
     48         else:
     49             batch_shape = self.loc.size()
---> 50         super(Normal, self).__init__(batch_shape, validate_args=validate_args)
     51 
     52     def expand(self, batch_shape, _instance=None):

~/.local/share/virtualenvs/myvenv/lib/python3.8/site-packages/torch/distributions/distribution.py in __init__(self, batch_shape, event_shape, validate_args)
     51                     continue  # skip checking lazily-constructed args
     52                 if not constraint.check(getattr(self, param)).all():
---> 53                     raise ValueError("The parameter {} has invalid values".format(param))
     54         super(Distribution, self).__init__()
     55 

ValueError: The parameter loc has invalid values

Can someone give me a hint / direction as to what might have caused this error?


Miffyli commented 3 years ago

I do not see how this connects to the load function, unless loading is what breaks the model.

If there is a bug, please post minimal (and runnable!) code to reproduce it. Otherwise I'd do the good old "it is their fault" and point back to the FinRL library, because the saving/loading tests pass in this repository.
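For reference, a minimal save/load round trip looks roughly like the sketch below (a plain Gym control env, not FinRL; the env name, file name, and timesteps are just placeholders):

```python
# Sketch: save/load round trip with PPO on a standard continuous-control env
import gym
from stable_baselines3 import PPO

env = gym.make("Pendulum-v0")
model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=1_000)
model.save("ppo_pendulum")  # writes ppo_pendulum.zip

loaded = PPO.load("ppo_pendulum", env=env)
loaded.learn(total_timesteps=1_000)  # continue training after loading
```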

DishinGoyani commented 3 years ago

NaN values passed to Normal can cause this ValueError. I suspect that during training the model/policy is returning NaN in mean_actions, which is eventually passed into Normal, so the exception occurs.
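To illustrate the point, here is a standalone sketch (plain PyTorch, not FinRL/SB3 code) showing that a NaN mean fed into torch.distributions.Normal produces the same error:

```python
# Sketch: a NaN in the mean (loc) passed to Normal triggers the reported error
import torch as th
from torch.distributions import Normal

mean_actions = th.tensor([float("nan"), 0.0])
action_std = th.ones_like(mean_actions)

# With argument validation on, this raises:
# ValueError: The parameter loc has invalid values
Normal(mean_actions, action_std, validate_args=True)
```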

NeDa-Y commented 3 years ago

I have the same problem, and there are no NaN values; the error comes from the 2019 validation data.

ghabdan commented 3 years ago

Did anyone find a solution? Same problem here. When I increase the learning rate to 0.1, training works fine, but if I decrease it to, for example, 0.01, it gives this error.

SolaWeng commented 3 years ago

I had a similar issue with a custom environment. It turns out that I had NaN in my observations. Here is the wrapper from stable-baselines3 to check where the NaN comes from:

```python
from stable_baselines3.common.vec_env import VecCheckNan
env = VecCheckNan(env, raise_exception=True)
```

Also, here is a page from the original Stable Baselines docs listing some common causes of this issue: https://stable-baselines.readthedocs.io/en/master/guide/checking_nan.html
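For concreteness, a sketch of wrapping the training env before calling learn (the make_env factory below is a hypothetical placeholder for however you construct your FinRL/custom env):

```python
# Sketch: wrap the env so NaN/inf in observations, rewards or actions raises immediately
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecCheckNan

# "make_env" is a hypothetical zero-argument function returning your gym env
env = DummyVecEnv([make_env])
env = VecCheckNan(env, raise_exception=True)

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=20_000)
```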