AI4Finance-Foundation / FinRL-Tutorials

Tutorials. Please star.
https://ai4finance.org
MIT License

Error while trying to implement basic Optuna tutorial for SAC #89

Open arun-dezerv opened 5 months ago

arun-dezerv commented 5 months ago

https://github.com/AI4Finance-Foundation/FinRL-Tutorials/blob/master/4-Optimization/FinRL_HyperparameterTuning_using_Optuna_basic.ipynb

Hello. The tutorial above works well for DDPG, even at 50,000 timesteps, but I am unable to reproduce it for SAC: training starts failing as soon as the number of timesteps exceeds 100. When I run the SAC version with anything more than 100 timesteps, I get the error below:

[I 2024-06-01 11:43:00,899] A new study created in memory with name: sac_study

:4: FutureWarning: suggest_loguniform has been deprecated in v3.0.0. This feature will be removed in v6.0.0. See https://github.com/optuna/optuna/releases/tag/v3.0.0. Use suggest_float(..., log=True) instead.
  learning_rate = trial.suggest_loguniform("learning_rate", 1e-1, 1)
{'buffer_size': 100000, 'learning_rate': 0.3968793330444371, 'batch_size': 256}
Using cpu device
[W 2024-06-01 11:43:06,313] Trial 0 failed with parameters: {'buffer_size': 100000, 'learning_rate': 0.3968793330444371, 'batch_size': 256} because of the following error: ValueError('Expected parameter loc (Tensor of shape (1, 30)) of distribution Normal(loc: torch.Size([1, 30]), scale: torch.Size([1, 30])) to satisfy the constraint Real(), but found invalid values:\ntensor([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n nan, nan, nan, nan, nan, nan]])').
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/optuna/study/_optimize.py", line 196, in _run_trial
    value_or_values = func(trial)
  File "", line 63, in objective
    trained_sac = agent.train_model(model=model_sac,
  File "/usr/local/lib/python3.10/dist-packages/finrl/agents/stablebaselines3/models.py", line 117, in train_model
    model = model.learn(
  File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/sac/sac.py", line 307, in learn
    return super().learn(
  File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/off_policy_algorithm.py", line 328, in learn
    rollout = self.collect_rollouts(
  File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/off_policy_algorithm.py", line 557, in collect_rollouts
    actions, buffer_actions = self._sample_action(learning_starts, action_noise, env.num_envs)
  File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/off_policy_algorithm.py", line 390, in _sample_action
    unscaled_action, _ = self.predict(self._last_obs, deterministic=False)
  File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/base_class.py", line 556, in predict
    return self.policy.predict(observation, state, episode_start, deterministic)
  File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/policies.py", line 368, in predict
    actions = self._predict(obs_tensor, deterministic=deterministic)
  File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/sac/policies.py", line 353, in _predict
    return self.actor(observation, deterministic)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/sac/policies.py", line 170, in forward
    return self.action_dist.actions_from_params(mean_actions, log_std, deterministic=deterministic, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/distributions.py", line 190, in actions_from_params
    self.proba_distribution(mean_actions, log_std)
  File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/distributions.py", line 224, in proba_distribution
    super().proba_distribution(mean_actions, log_std)
  File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/distributions.py", line 164, in proba_distribution
    self.distribution = Normal(mean_actions, action_std)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributions/normal.py", line 56, in __init__
    super().__init__(batch_shape, validate_args=validate_args)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributions/distribution.py", line 68, in __init__
    raise ValueError(
ValueError: Expected parameter loc (Tensor of shape (1, 30)) of distribution Normal(loc: torch.Size([1, 30]), scale: torch.Size([1, 30])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]])
[W 2024-06-01 11:43:06,315] Trial 0 failed with value None.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
[](https://localhost:8080/#) in ()
     87 logging_callback = LoggingCallback(threshold=1e-5,patience=30,trial_number=5)
     88 #You can increase the n_trials for a better search space scanning
---> 89 study.optimize(objective, n_trials=10,catch=(ValueError,),callbacks=[logging_callback])
     90
     91 joblib.dump(study, "final_sac_study__.pkl")

6 frames
[/usr/local/lib/python3.10/dist-packages/optuna/storages/_in_memory.py](https://localhost:8080/#) in get_best_trial(self, study_id)
    232
    233         if best_trial_id is None:
--> 234             raise ValueError("No trials are completed yet.")
    235         elif len(self._studies[study_id].directions) > 1:
    236             raise RuntimeError(

ValueError: No trials are completed yet.
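
Two things in the log above may be worth checking. First, the learning rate is sampled from the range 1e-1 to 1 (0.397 in the failed trial), which is several orders of magnitude above the Stable-Baselines3 SAC default of 3e-4; a step size that large could plausibly push the actor's weights to NaN within a few updates, although the log alone does not confirm this as the cause. Second, the final "No trials are completed yet." error appears to be raised while the best trial is looked up during `study.optimize` (possibly from the logging callback, whose code is not shown here), at a point where the only trial so far has failed. The sketch below is a hedged illustration, not the tutorial's code: `sample_sac_params` and `has_completed_trial` are hypothetical helper names, and the 1e-5 to 1e-3 range and the candidate lists are assumptions chosen for illustration.

```python
def sample_sac_params(trial):
    """Sample SAC hyperparameters from an Optuna trial (hypothetical helper).

    Uses suggest_float(..., log=True), which the FutureWarning in the log
    recommends in place of the deprecated suggest_loguniform.
    """
    # ASSUMPTION: 1e-5..1e-3 is an illustrative range around the SB3 SAC
    # default of 3e-4; the notebook that failed sampled from 1e-1..1.
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    # ASSUMPTION: these candidate lists are illustrative, not the tutorial's.
    buffer_size = trial.suggest_categorical("buffer_size", [10_000, 100_000, 1_000_000])
    batch_size = trial.suggest_categorical("batch_size", [64, 128, 256])
    return {
        "learning_rate": learning_rate,
        "buffer_size": buffer_size,
        "batch_size": batch_size,
    }


def has_completed_trial(study):
    """Return True if at least one trial in the study finished successfully.

    Checking this before reading study.best_trial or study.best_value avoids
    the "No trials are completed yet." ValueError when every trial so far
    has failed.
    """
    return any(t.state.name == "COMPLETE" for t in study.trials)
```

Inside the objective, the sampled dictionary would take the place of the current hyperparameter suggestions, and a callback that reads `study.best_value` could return early whenever `has_completed_trial(study)` is false.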