https://github.com/AI4Finance-Foundation/FinRL-Tutorials/blob/master/4-Optimization/FinRL_HyperparameterTuning_using_Optuna_basic.ipynb

Hello - the above tutorial works well for DDPG, but I am unable to reproduce it for SAC. DDPG trains without issue for 50,000 timesteps, whereas SAC starts failing as soon as the number of timesteps exceeds 100. With anything more than 100 timesteps for SAC, I get the error below:
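For context, here is a trimmed sketch of the objective I am running, adapted from the tutorial's DDPG version. The sampling ranges match the log below; the agent, e_trade_gym, and calculate_sharpe objects come straight from the notebook and are assumed to be already defined, and the exact categorical choices are approximate:

import optuna
from finrl.agents.stablebaselines3.models import DRLAgent

# Hyperparameter sampling, adapted from the tutorial's sample_ddpg_params.
# Note: suggest_loguniform still works but triggers the FutureWarning shown
# in the log below; suggest_float(..., log=True) is the suggested replacement.
def sample_sac_params(trial: optuna.Trial):
    buffer_size = trial.suggest_categorical("buffer_size", [int(1e4), int(1e5), int(1e6)])
    learning_rate = trial.suggest_loguniform("learning_rate", 1e-1, 1)
    batch_size = trial.suggest_categorical("batch_size", [64, 128, 256, 512])
    return {"buffer_size": buffer_size,
            "learning_rate": learning_rate,
            "batch_size": batch_size}

def objective(trial: optuna.Trial):
    hyperparameters = sample_sac_params(trial)
    print(hyperparameters)
    # agent is the FinRL DRLAgent built on the training environment, as in the tutorial
    model_sac = agent.get_model("sac", model_kwargs=hyperparameters)
    # This is the call that fails once the timestep budget goes above ~100
    trained_sac = agent.train_model(model=model_sac,
                                    tb_log_name="sac",
                                    total_timesteps=50000)
    # Evaluate on the trade environment and score the trial by Sharpe ratio
    df_account_value, df_actions = DRLAgent.DRL_prediction(model=trained_sac,
                                                           environment=e_trade_gym)
    return calculate_sharpe(df_account_value)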
[I 2024-06-01 11:43:00,899] A new study created in memory with name: sac_study
:4: FutureWarning: suggest_loguniform has been deprecated in v3.0.0. This feature will be removed in v6.0.0. See https://github.com/optuna/optuna/releases/tag/v3.0.0. Use suggest_float(..., log=True) instead.
learning_rate = trial.suggest_loguniform("learning_rate", 1e-1, 1)
{'buffer_size': 100000, 'learning_rate': 0.3968793330444371, 'batch_size': 256}
Using cpu device
[W 2024-06-01 11:43:06,313] Trial 0 failed with parameters: {'buffer_size': 100000, 'learning_rate': 0.3968793330444371, 'batch_size': 256} because of the following error: ValueError('Expected parameter loc (Tensor of shape (1, 30)) of distribution Normal(loc: torch.Size([1, 30]), scale: torch.Size([1, 30])) to satisfy the constraint Real(), but found invalid values:\ntensor([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n nan, nan, nan, nan, nan, nan]])').
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/optuna/study/_optimize.py", line 196, in _run_trial
value_or_values = func(trial)
File "", line 63, in objective
trained_sac = agent.train_model(model=model_sac,
File "/usr/local/lib/python3.10/dist-packages/finrl/agents/stablebaselines3/models.py", line 117, in train_model
model = model.learn(
File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/sac/sac.py", line 307, in learn
return super().learn(
File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/off_policy_algorithm.py", line 328, in learn
rollout = self.collect_rollouts(
File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/off_policy_algorithm.py", line 557, in collect_rollouts
actions, buffer_actions = self._sample_action(learning_starts, action_noise, env.num_envs)
File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/off_policy_algorithm.py", line 390, in _sample_action
unscaled_action, _ = self.predict(self._last_obs, deterministic=False)
File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/base_class.py", line 556, in predict
return self.policy.predict(observation, state, episode_start, deterministic)
File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/policies.py", line 368, in predict
actions = self._predict(obs_tensor, deterministic=deterministic)
File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/sac/policies.py", line 353, in _predict
return self.actor(observation, deterministic)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/sac/policies.py", line 170, in forward
return self.action_dist.actions_from_params(mean_actions, log_std, deterministic=deterministic, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/distributions.py", line 190, in actions_from_params
self.proba_distribution(mean_actions, log_std)
File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/distributions.py", line 224, in proba_distribution
super().proba_distribution(mean_actions, log_std)
File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/distributions.py", line 164, in proba_distribution
self.distribution = Normal(mean_actions, action_std)
File "/usr/local/lib/python3.10/dist-packages/torch/distributions/normal.py", line 56, in __init__
super().__init__(batch_shape, validate_args=validate_args)
File "/usr/local/lib/python3.10/dist-packages/torch/distributions/distribution.py", line 68, in __init__
raise ValueError(
ValueError: Expected parameter loc (Tensor of shape (1, 30)) of distribution Normal(loc: torch.Size([1, 30]), scale: torch.Size([1, 30])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan]])
[W 2024-06-01 11:43:06,315] Trial 0 failed with value None.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in ()
87 logging_callback = LoggingCallback(threshold=1e-5,patience=30,trial_number=5)
88 #You can increase the n_trials for a better search space scanning
---> 89 study.optimize(objective, n_trials=10,catch=(ValueError,),callbacks=[logging_callback])
90
91 joblib.dump(study, "final_sac_study__.pkl")
6 frames
/usr/local/lib/python3.10/dist-packages/optuna/storages/_in_memory.py in get_best_trial(self, study_id)
232
233 if best_trial_id is None:
--> 234 raise ValueError("No trials are completed yet.")
235 elif len(self._studies[study_id].directions) > 1:
236 raise RuntimeError(
ValueError: No trials are completed yet.
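For completeness, the surrounding study setup is essentially the tutorial's, only renamed for SAC. This is a sketch: the maximize direction follows the tutorial's Sharpe-based objective, and LoggingCallback is the class defined in the notebook:

import joblib
import optuna

study = optuna.create_study(study_name="sac_study", direction="maximize")
logging_callback = LoggingCallback(threshold=1e-5, patience=30, trial_number=5)
# Same call as line 89 in the traceback above; catch=(ValueError,) is meant to
# skip failed trials, yet the run still aborts with "No trials are completed yet."
study.optimize(objective, n_trials=10, catch=(ValueError,), callbacks=[logging_callback])
joblib.dump(study, "final_sac_study__.pkl")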