-
### System information
- **OS Platform and Distribution (e.g., Linux Ubuntu 16.04)**: Ubuntu 16.04.6 LTS (GNU/Linux 4.4.0-143-generic x86_64)
- **Ray installed from (source or binary)**: pip inst…
-
### System information
- **OS Platform and Distribution (e.g., Linux Ubuntu 16.04)**: macOS Mojave
- **Ray installed from (source or binary)**: binary
- **Ray version**: 0.8.0.dev1
- **Python ve…
-
First of all, thanks a lot for this awesome project. Stable-Baselines helps me a lot!
I am trying to get a GAIL agent going, to gain experience with Inverse Reinforcement Learning in combination with Gym and/…
-
In stable-baselines/ddpg/ddpg.py, lines 916 and 918: should `eval/return` and `eval/Q` be wrapped in `np.mean` to make them scalars?
```python
# Evaluation statistics.
if self.eval_env is not None:
    co…
```
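For context, a small sketch (not the library's actual code) of why `np.mean` matters here: evaluation returns and Q-values are collected per episode/step, so logging the buffers directly yields a sequence rather than a scalar. The variable names below are illustrative stand-ins.

```python
import numpy as np

# Hypothetical stand-ins for the evaluation buffers accumulated in DDPG;
# the real code fills these over evaluation rollouts.
eval_episode_rewards = [10.0, 12.0, 8.0]   # one return per eval episode
eval_qs = [[0.5, 0.7], [0.6, 0.4]]         # Q-values per step, per episode

# Logged directly these are sequences; most loggers expect scalars.
stat_scalar = np.mean(eval_episode_rewards)  # 10.0, a single float
q_scalar = np.mean(eval_qs)                  # mean over all Q-values
print(stat_scalar, q_scalar)
```

Whether the fix belongs exactly at lines 916/918 depends on how the surrounding logging code treats these buffers.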
-
Is there any reason why we save the entire model rather than its `state_dict()`? Also, why do we create a CPU copy of the CUDA actor-critic network before saving it (line: 152, copy.deepcopy(acto…
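For reference, a minimal sketch of the two saving styles in question (a toy `nn.Linear` stands in for the actor-critic; this is not the repo's actual code). PyTorch's own docs recommend saving the `state_dict()`, since pickling the whole module ties the checkpoint to the exact class and file layout. The CPU copy is presumably so the checkpoint loads on machines without a GPU, though `torch.load(..., map_location='cpu')` achieves the same at load time.

```python
import copy
import torch
import torch.nn as nn

actor_critic = nn.Linear(4, 2)  # toy stand-in for the actor-critic network

# Style 1: save the whole module (pickles the class itself; brittle
# across refactors, since loading needs the same class importable).
torch.save(actor_critic, 'model_full.pt')

# Style 2: save only the parameters (the PyTorch-recommended approach).
torch.save(actor_critic.state_dict(), 'model_state.pt')

# CPU copy before saving (here the module is already on CPU; in the repo
# it would be on CUDA). An alternative is map_location='cpu' at load time.
cpu_copy = copy.deepcopy(actor_critic).to('cpu')
torch.save(cpu_copy.state_dict(), 'model_cpu.pt')
```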
-
I am testing my actor-critic (AC) code with the Pendulum-v0 environment and got this error:

```
gym\envs\classic_control\pendulum.py:88: RuntimeWarning: invalid value encountered in remainder
  return (((x+np.pi) % (2…
```
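This warning usually means the angle `x` is already NaN by the time it reaches Pendulum's `angle_normalize` (for example, a NaN action from a diverging network propagating into the state). A minimal reproduction, assuming the standard `angle_normalize` expression from Gym's pendulum source:

```python
import numpy as np

def angle_normalize(x):
    # Same expression as Gym's pendulum.py angle normalization
    return ((x + np.pi) % (2 * np.pi)) - np.pi

# A NaN angle (e.g. caused by a NaN action) triggers the warning:
x = np.array(np.nan)
with np.errstate(invalid='warn'):
    result = angle_normalize(x)  # RuntimeWarning: invalid value encountered in remainder
print(result)  # nan
```

So the warning is a symptom rather than the bug itself; the fix is usually to find where the policy starts emitting NaNs (exploding loss, too-large learning rate, unbounded std, etc.).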
-
```python
std_share_network=True,
hidden_sizes=(200, 200)
```

Observed behavior: the std explodes.
-
I want to know the significance of the **squeeze** operation (line: 162) in `a2c_ppo_acktr/envs.py`. The *squeeze* operation sends scalar values as **action_value** instead of singly-sized vectors for…
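Independent of the repo's actual code, here is what such a squeeze does to a length-1 action (variable names are illustrative): discrete Gym environments typically expect a scalar action, so a shape-(1,) vector is reduced to a 0-d value before being passed to the environment.

```python
import numpy as np

action = np.array([2])           # singly-sized vector, shape (1,)
squeezed = np.squeeze(action)    # 0-d array, shape ()

print(action.shape, squeezed.shape)  # (1,) ()

# Many discrete envs index into an action table, which works with a
# scalar like int(squeezed) but can fail or mis-broadcast with shape (1,).
```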
-
I would like to know why `getattr(get_vec_normalize(envs), 'ob_rms', None)` is saved along with the actor-critic network.
*lines: 154-155, main.py*
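A plausible reason, sketched below with a simplified RunningMeanStd (not the repo's exact class): VecNormalize keeps running mean/variance statistics of the observations, and a policy trained on normalized inputs only behaves correctly if the *same* statistics are applied when the model is reloaded, so `ob_rms` is checkpointed alongside the network.

```python
import numpy as np

# Sketch of why ob_rms matters: a policy trained on (obs - mean) / std
# sees garbage at evaluation time unless the same stats are restored.
class RunningMeanStd:
    """Simplified version of the stats object saved as 'ob_rms'."""
    def __init__(self, shape):
        self.mean = np.zeros(shape)
        self.var = np.ones(shape)
        self.count = 1e-4

    def update(self, batch):
        # Parallel mean/variance combination (Chan et al. style update).
        b_mean, b_var, b_count = batch.mean(axis=0), batch.var(axis=0), batch.shape[0]
        delta = b_mean - self.mean
        total = self.count + b_count
        self.mean = self.mean + delta * b_count / total
        m_a = self.var * self.count
        m_b = b_var * b_count
        self.var = (m_a + m_b + delta ** 2 * self.count * b_count / total) / total
        self.count = total

rng = np.random.default_rng(0)
rms = RunningMeanStd(shape=(3,))
rms.update(rng.normal(loc=10.0, scale=5.0, size=(10000, 3)))

# With the saved stats, a raw observation is mapped back into the
# roughly zero-mean, unit-variance range the policy was trained on.
obs = np.array([10.0, 10.0, 10.0])
normalized = (obs - rms.mean) / np.sqrt(rms.var + 1e-8)
```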