-
Getting the following error when trying to run the code with a (very simple) custom env using PyTorch 2.0.1:
`RuntimeError: one of the variables needed for gradient computation has been modified by…
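For what it's worth, this error usually points at an in-place operation on a tensor that autograd still needs for the backward pass. A minimal repro of the usual cause (not your code, just the pattern):
```python
import torch

x = torch.ones(3, requires_grad=True)
y = torch.sigmoid(x)  # sigmoid's backward needs its *output* y
y += 1                # in-place edit clobbers the saved tensor
y.sum().backward()    # RuntimeError: one of the variables needed for
                      # gradient computation has been modified by an
                      # inplace operation
```
Replacing the in-place `+=` with `y = y + 1` (or cloning before mutating) avoids it.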
-
-
Trying to run actor_critic_mountaincar.py from chapter 8, I can't get it to work: the results don't seem to converge. Typically, the total reward starts at a negative number and hovers ab…
-
Thanks for sharing your code, it's great to be able to go through the implementation.
Maybe I'm misunderstanding this, but it seems that if you intend `self.cpc_optimizer` to optimise only W, then
…
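For comparison, an optimizer that really updates only W would be built over just that parameter. A minimal sketch (the shape and learning rate are placeholders, not the repository's values):
```python
import torch

W = torch.nn.Parameter(torch.randn(64, 64))  # hypothetical contrastive weight

# Passing only [W] means cpc_optimizer.step() touches nothing else;
# the encoder parameters would need their own optimizer.
cpc_optimizer = torch.optim.Adam([W], lr=1e-3)
```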
-
There are recurrent (LSTM) policy options for sb3 (e.g. [RecurrentPPO](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/blob/master/sb3_contrib/ppo_recurrent/ppo_recurrent.py)). It w…
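Basic usage is short; a minimal sketch (the env id is a placeholder):
```python
from sb3_contrib import RecurrentPPO

# "MlpLstmPolicy" gives the PPO actor/critic an LSTM over observations.
model = RecurrentPPO("MlpLstmPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=10_000)
```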
-
Even after a longer run, the agents don't learn:
According to PressurePlate, we get a reward in [-0.9, 0] if the agent is in the same room as its assigned plate, and a reward in {-1, ..., -N} otherwise.
I tri…
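To make sure I read the reward scheme right, here is a sketch of it as described above (the geometry helpers are hypothetical, not the env's actual code):
```python
import math

def same_room(agent_pos, plate_pos, room_size=5):
    # Hypothetical room test: both positions fall in the same band of rooms.
    return (agent_pos[1] // room_size) == (plate_pos[1] // room_size)

def plate_reward(agent_pos, plate_pos, rooms_away, room_size=5):
    if same_room(agent_pos, plate_pos, room_size):
        # Dense shaping in [-0.9, 0]: closer to the assigned plate -> nearer 0.
        dist = math.dist(agent_pos, plate_pos)
        max_dist = math.hypot(room_size, room_size)
        return -0.9 * (dist / max_dist)
    # Constant penalty in {-1, ..., -N}: one unit per room still away.
    return -float(rooms_away)
```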
-
I think the td_error in AC is the same as the advantage in the baseline solution: both are a reward term minus the predicted value.
One difference is that the AC value network learns via TD, while the baseline solution is learning d…
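Concretely, the two targets differ only in how the return is estimated. A sketch with placeholder numbers (in practice the V values come from the value network):
```python
gamma = 0.99
r, v_s, v_s_next = 1.0, 0.5, 0.6  # one transition's reward and value estimates
G = 2.3                           # full discounted Monte Carlo return from s

td_error  = r + gamma * v_s_next - v_s  # actor-critic: bootstrapped (TD) target
advantage = G - v_s                     # baseline REINFORCE: Monte Carlo target
```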
-
I have seen the following lines in `ddpg.py`:
```python
self.critic_tf = denormalize(tf.clip_by_value(self.normalized_critic_tf, self.return_range[0], self.return_range[1]), self.ret_rms)
self.norm…
```
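For context, `denormalize` maps the clipped, normalized critic output back through the running return statistics in `self.ret_rms`. Roughly (a sketch of the baselines helper, assuming a `RunningMeanStd`-style `stats` object):
```python
def denormalize(x, stats):
    # Invert the return normalization: rescale by the running std and
    # shift by the running mean (stats is None means pass-through).
    if stats is None:
        return x
    return x * stats.std + stats.mean
```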
-
## 🚀 Feature
Allow tracing of models that output tuples in which some elements are `None`
## Motivation
In big, complex models, the forward pass should be able to output different information,…
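A minimal repro of the current limitation (the module and shapes are illustrative):
```python
import torch

class Net(torch.nn.Module):
    def forward(self, x):
        aux = None  # optional output, not produced in this configuration
        return x * 2, aux

# Fails today: traced outputs must be tensors (or nested tuples/lists/dicts
# of tensors), so the None element is rejected.
torch.jit.trace(Net(), torch.randn(3))
```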
-
Here is the code from reinforce.py:
```python
for action, r in zip(self.saved_actions, rewards):
    action.reinforce(r)
```
And here is the code from actor-critic.py:
```python
for (action, value), r in zi…
```
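Note that `Variable.reinforce` belongs to the old stochastic-function API that was removed in PyTorch 0.4; the current pytorch/examples scripts accumulate `-log_prob * reward` instead. A sketch of that pattern (names are illustrative):
```python
import torch

def reinforce_loss(saved_log_probs, rewards):
    # Modern replacement for action.reinforce(r): the policy-gradient loss
    # sums -log pi(a|s) weighted by the (discounted) reward for each step.
    losses = [-log_prob * r for log_prob, r in zip(saved_log_probs, rewards)]
    return torch.stack(losses).sum()
```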