-
Getting the following error when trying to run the code with a (very simple) custom env using PyTorch 2.0.1:
`RuntimeError: one of the variables needed for gradient computation has been modified by…
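For what it's worth, this error usually points at an in-place operation on a tensor that autograd still needs for the backward pass. A minimal repro of the usual cause (not your code, just the pattern):
```python
import torch

x = torch.ones(3, requires_grad=True)
y = torch.sigmoid(x)  # sigmoid's backward needs its *output* y
y += 1                # in-place edit clobbers the saved tensor
y.sum().backward()    # RuntimeError: one of the variables needed for
                      # gradient computation has been modified by an
                      # inplace operation
```
Replacing the in-place `+=` with `y = y + 1` (or cloning before mutating) avoids it.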
-
-
Trying to run actor_critic_mountaincar.py from chapter 8, I can't get it to work: the results don't seem to converge. Typically, the total reward starts at a negative number and hovers ab…
-
Thanks for sharing your code, it's great to be able to go through the implementation.
Maybe I'm misunderstanding this, but it seems that if you intend `self.cpc_optimizer` to optimise only W, then
…
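For comparison, an optimizer that really updates only W would be built over just that parameter. A minimal sketch (the shape and learning rate are placeholders, not the repository's values):
```python
import torch

W = torch.nn.Parameter(torch.randn(64, 64))  # hypothetical contrastive weight

# Passing only [W] means cpc_optimizer.step() touches nothing else;
# the encoder parameters would need their own optimizer.
cpc_optimizer = torch.optim.Adam([W], lr=1e-3)
```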
-
There are recurrent (LSTM) policy options for sb3 (e.g. [RecurrentPPO](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/blob/master/sb3_contrib/ppo_recurrent/ppo_recurrent.py)). It w…
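Basic usage is short; a minimal sketch (the env id is a placeholder):
```python
from sb3_contrib import RecurrentPPO

# "MlpLstmPolicy" gives the PPO actor/critic an LSTM over observations.
model = RecurrentPPO("MlpLstmPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=10_000)
```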
-
Even after a longer run, the agents don't learn:
According to PressurePlate, we get a reward in [-0.9, 0] if the agent is in the same room as its assigned plate, and a reward in {-1, ..., -N} otherwise.
I tri…
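To make sure I read the reward scheme right, here is a sketch of it as described above (the geometry helpers are hypothetical, not the env's actual code):
```python
import math

def same_room(agent_pos, plate_pos, room_size=5):
    # Hypothetical room test: both positions fall in the same band of rooms.
    return (agent_pos[1] // room_size) == (plate_pos[1] // room_size)

def plate_reward(agent_pos, plate_pos, rooms_away, room_size=5):
    if same_room(agent_pos, plate_pos, room_size):
        # Dense shaping in [-0.9, 0]: closer to the assigned plate -> nearer 0.
        dist = math.dist(agent_pos, plate_pos)
        max_dist = math.hypot(room_size, room_size)
        return -0.9 * (dist / max_dist)
    # Constant penalty in {-1, ..., -N}: one unit per room still away.
    return -float(rooms_away)
```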
-
I think the td_error in AC is the same as the advantage in the baseline solution: both are a reward term minus the predicted value.
One difference is that the AC value network learns via TD, while the baseline solution is learning d…
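Concretely, the two targets differ only in how the return is estimated. A sketch with placeholder numbers (in practice the V values come from the value network):
```python
gamma = 0.99
r, v_s, v_s_next = 1.0, 0.5, 0.6  # one transition's reward and value estimates
G = 2.3                           # full discounted Monte Carlo return from s

td_error  = r + gamma * v_s_next - v_s  # actor-critic: bootstrapped (TD) target
advantage = G - v_s                     # baseline REINFORCE: Monte Carlo target
```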
-
I have seen the following lines in `ddpg.py`:
```python
self.critic_tf = denormalize(tf.clip_by_value(self.normalized_critic_tf, self.return_range[0], self.return_range[1]), self.ret_rms)
self.norm…
```
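For context, `denormalize` maps the clipped, normalized critic output back through the running return statistics in `self.ret_rms`. Roughly (a sketch of the baselines helper, assuming a `RunningMeanStd`-style `stats` object):
```python
def denormalize(x, stats):
    # Invert the return normalization: rescale by the running std and
    # shift by the running mean (stats is None means pass-through).
    if stats is None:
        return x
    return x * stats.std + stats.mean
```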
-
## 🚀 Feature
Allow tracing of models that output tuples in which some elements are `None`
## Motivation
In big, complex models, the forward pass should be able to output different information,…
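A minimal repro of the current limitation (the module and shapes are illustrative):
```python
import torch

class Net(torch.nn.Module):
    def forward(self, x):
        aux = None  # optional output, not produced in this configuration
        return x * 2, aux

# Fails today: traced outputs must be tensors (or nested tuples/lists/dicts
# of tensors), so the None element is rejected.
torch.jit.trace(Net(), torch.randn(3))
```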
-
Here is the code from reinforce.py:
```python
for action, r in zip(self.saved_actions, rewards):
    action.reinforce(r)
```
And here is the code from actor-critic.py:
```python
for (action, value), r in zi…
```
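Note that `Variable.reinforce` belongs to the old stochastic-function API that was removed in PyTorch 0.4; the current pytorch/examples scripts accumulate `-log_prob * reward` instead. A sketch of that pattern (names are illustrative):
```python
import torch

def reinforce_loss(saved_log_probs, rewards):
    # Modern replacement for action.reinforce(r): the policy-gradient loss
    # sums -log pi(a|s) weighted by the (discounted) reward for each step.
    losses = [-log_prob * r for log_prob, r in zip(saved_log_probs, rewards)]
    return torch.stack(losses).sum()
```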