-
Hi
In the text of **The advantages and disadvantages of policy-gradient methods**, you meant to write `policy-based methods`, but due to a typo you wrote `value-based methods`:
> Advantages
> Ther…
-
Hello everyone,
I've encountered a problem while implementing an A2C (Advantage Actor-Critic) network involving Flax and Optax. My network includes _policy_network_ and _value_network_, each containi…
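For anyone with a similar setup, here is a minimal sketch of two separate Flax modules, each with its own Optax optimizer state. The module names, layer sizes, and learning rates are assumptions, since the original definitions are cut off above:
```python
import jax
import jax.numpy as jnp
import flax.linen as nn
import optax

class PolicyNetwork(nn.Module):
    n_actions: int

    @nn.compact
    def __call__(self, x):
        x = nn.relu(nn.Dense(64)(x))
        return nn.Dense(self.n_actions)(x)  # action logits

class ValueNetwork(nn.Module):
    @nn.compact
    def __call__(self, x):
        x = nn.relu(nn.Dense(64)(x))
        return nn.Dense(1)(x)  # scalar state value

key = jax.random.PRNGKey(0)
dummy_obs = jnp.zeros((1, 4))  # placeholder observation batch

policy_network = PolicyNetwork(n_actions=2)
value_network = ValueNetwork()
policy_params = policy_network.init(key, dummy_obs)
value_params = value_network.init(key, dummy_obs)

# one optimizer (and optimizer state) per parameter tree
policy_tx = optax.adam(3e-4)
value_tx = optax.adam(1e-3)
policy_opt_state = policy_tx.init(policy_params)
value_opt_state = value_tx.init(value_params)
```
Keeping the two parameter trees and optimizer states separate like this is one common layout; a single module with two heads and one optimizer is another.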
-
### System Info
```Shell
- `Accelerate` version: 0.31.0
- Platform: Linux-5.15.0-125-generic-x86_64-with-glibc2.35
- `accelerate` bash location:
- Python version: 3.10.12
- Numpy version: 1.2…
```
-
Hey @CR-Gjx Thanks for providing this open source code. Very helpful to study and I love the idea of hierarchical reinforcement learning.
In the recent AlphaGo Zero paper and [Thinking Fast and Slo…
-
When running the code on DMC, because `actor_grad` is `dynamics`, `loss_policy` would be `-value_target`. `value_target` is not dependent on the actor's policy distribution, and so, `lo…
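For readers following along, here is a toy JAX sketch (made-up names and shapes; not the repository's code) of the gradient path implied by `actor_grad = dynamics`: even though `-value_target` contains no log-probability term, it can still pass gradients to the actor by backpropagating through the imagined states and (reparameterized) actions.
```python
import jax
import jax.numpy as jnp

def dynamics(state, action):
    # stand-in for the learned, differentiable world model
    return jnp.tanh(state + action)

def actor(params, state):
    # deterministic stand-in for a reparameterized action sample
    return jnp.tanh(params["w"] * state)

def critic(state):
    # stand-in for the learned value function
    return jnp.sum(state ** 2)

def loss_policy(params, state, horizon=3):
    # imagine a short rollout and use the final value as value_target
    for _ in range(horizon):
        state = dynamics(state, actor(params, state))
    return -critic(state)  # loss_policy = -value_target

params = {"w": jnp.array(0.5)}
state = jnp.ones(4)
# nonzero gradient: the actor is updated through the dynamics, not the distribution
print(jax.grad(loss_policy)(params, state))
```
Whether this matches the DMC configuration being discussed depends on how `value_target` is computed there, which the excerpt above cuts off.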
-
I've noticed that the REINFORCE algorithm (aka policy gradient, without the Q function) is not listed in the list of agents, not even among the not-yet-implemented ones. I presume this was intentional? How co…
-
This is not an issue with the code per se, but I am learning RL and am wondering how the policy gradient is calculated in `pg_reinforce.py`. In this line:
```
self.cross_entropy_loss = tf.nn.sparse_so…
```
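As I understand it (my reading, not the repository's documentation): the cross-entropy of the taken action is exactly `-log pi(a|s)`, so weighting it by the return and averaging gives a surrogate loss whose gradient is the negative REINFORCE policy gradient, and minimizing it performs gradient ascent on expected return. A small JAX sketch of the same computation:
```python
import jax
import jax.numpy as jnp

def reinforce_surrogate(logits, actions, returns):
    # cross-entropy of the taken action == -log pi(a|s)
    log_probs = jax.nn.log_softmax(logits)
    neg_log_prob = -jnp.take_along_axis(log_probs, actions[:, None], axis=1)[:, 0]
    # weighting by the return gives the REINFORCE surrogate loss
    return jnp.mean(returns * neg_log_prob)

# toy batch: 2 steps, 2 discrete actions
logits = jnp.array([[1.0, 2.0], [0.5, -0.5]])
actions = jnp.array([0, 1])
returns = jnp.array([1.5, -0.3])

# gradient of the surrogate is -E[G_t * grad log pi(a_t|s_t)],
# so minimizing it ascends the expected return
print(jax.grad(reinforce_surrogate)(logits, actions, returns))
```
As far as I can tell, the quoted line plays the role of `neg_log_prob` here.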
-
Please make sure that this is a bug. As per our
[GitHub Policy](https://github.com/tensorflow/tensorflow/blob/master/ISSUES.md),
we only address code/doc bugs, performance issues, feature requests a…
-
https://arxiv.org/abs/1611.01626