-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/axolotl-ai-cloud/axolotl/labels/bug) and didn't find any similar reports.
###…
-
Hi there,
I noticed that even though the policy net and the value net share some parameters (in a3c/estimators.py), their gradients were [clipped](https://github.com/dennybritz/reinforcement-learning/blob/m…
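For illustration, here is a hedged PyTorch sketch (the repo itself is TensorFlow, and every name below is made up) of why this matters: with a shared trunk, clipping each loss's gradients separately and then summing them is not the same as clipping the gradient of the combined loss once.
```
import torch
import torch.nn as nn

def clip_by_global_norm(grads, max_norm):
    # scale all gradients so their joint L2 norm does not exceed max_norm
    total = torch.sqrt(sum(g.pow(2).sum() for g in grads))
    scale = min(1.0, max_norm / (float(total) + 1e-8))
    return [g * scale for g in grads]

trunk = nn.Linear(8, 32)     # parameters shared by the policy and value heads
pi_head = nn.Linear(32, 4)
v_head = nn.Linear(32, 1)
shared = list(trunk.parameters())

obs = torch.randn(16, 8)
h = torch.tanh(trunk(obs))
pi_loss = pi_head(h).pow(2).mean()   # stand-in for the policy loss
v_loss = v_head(h).pow(2).mean()     # stand-in for the value loss

# Separate clipping (what the issue describes): each loss's gradients w.r.t. the shared
# parameters are clipped on their own, and both clipped sets are then applied.
pi_grads = clip_by_global_norm(torch.autograd.grad(pi_loss, shared, retain_graph=True), 0.5)
v_grads = clip_by_global_norm(torch.autograd.grad(v_loss, shared, retain_graph=True), 0.5)
separate = [gp + gv for gp, gv in zip(pi_grads, v_grads)]

# Joint clipping: clip the gradient of the combined loss once.
joint = clip_by_global_norm(torch.autograd.grad(pi_loss + v_loss, shared), 0.5)
```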
-
### 📚 Documentation
I obtained optimal hyperparameters for training CartPole-v1 from [RLZoo3][1]. I have created a minimal example demonstrating the performance of my CartPole agent. As per the off…
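A minimal sketch of such an example, assuming Stable-Baselines3 PPO (RL Zoo3 is built on top of SB3); the hyperparameter values below are illustrative placeholders rather than the tuned zoo values:
```
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy

# vectorised CartPole-v1 environments
env = make_vec_env("CartPole-v1", n_envs=8)

# placeholder hyperparameters; substitute the values exported from the zoo config
model = PPO(
    "MlpPolicy",
    env,
    n_steps=32,
    batch_size=256,
    n_epochs=20,
    gamma=0.98,
    gae_lambda=0.8,
    ent_coef=0.0,
    verbose=1,
)
model.learn(total_timesteps=100_000)

mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
print(f"mean reward: {mean_reward:.1f} +/- {std_reward:.1f}")
```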
-
### What happened + What you expected to happen
I assume the error is related to the action space being large, because I cannot reproduce it when the action space is much smaller (i.e. 10 times fewer…
-
In each policy update step, the penalty function is called with the policy and the current state; this results in a gradient.
Currently I have two ideas for such a penalty function, both need a (co…
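A minimal sketch of that update scheme (all names are hypothetical, and the penalty itself is left abstract as `penalty_fn`):
```
def update_step(policy, optimizer, obs, actions, advantages, penalty_fn, penalty_coef=0.1):
    # policy(obs) is assumed to return a torch.distributions object
    dist = policy(obs)
    pg_loss = -(dist.log_prob(actions) * advantages).mean()

    # differentiable penalty evaluated on the current policy and state
    penalty = penalty_fn(policy, obs)

    loss = pg_loss + penalty_coef * penalty
    optimizer.zero_grad()
    loss.backward()        # the penalty contributes its own gradient here
    optimizer.step()
    return loss.item()
```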
-
```
def update():
    data = buf.get()
    # Get loss and info values before update
    pi_l_old, pi_info_old = compute_loss_pi(data)
    pi_l_old = pi_l_old.item()
    …
```
-
Got this error when I initially ran the policy_gradient example script.
![metadata_issue](https://github.com/osinenkop/regelum-control/assets/161316313/4a899de9-a9e7-4baa-9969-7ad3c7f0b211)
-
I replicated the experiments of pythia28 on hh (Anthropic/hh-rlhf) using the open-source code. Here are some of the experimental results:
**SFT1**:
~~~
python -u train.py exp_name=sft gradient_ac…
~~~
-
The rough idea is the following (a minimal sketch follows below):
- Share the policy network.
- Collect experience asynchronously.
- Accumulate gradients.
- Update the policy network.
**Related issues:**
#391 #438
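A minimal sketch of the idea, assuming PyTorch and `torch.multiprocessing`; the environment interaction is stubbed out with random tensors, and none of the names below come from the project itself:
```
import torch
import torch.nn as nn
import torch.multiprocessing as mp

class Policy(nn.Module):
    def __init__(self, obs_dim=4, act_dim=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))

    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.net(obs))

def worker(shared_policy, steps_per_update=32):
    # each worker keeps a local copy, collects experience, and accumulates gradients locally
    opt = torch.optim.SGD(shared_policy.parameters(), lr=1e-3)
    local_policy = Policy()
    local_policy.load_state_dict(shared_policy.state_dict())  # sync with the shared network
    for _ in range(steps_per_update):
        obs = torch.randn(4)    # placeholder for an environment observation
        dist = local_policy(obs)
        action = dist.sample()
        ret = torch.randn(())   # placeholder for a computed return
        (-dist.log_prob(action) * ret).backward()  # gradients accumulate in local_policy
    # copy the accumulated gradients onto the shared network and apply the update
    for lp, sp in zip(local_policy.parameters(), shared_policy.parameters()):
        sp.grad = lp.grad.clone()
    opt.step()

if __name__ == "__main__":
    shared = Policy()
    shared.share_memory()       # keep the shared network's weights in shared memory
    workers = [mp.Process(target=worker, args=(shared,)) for _ in range(2)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```
In a real worker the random observations and returns would come from environment rollouts, and a shared-state optimizer is usually preferred so that optimizer statistics are not duplicated per worker.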
-
I am confused by your code.
In the paper, it is mentioned that a policy gradient method [1] is used. But more specifically, I think it is implemented as Actor-Critic.
If I am wrong, please tell m…
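For what it's worth, the two are not mutually exclusive: actor-critic methods are policy-gradient methods in which a learned critic supplies the baseline/advantage. A small illustrative sketch (hypothetical tensor arguments, not the paper's implementation):
```
import torch

def reinforce_loss(log_probs, returns):
    # "plain" policy gradient (REINFORCE): weight log-probabilities by the sampled return
    return -(log_probs * returns).mean()

def actor_critic_loss(log_probs, returns, values, value_coef=0.5):
    # same policy-gradient form, but weighted by the advantage (return minus the critic's
    # value estimate), plus a regression term that trains the critic itself
    advantages = (returns - values).detach()
    policy_loss = -(log_probs * advantages).mean()
    value_loss = (returns - values).pow(2).mean()
    return policy_loss + value_coef * value_loss
```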