-
### ❓ Question
I have been using the https://github.com/HumanCompatibleAI/imitation/ library for imitation learning with SB3 PPO to great effect. However, my end goal is to do the same for Recurrent…
-
# Per-Parameter-Sharding FSDP
## Motivation
As we looked toward next-generation training, we found limitations in our existing FSDP, mainly from the _flat parameter_ construct. To address these, w…
-
The cash bias in the network output (omega) always appears to be zero, even under conditions where it seems holding some cash would be better (e.g., bear markets, or when all traded markets are perfor…
-
There is an issue in the OpenAI Baselines repo ([here](https://github.com/openai/baselines/issues/121)) about the advantages of a beta distribution over a diagonal Gaussian distribution + clipping.
The re…
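The core of that argument can be sketched quickly: a Beta sample already lies in [0, 1], so rescaling it to the action box gives in-bounds actions with no clipping and therefore no probability mass piled up at the boundaries. A minimal NumPy illustration (the bounds and shape parameters here are hypothetical, not from the issue):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical action bounds; any finite box works the same way.
low, high = -2.0, 2.0

# alpha, beta > 1 keeps the Beta density unimodal, the usual
# choice when it parameterizes a policy.
alpha, beta = 2.0, 3.0

# Beta samples always lie in [0, 1] -- no clipping needed.
u = rng.beta(alpha, beta, size=1000)

# Rescale to the environment's action range.
actions = low + (high - low) * u
```

By contrast, a Gaussian policy with clipping maps every out-of-range sample to exactly `low` or `high`, creating atoms at the boundary that the log-likelihood gradient does not account for.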
-
Sorry, terminal says

```
  File "pong_policy_gradients.py", line 24, in <module>
    grad_buffer = { k : np.zeros_like(v) for k,v in model.iteritems() } # update buffers that add up gradients over a batch
```

is th…
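This is the Python 2 / Python 3 split: `dict.iteritems()` was removed in Python 3, where `dict.items()` returns an equivalent lazy view. A sketch of the fix, assuming `model` is a dict of NumPy weight arrays as in the original script (the shapes below are placeholders):

```python
import numpy as np

# Placeholder stand-in for the script's weight dict.
model = {"W1": np.random.randn(200, 6400), "W2": np.random.randn(200)}

# Python 2: model.iteritems()  -> AttributeError on Python 3.
# Python 3: model.items() is the equivalent (and is already lazy).
grad_buffer = {k: np.zeros_like(v) for k, v in model.items()}
```

The same substitution applies to any other `.iteritems()`, `.iterkeys()`, or `.itervalues()` calls in the script.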
-
### Question
I'm trying to implement sampling and training asynchronously with the SAC algorithm. I made the attempt shown in the code below, but I always get an error because there seems to be a …
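Without the full traceback it is hard to say what the error is, but the async pattern itself can be reduced to a producer/consumer pair around a lock-protected replay buffer. A minimal sketch of just that pattern (the SAC update is replaced by a dummy step; all names here are illustrative, not from any library):

```python
import random
import threading
from collections import deque

# Shared state: a bounded replay buffer guarded by a lock.
buffer = deque(maxlen=10_000)
lock = threading.Lock()
stop = threading.Event()

def sampler():
    """Actor thread: collect transitions and push them into the buffer."""
    while not stop.is_set():
        transition = (random.random(), 0, random.random(), random.random())
        with lock:
            buffer.append(transition)

def trainer(steps, batch_size=32):
    """Learner thread: sample batches and run (dummy) gradient updates."""
    done = 0
    while done < steps:
        with lock:
            if len(buffer) < batch_size:
                continue  # wait for the sampler to fill the buffer
            batch = random.sample(buffer, batch_size)
        # ... run the SAC critic/actor update on `batch` here ...
        done += 1
    stop.set()

t1 = threading.Thread(target=sampler)
t2 = threading.Thread(target=trainer, args=(100,))
t1.start(); t2.start()
t2.join(); t1.join()
```

One common pitfall with this setup is sharing a single environment or network object across both threads without synchronization; keeping the learner's parameters behind the same lock (or copying them for the actor) avoids that class of error.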
-
I am trying to understand the heuristic algorithm used in the `memory` policy, but I could not fully understand the whole logic, especially the following `if` statement:
https:/…
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/axolotl-ai-cloud/axolotl/labels/bug) and didn't find any similar reports.
### Exp…
-
`prob = aprob / np.sum(aprob)`
https://github.com/keon/policy-gradient/blob/master/pg.py#L46
I am not sure this line is really required, since the probabilities should already be normalized by the softmax. Plea…
-
Hi,
in "OpenAI Spinning Up" (https://spinningup.openai.com/en/latest/algorithms/ppo.html), a note about clipping states:
> While this kind of clipping goes a long way towards ensuring reason…
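For context, the clipped surrogate that the quoted note refers to is `min(r·A, clip(r, 1-ε, 1+ε)·A)`, where `r` is the new/old probability ratio and `A` the advantage. A minimal NumPy illustration (not Spinning Up's implementation):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Per-sample PPO-Clip surrogate: min(r * A, clip(r, 1-eps, 1+eps) * A)."""
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return np.minimum(ratio * advantage, clipped * advantage)

# With positive advantages, ratios above 1 + eps stop being rewarded:
surrogate = ppo_clip_objective(np.array([0.5, 1.0, 1.5]), np.ones(3))
# surrogate[2] is capped at (1 + eps) * A = 1.2 even though ratio = 1.5
```

Note that clipping only zeroes the gradient once the ratio leaves the interval in the favorable direction; nothing actively pushes the ratio back inside it, which is the kind of caveat the note is hedging about.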