-
### ❓ Question
I have been using the https://github.com/HumanCompatibleAI/imitation/ library for imitation learning with SB3 PPO to great effect. However, my end goal is to do the same for Recurrent…
-
# Per-Parameter-Sharding FSDP
## Motivation
As we looked toward next-generation training, we found limitations in our existing FSDP, mainly from the _flat parameter_ construct. To address these, w…
-
The cash bias in the network output (omega) always appears to be zero, even under conditions where it seems holding some cash would be better (e.g., bear markets, or when all traded markets are perfor…
-
There is an issue in the OpenAI Baselines repo ([here](https://github.com/openai/baselines/issues/121)) about the advantages of a beta distribution over a diagonal Gaussian distribution + clipping.
The re…
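The core of that argument can be sketched quickly: a Beta sample already lies in [0, 1], so rescaling it to the action box gives in-bounds actions with no clipping and therefore no probability mass piled up at the boundaries. A minimal NumPy illustration (the bounds and shape parameters here are hypothetical, not from the issue):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical action bounds; any finite box works the same way.
low, high = -2.0, 2.0

# alpha, beta > 1 keeps the Beta density unimodal, the usual
# choice when it parameterizes a policy.
alpha, beta = 2.0, 3.0

# Beta samples always lie in [0, 1] -- no clipping needed.
u = rng.beta(alpha, beta, size=1000)

# Rescale to the environment's action range.
actions = low + (high - low) * u
```

By contrast, a Gaussian policy with clipping maps every out-of-range sample to exactly `low` or `high`, creating atoms at the boundary that the log-likelihood gradient does not account for.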
-
Sorry, terminal says

```
  File "pong_policy_gradients.py", line 24, in <module>
    grad_buffer = { k : np.zeros_like(v) for k,v in model.iteritems() } # update buffers that add up gradients over a batch
```

is th…
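This is the Python 2 / Python 3 split: `dict.iteritems()` was removed in Python 3, where `dict.items()` returns an equivalent lazy view. A sketch of the fix, assuming `model` is a dict of NumPy weight arrays as in the original script (the shapes below are placeholders):

```python
import numpy as np

# Placeholder stand-in for the script's weight dict.
model = {"W1": np.random.randn(200, 6400), "W2": np.random.randn(200)}

# Python 2: model.iteritems()  -> AttributeError on Python 3.
# Python 3: model.items() is the equivalent (and is already lazy).
grad_buffer = {k: np.zeros_like(v) for k, v in model.items()}
```

The same substitution applies to any other `.iteritems()`, `.iterkeys()`, or `.itervalues()` calls in the script.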
-
### Question
I'm trying to implement sampling and training asynchronously with the SAC algorithm. I made the attempt shown in the code below, but I always get an error because there seems to be a …
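Without the full traceback it is hard to say what the error is, but the async pattern itself can be reduced to a producer/consumer pair around a lock-protected replay buffer. A minimal sketch of just that pattern (the SAC update is replaced by a dummy step; all names here are illustrative, not from any library):

```python
import random
import threading
from collections import deque

# Shared state: a bounded replay buffer guarded by a lock.
buffer = deque(maxlen=10_000)
lock = threading.Lock()
stop = threading.Event()

def sampler():
    """Actor thread: collect transitions and push them into the buffer."""
    while not stop.is_set():
        transition = (random.random(), 0, random.random(), random.random())
        with lock:
            buffer.append(transition)

def trainer(steps, batch_size=32):
    """Learner thread: sample batches and run (dummy) gradient updates."""
    done = 0
    while done < steps:
        with lock:
            if len(buffer) < batch_size:
                continue  # wait for the sampler to fill the buffer
            batch = random.sample(buffer, batch_size)
        # ... run the SAC critic/actor update on `batch` here ...
        done += 1
    stop.set()

t1 = threading.Thread(target=sampler)
t2 = threading.Thread(target=trainer, args=(100,))
t1.start(); t2.start()
t2.join(); t1.join()
```

One common pitfall with this setup is sharing a single environment or network object across both threads without synchronization; keeping the learner's parameters behind the same lock (or copying them for the actor) avoids that class of error.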
-
I am trying to understand the heuristic algorithm used in the `memory` policy, but I could not fully understand the whole logic, especially the following `if` statement:
https:/…
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/axolotl-ai-cloud/axolotl/labels/bug) and didn't find any similar reports.
### Exp…
-
`prob = aprob / np.sum(aprob)`
https://github.com/keon/policy-gradient/blob/master/pg.py#L46
I am not sure this line is really required, since the probabilities should already be normalized by the softmax. Plea…
-
Hi,
in "OpenAI Spinning Up" (https://spinningup.openai.com/en/latest/algorithms/ppo.html), a note about clipping states:
> While this kind of clipping goes a long way towards ensuring reason…
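For context, the clipped surrogate that the quoted note refers to is `min(r·A, clip(r, 1-ε, 1+ε)·A)`, where `r` is the new/old probability ratio and `A` the advantage. A minimal NumPy illustration (not Spinning Up's implementation):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Per-sample PPO-Clip surrogate: min(r * A, clip(r, 1-eps, 1+eps) * A)."""
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return np.minimum(ratio * advantage, clipped * advantage)

# With positive advantages, ratios above 1 + eps stop being rewarded:
surrogate = ppo_clip_objective(np.array([0.5, 1.0, 1.5]), np.ones(3))
# surrogate[2] is capped at (1 + eps) * A = 1.2 even though ratio = 1.5
```

Note that clipping only zeroes the gradient once the ratio leaves the interval in the favorable direction; nothing actively pushes the ratio back inside it, which is the kind of caveat the note is hedging about.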