-
The following line evaluates to zero, which then throws a `ValueError`, when the batch size resulting from the number of processes and steps is smaller than the number of mini-batches. This happens to me especially i…
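For reference, a minimal sketch of the failure mode (the variable names below are assumptions, not the repo's exact code): PPO splits the `num_processes * num_steps` collected transitions into `num_mini_batch` mini-batches, and an integer division that lands on zero is rejected by `BatchSampler`.

```python
# Hypothetical numbers that reproduce the arithmetic:
num_processes = 1
num_steps = 5
num_mini_batch = 32

mini_batch_size = num_processes * num_steps // num_mini_batch
print(mini_batch_size)  # -> 0; torch.utils.data.BatchSampler raises ValueError for batch_size=0

# A guard like this would surface the constraint before training starts:
assert num_processes * num_steps >= num_mini_batch, (
    "num_processes * num_steps ({}) must be >= num_mini_batch ({})".format(
        num_processes * num_steps, num_mini_batch))
```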
-
First, great work!!
I've decided to "upgrade" and use your A2C implementation instead of your A3C's, but I was surprised to see in your code that the changes aren't as minor as I thought they would be. …
-
Although the current implementation of the policy takes a `deterministic` argument, it is never applied, and all policies sample random actions even for testing.
https://github.com/ikostrikov/pytorch-a2c…
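For what it's worth, here is a rough sketch (not the repo's actual `Policy` class; all names are made up) of how a `deterministic` flag is usually honored at action-selection time, by taking the distribution's mode instead of sampling:

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class TinyPolicy(nn.Module):
    def __init__(self, num_inputs, num_actions):
        super().__init__()
        self.logits = nn.Linear(num_inputs, num_actions)

    def act(self, obs, deterministic=False):
        dist = Categorical(logits=self.logits(obs))
        if deterministic:
            # greedy action for evaluation / testing
            return dist.probs.argmax(dim=-1)
        # stochastic action for exploration during training
        return dist.sample()

policy = TinyPolicy(4, 2)
print(policy.act(torch.zeros(1, 4), deterministic=True))
```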
-
I get the following error:
```
Traceback (most recent call last):
File "acktr-agent.py", line 10, in
from baselines.common.vec_env.subproc_vec_env import SubprocVecEnv
File "/Users/cli…
-
It seems like baselines is not directly implemented to deal with `Box()` type action spaces. This exact same code works for the `CartPole` environment but fails on `FetchReach-v1`. Here is the code…
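In case it helps, a rough sketch (the helper names here are made up, not part of baselines or this repo) of how an agent can branch on the action-space type, so that `Box` spaces get a Gaussian head instead of the `Categorical` head used for `Discrete` spaces:

```python
import gym
import torch
import torch.nn as nn
from torch.distributions import Categorical, Normal

def make_action_head(action_space, hidden_size=64):
    # One output per discrete action (logits) or per continuous dimension (means).
    if isinstance(action_space, gym.spaces.Discrete):
        return nn.Linear(hidden_size, action_space.n)
    elif isinstance(action_space, gym.spaces.Box):
        return nn.Linear(hidden_size, action_space.shape[0])
    raise NotImplementedError(type(action_space))

def sample_action(head_out, action_space, log_std=None):
    if isinstance(action_space, gym.spaces.Discrete):
        return Categorical(logits=head_out).sample()
    # For Box spaces, treat the head output as the mean of a diagonal Gaussian.
    std = torch.exp(log_std) if log_std is not None else torch.ones_like(head_out)
    return Normal(head_out, std).sample()
```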
-
https://github.com/ikostrikov/pytorch-a2c-ppo-acktr/blob/17ea8333ecbfe6552470f50fab4f83e1444f43a6/main.py#L226
-
It seems the PyTorch MNIST hogwild example has been updated, as gradients are now allocated lazily.
I think this means that this part of your code is no longer required?
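For context, the part being discussed looks roughly like the common A3C `ensure_shared_grads` helper (a sketch, not necessarily a verbatim copy of this repo's code), which only matters while `shared_param.grad` starts out as `None`:

```python
def ensure_shared_grads(model, shared_model):
    # Point the shared model's .grad tensors at the worker's gradients,
    # but only the first time, while shared_param.grad is still None.
    for param, shared_param in zip(model.parameters(), shared_model.parameters()):
        if shared_param.grad is not None:
            return
        shared_param._grad = param.grad
```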
-
I have two questions regarding the implementation of recurrent policies:
1. Why do you have a loop recomputing states in your recurrent policy? It seems you could use the states you already stored …
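To make the question concrete, here is a toy sketch (shapes and names are assumptions) of what the two options look like; if the stored states are detached from the graph, reusing them would cut the gradient path through the recurrent connections, whereas re-running the GRU step by step inside the update keeps every hidden state in the current computation graph.

```python
import torch
import torch.nn as nn

gru = nn.GRUCell(8, 16)
obs_seq = torch.randn(5, 1, 8)      # 5 timesteps, batch of 1
stored_h0 = torch.zeros(1, 16)      # hidden state saved during the rollout (detached)

# (a) recompute in a loop: gradients flow back through every recurrent step
h = stored_h0
recomputed = []
for t in range(obs_seq.size(0)):
    h = gru(obs_seq[t], h)
    recomputed.append(h)

# (b) reuse stored states: each step would be conditioned on a constant tensor,
# so the recurrent weights would only see gradients from single-step transitions
```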
-
```
I00000009 0x71a94ad414ff12c2 rcv loss_detection_alarm=381524903823256 last_hs_tx_pkt_ts=381524893823256 alarm_duration=10
I00000009 0x71a94ad414ff12c2 frm 2992096611 tx S01(0x1f) STREAM(0x16) id…
```
-
Like, e.g., imagine I have my own policy that takes in a state and outputs an action, or perhaps a distribution over actions; and I have a world that takes an action and returns a reward and a new sta…
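A minimal sketch of one way such a policy/world pair could plug in (everything here is hypothetical: `world`, its `reset`/`step` methods, and the space sizes): wrap the world as a `gym.Env`, which is the environment interface the training code expects.

```python
import gym
import numpy as np

class MyWorldEnv(gym.Env):
    """Adapts a world with reset() -> state and step(action) -> (reward, new state)."""

    def __init__(self, world):
        self.world = world
        # Hypothetical spaces; set these to match the real world and policy.
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(4,), dtype=np.float32)
        self.action_space = gym.spaces.Discrete(2)

    def reset(self):
        self.state = self.world.reset()
        return self.state

    def step(self, action):
        reward, self.state = self.world.step(action)
        done = False  # this sketch never terminates; add a real condition as needed
        return self.state, reward, done, {}
```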