-
Hi, I tried to run your code.
I ran train-qgen-reinforce.py (with the pre-trained model provided).
The initial score is similar to the accuracy reported in your README.
I have a question. Is the pre-trai…
-
Hi, I want to implement the DDPG algorithm, and before that, I've read your code. It's very useful. I still have a few small questions about the code.
1. As in DDPG.py, lines 61 to 66:
…
-
1) The training rule for the Actor network (Eq. 2) uses the gradient of the network times the reward difference A to update the neural network's parameters.
Is the term "delta_theta log pi_theta(s_t,…
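For reference, the term being asked about is usually the policy-gradient ascent direction: the parameters move along A * grad_theta log pi_theta. A minimal numpy sketch for a linear-softmax policy (all names here are illustrative assumptions, not the repo's code):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def log_pi_grad(theta, s, a):
    """Gradient of log pi_theta(a|s) for a linear-softmax policy.
    theta: (n_actions, n_features); s: (n_features,)."""
    probs = softmax(theta @ s)
    grad = -np.outer(probs, s)   # -pi(b|s) * s for every action b
    grad[a] += s                 # plus s for the action actually taken
    return grad

def reinforce_update(theta, s, a, advantage, lr=0.1):
    # Eq. 2 in words: step theta along A * grad log pi (gradient ascent)
    return theta + lr * advantage * log_pi_grad(theta, s, a)

rng = np.random.default_rng(0)
theta = np.zeros((3, 4))
s = rng.normal(size=4)
theta2 = reinforce_update(theta, s, a=1, advantage=2.0)

# With a positive advantage, the taken action becomes more probable:
p_before = softmax(theta @ s)[1]
p_after = softmax(theta2 @ s)[1]
print(p_after > p_before)  # True
```

The sign of A decides the direction: a positive advantage raises the probability of the sampled action, a negative one lowers it.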
-
Hi VinF,
your library is very helpful. Thank you!
[Weight normalization](https://arxiv.org/abs/1602.07868) might be a way to make SGD-based algorithms suitable for a wider range of environments …
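For concreteness, the reparameterisation proposed in that paper writes each weight vector as w = g * v / ||v||, decoupling its direction from its magnitude. A tiny numpy sketch (names are illustrative):

```python
import numpy as np

def weight_norm(v, g):
    """Weight-normalization reparameterisation: w = g * v / ||v||."""
    return g * v / np.linalg.norm(v)

v = np.array([3.0, 4.0])  # direction parameters (||v|| = 5)
g = 2.0                   # scalar scale parameter
w = weight_norm(v, g)
print(np.linalg.norm(w))  # 2.0 -- the norm of w equals g by construction
```

Because ||w|| = g exactly, SGD can adjust the scale and direction of each unit independently, which is the claimed benefit for conditioning.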
-
First off, terrific work on the repo and blog post: very detailed and clear.
I was able to solve BipedalWalkerHardcore-v2 (average 300+ over 100 episodes) with an A3C implementation I made, but it t…
-
I cannot understand where the data about each start is saved.
Could you please tell me?
-
To foster community involvement, some richer sample code beyond MNIST should be tackled.
Generative Adversarial Networks are a hot topic in ML, and some sample code using Swift should help enco…
-
Dear TA,
Here is the pseudocode in reference [1]:
![2017-12-21_151522](https://user-images.githubusercontent.com/32902010/34244772-f1609d7a-e661-11e7-8128-0c20f2e13c06.jpg)
y_j is the estimate…
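In the DQN pseudocode, y_j is the bootstrapped target: the immediate reward, plus the discounted maximum Q-value of the next state when the episode has not terminated. A hedged sketch of just that line (the table-lookup Q_target stand-in below is my assumption, not the paper's network):

```python
import numpy as np

def dqn_target(reward, next_q_values, done, gamma=0.99):
    """y_j = r_j                                   if s_{j+1} is terminal
       y_j = r_j + gamma * max_a' Q(s_{j+1}, a')   otherwise"""
    if done:
        return reward
    return reward + gamma * float(np.max(next_q_values))

# next_q_values stands in for the target network's outputs over all actions
print(dqn_target(1.0, np.array([0.5, 2.0, -1.0]), done=False, gamma=0.9))  # 2.8
print(dqn_target(1.0, np.array([0.5, 2.0, -1.0]), done=True))              # 1.0
```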
-
Hi, Hongzi
If you don't mind, may I ask you a few more questions?
Firstly, I have two general questions:
1) Is the data flow of the critic network completely separate from that of the actor ne…
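In a typical actor-critic setup the answer tends to be: the two networks keep separate parameters and separate updates, and the only coupling is that the critic's value estimate feeds the actor's advantage. An illustrative numpy sketch (all shapes, names, and the linear models below are my assumptions, not Hongzi's code):

```python
import numpy as np

rng = np.random.default_rng(0)
theta_actor = rng.normal(size=(3, 4))   # policy parameters (actor only)
w_critic = rng.normal(size=4)           # value parameters (critic only)

def value(s):                 # critic: linear state value V_w(s)
    return float(w_critic @ s)

def policy_probs(s):          # actor: linear-softmax policy pi_theta(.|s)
    z = theta_actor @ s
    e = np.exp(z - z.max())
    return e / e.sum()

s, r, s_next, gamma = rng.normal(size=4), 1.0, rng.normal(size=4), 0.99

# The TD error is the only quantity the critic hands to the actor:
td_error = r + gamma * value(s_next) - value(s)

# Critic update touches only w_critic; actor update touches only theta_actor.
w_critic = w_critic + 0.01 * td_error * s
a = 1
grad_log_pi = -np.outer(policy_probs(s), s)
grad_log_pi[a] += s
theta_actor = theta_actor + 0.01 * td_error * grad_log_pi
```

So gradients never flow between the two parameter sets; only the scalar TD error crosses over.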
-
Hello,
I'm now facing another problem. I wanted to register several models to gather more information; here is the problem:
```golang
package main

import (
	"fmt"
	"time"
	"github.com/shixz…