-
Our improved MAL solutions currently take lots of samples, but implementing model-learning as RL with a complex NN describing the agent's "policy" might not be necessary, when essentially all we want to…
-
First, hands down, amazing work. Since this serves as a baseline, I see a possible improvement, if someone wants to implement it:
- The n-step return, as it stands, is biased (since you are using old off-policy sam…
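One standard fix is to down-weight stale transitions with truncated importance-sampling ratios, in the spirit of Retrace/V-trace. A minimal numpy sketch under that assumption (the array names and the single truncation constant are illustrative, not this repo's API):

```python
import numpy as np

def corrected_n_step_return(rewards, values, pi_probs, mu_probs,
                            gamma=0.99, rho_max=1.0):
    """n-step return with truncated importance-sampling correction.

    rewards[t]: rewards along the stored (off-policy) trajectory.
    values[t]:  bootstrapped state values, length n + 1.
    pi_probs[t] / mu_probs[t]: probability of the stored action under the
        current policy pi and the old behaviour policy mu.
    Stale transitions are down-weighted instead of trusted at full weight,
    removing the off-policy bias of the plain n-step return (at the cost
    of a slower-contracting, higher-variance target).
    """
    n = len(rewards)
    rhos = np.minimum(pi_probs / mu_probs, rho_max)  # truncated IS ratios
    g = values[n]                                    # bootstrap from the last state
    for t in reversed(range(n)):
        # V-trace-style backward recursion (with both truncation constants
        # equal): each temporal difference is scaled by its step's ratio.
        g = values[t] + rhos[t] * (rewards[t] + gamma * g - values[t])
    return g

# Toy check: on-policy (pi == mu) recovers the ordinary n-step return.
r = np.array([1.0, 1.0, 1.0])
v = np.array([0.0, 0.0, 0.0, 10.0])
p = np.ones(3)
print(corrected_n_step_return(r, v, p, p))  # 1 + 0.99 + 0.99^2 + 0.99^3 * 10
```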
-
Feedback:
Modelling:
- You don't need too many epochs (use Wandb to monitor your performance as training runs; rule of thumb: around 50 is enough)
- Validate after each epoch (see the sketch after this list)
- Consider using t11-t16 samples…
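A minimal sketch of that loop, validating after every epoch and logging to Wandb; `train_one_epoch`, `evaluate`, and the project name are hypothetical placeholders for the actual training and validation steps:

```python
"""Per-epoch validation with Weights & Biases logging.

train_one_epoch and evaluate are stand-ins for the real steps.
"""
import random
import wandb

def train_one_epoch(model):  # placeholder training step
    return random.random()   # pretend training loss

def evaluate(model):         # placeholder validation step
    return random.random()   # pretend validation loss

def fit(model=None, epochs=50):  # ~50 epochs, per the rule of thumb above
    run = wandb.init(project="model-learning")  # project name is made up
    for epoch in range(epochs):
        train_loss = train_one_epoch(model)
        val_loss = evaluate(model)              # validate after every epoch
        wandb.log({"epoch": epoch,
                   "train_loss": train_loss,
                   "val_loss": val_loss})       # curves appear live in Wandb
    run.finish()

if __name__ == "__main__":
    fit()
```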
-
Hi,
I was running a spatial dataset with 2,419 sampling units, 17 covariates (8 continuous covariates with 2nd order enabled, plus 1 intercept), 3 species, and 1 spatial random level. I found tha…
-
### 🚀 Feature
Could the recent BBF algorithm be added to the library? https://github.com/google-research/google-research/tree/master/bigger_better_faster
### Motivation
It is a model-free, single-agent R…
-
For reference, we will collect a list of discussed papers as well as the date of discussion in this issue.
-
Is there a way to save weights to a file and reload them later? For instance, in the car example there is ui.cpp, which lets the user control the car, while car.cpp appears to train it. I am assuming may…
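The car example is C++, but the pattern is language-agnostic: serialize each weight array to disk after training, then read it back before inference. A minimal Python/numpy illustration of that pattern (none of these names come from this repo):

```python
"""General save/reload pattern for network weights. This only illustrates
the idea; the repo's example code itself is C++."""
import numpy as np

def save_weights(path, weights):
    # weights: dict mapping layer name -> numpy array
    np.savez(path, **weights)

def load_weights(path):
    with np.load(path) as data:
        return {name: data[name] for name in data.files}

weights = {"w1": np.random.randn(4, 8), "b1": np.zeros(8)}
save_weights("car_weights.npz", weights)    # after training (cf. car.cpp)
restored = load_weights("car_weights.npz")  # before driving (cf. ui.cpp)
assert all(np.array_equal(weights[k], restored[k]) for k in weights)
```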
-
Link to this from the contribution guidelines; explain what is in scope, etc.
-
> The next step would be to train an agent with two optimization algorithms. For this, you could use the PPO and DQN algorithms from the reinforcement learning field. However, you could also…
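A minimal sketch of that suggestion, training the same task with both algorithms; stable-baselines3 and CartPole-v1 are assumptions on my part, since the quote names only PPO and DQN:

```python
"""Train one task with two RL algorithms, PPO and DQN, and compare.

stable-baselines3 and CartPole-v1 are assumed; the quoted advice only
names the two algorithms.
"""
import gymnasium as gym
from stable_baselines3 import DQN, PPO
from stable_baselines3.common.evaluation import evaluate_policy

for algo in (PPO, DQN):
    model = algo("MlpPolicy", "CartPole-v1", verbose=0)
    model.learn(total_timesteps=50_000)
    mean_reward, std_reward = evaluate_policy(
        model, gym.make("CartPole-v1"), n_eval_episodes=20)
    print(f"{algo.__name__}: {mean_reward:.1f} +/- {std_reward:.1f}")
```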
-
Policy Search
- [ ] [PI2](http://proceedings.mlr.press/v9/theodorou10a/theodorou10a.pdf) is already implemented, see #28 (a simplified sketch appears after this list)
- [ ] [PoWER](http://www.ias.informatik.tu-darmstadt.de/publications/peters_ADPR…
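For anyone picking up the PI2 item, a deliberately simplified, episodic form of the update (reward-weighted averaging of parameter perturbations); the full algorithm in the linked Theodorou et al. paper applies this per time step over DMP basis functions, and the cost function and constants below are illustrative:

```python
"""Simplified episodic PI2 update: perturb the policy parameters, roll out,
then average the perturbations with softmax weights over trajectory costs."""
import numpy as np

def pi2_step(theta, cost_fn, n_rollouts=32, sigma=0.1, lam=1.0, rng=None):
    rng = np.random.default_rng(rng)
    eps = sigma * rng.standard_normal((n_rollouts, theta.size))  # exploration noise
    costs = np.array([cost_fn(theta + e) for e in eps])          # rollout costs S_k
    # Softmax over negative costs: low-cost rollouts get exponentially more weight.
    z = -(costs - costs.min()) / lam
    w = np.exp(z) / np.exp(z).sum()
    return theta + w @ eps                                       # weighted update

# Toy usage: minimise a quadratic "trajectory cost".
theta = np.array([2.0, -1.5])
for _ in range(100):
    theta = pi2_step(theta, cost_fn=lambda th: float(np.sum(th ** 2)))
print(theta)  # should approach the optimum at the origin
```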