-
Just a feature request! :)
Could you implement a crossover function in order to perform reinforcement learning? (A genetic algorithm, in my case :) )
And obviously, along with crossover we also need a simple mutat…
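The request above can be sketched minimally. This is a hedged illustration of the two requested operators, single-point crossover plus per-gene Gaussian mutation; the function names, the list-of-floats genome representation, and all parameters are assumptions for illustration, not the library's API.

```python
import random

def crossover(parent_a, parent_b):
    """Single-point crossover: swap tails of two equal-length genomes."""
    point = random.randint(1, len(parent_a) - 1)
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

def mutate(genome, rate=0.01, scale=0.1):
    """Perturb each gene with probability `rate` by Gaussian noise."""
    return [g + random.gauss(0.0, scale) if random.random() < rate else g
            for g in genome]
```

In a typical loop one would select parents by fitness, apply `crossover`, then `mutate` the children before the next evaluation round.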
-
Not sure if this is already a feature or not; please forgive me and provide insight :)
While I haven't tried it yet, I understand that tune has support for search algorithms (like BO, spearmint, etc.), w…
-
## Problem with Signal
Signal has ***copious*** privacy issues, making it unfit for privacytools.io endorsement.
1. Users are forced to supply a phone number to Signal (https://github.com/privacy…
ghost updated
3 years ago
-
I find the code super interesting and left some random comments on API naming. Take them with a grain of salt! I might start contributing at some point.
rxwei updated
4 years ago
-
#### Description
Hi, I tried training a model, with
```
from gensim.models import Doc2Vec
model = Doc2Vec(min_count=1, window=10, size=100, sample=1e-4, negative=5, workers=7)
model.…
```
-
Hi
I do not know if you might consider this as a question that I can ask you. I have been working with a PPO agent code that seemed to train for the environment (custom) that I have. However, in or…
-
Hi I am getting the error below while running the code:
```
Traceback (most recent call last):
  File "tf14_runner.py", line 144, in
    runner.run(args)
  File "tf14_runner.py", line 114, i…
```
-
Deep Deterministic Policy Gradients ([DDPG][1]) and the Stable Baselines code are presented [here][2].
The actor-critic networks are created as follows:
normalized_obs = tf.clip_by_value(normali…
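The truncated snippet above applies running normalization to observations and then clips the result, a common preprocessing step in DDPG implementations. A minimal NumPy sketch of that pattern (the function name, the clip range, and the epsilon term are assumptions, not the quoted library's exact code):

```python
import numpy as np

def normalize_and_clip(obs, mean, std, clip_range=5.0, eps=1e-8):
    """Standardize observations with running mean/std, then clip to a
    fixed range so outliers cannot destabilize the actor-critic nets."""
    normalized = (obs - mean) / (std + eps)
    return np.clip(normalized, -clip_range, clip_range)
```

Clipping after normalization bounds the network inputs even when the running statistics are still poorly estimated early in training.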
-
# Reinforcement Learning
Study List
- [ ] Brief of Reinforcement Learning
- [ ] Methods
- [ ] Reasons to use it
- [ ] Preparation
- [ ] Q-learning
- [ ] Q-learning algorithm
- [ ] Q-learning strategy
- [ …
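For the Q-learning items in the study list above, the core of the algorithm is a single tabular update rule, Q(s,a) ← Q(s,a) + α·(r + γ·max_a' Q(s',a') − Q(s,a)). A minimal sketch (the dict-based table and function signature are illustrative assumptions):

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step on table Q (dict keyed by (state, action)).

    Moves Q(s, a) toward the bootstrapped target r + gamma * max_a' Q(s', a').
    """
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q[(s, a)]
```

A full agent would wrap this in an episode loop with an exploration strategy (e.g. epsilon-greedy), which corresponds to the "Q-learning strategy" item in the list.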
-
In the algorithm pseudocode given in the paper, my understanding is that both the task-encoder update and the policy update perform SGD directly on Equation (10); this is a meta-update, which I assume can be understood simply as a single gradient step. In the source code, however, I see PPO and A3C along with their losses. Does TESP need to rely on the PPO or A3C policy-update scheme?
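The "single gradient step" reading of the meta-update described above can be sketched generically. This is not the TESP source; the parameter list, `grad_fn`, and learning rate are placeholder assumptions illustrating one plain SGD step on a loss such as Equation (10), as opposed to a PPO/A3C-style surrogate objective.

```python
def sgd_meta_update(theta, grad_fn, lr=1e-3):
    """One plain SGD step: theta <- theta - lr * grad(loss)(theta).

    `theta` is a list of scalar parameters and `grad_fn(theta)` returns
    the gradient of the meta-objective at theta, element-wise.
    """
    grads = grad_fn(theta)
    return [t - lr * g for t, g in zip(theta, grads)]
```

If the released code instead optimizes a PPO or A3C loss, the update would wrap this step with a clipped or advantage-weighted surrogate rather than differentiating Equation (10) directly.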