-
Hi,
I know this is more of a technicality, but I would like to clarify this.
A3C stands for Asynchronous Advantage Actor-Critic, whereas A2C can be considered a synchronous version [Ope…
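The structural difference can be sketched in a few lines: A3C workers apply their gradients to the shared parameters independently as each rollout finishes, while A2C waits for all workers and applies one combined update. A minimal pure-Python sketch (the "gradients" are stand-in numbers, not real rollout gradients):

```python
# Illustrative sketch only: the point is *when* worker gradients hit
# the shared parameter, not how they are computed.

def a3c_style(param, worker_grads, lr=0.1):
    # Asynchronous: each worker applies its gradient immediately, so
    # later workers see parameters already moved by earlier ones.
    for g in worker_grads:
        param -= lr * g
    return param

def a2c_style(param, worker_grads, lr=0.1):
    # Synchronous: wait for every worker, average, then do ONE update.
    avg = sum(worker_grads) / len(worker_grads)
    return param - lr * avg

p_async = a3c_style(1.0, [0.5, -0.2, 0.3])   # three separate updates
p_sync = a2c_style(1.0, [0.5, -0.2, 0.3])    # one averaged update
```

With stationary gradients the two differ only by the effective step size; in practice the difference is that A3C's updates are computed against stale parameters while A2C's are not.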
-
Experiments so far have demonstrated that the REINFORCE policy gradient by itself is not enough to implement meta-learning. Based on the work in https://arxiv.org/abs/1611.05763, it seems that both a policy an…
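For reference, the plain REINFORCE estimator being discussed is grad log pi(a|s) scaled by the return G. A minimal tabular sketch for a softmax policy (the numbers are purely illustrative):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_grad(logits, action, ret):
    # Gradient of log pi(action) w.r.t. softmax logits:
    # 1[i == action] - pi(i), scaled by the return G.
    pi = softmax(logits)
    return [((1.0 if i == action else 0.0) - pi[i]) * ret
            for i in range(len(logits))]

g = reinforce_grad([0.0, 0.0], action=0, ret=2.0)
# pi = [0.5, 0.5], so g = [(1 - 0.5) * 2, (0 - 0.5) * 2] = [1.0, -1.0]
```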
-
*Ray version and other system information (Python version, TensorFlow version, OS):*
- Ray: [0.8.5 (02c1ab0ec6d615ad54ebf33bd93c51c04000534e)](https://s3-us-west-2.amazonaws.com/ray-wheels/releas…
-
As mentioned in the WM reading telecons, we did clustering over a subset of the P3 documents that MITRE/TwoSix recently shared. @jmacbrid from BBN, who recently joined our Hume team, went over those c…
-
Hi, friend, I'm back with another question, haha.
I noticed the computation of q_next_target in COMA, and I don't quite understand why it is computed this way:
282 q_evals = torch.gather(q_evals, dim=3, index=u).squeeze(3)
283 q_next_target = torch.gather(q_next_target, dim=3, ind…
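For what it's worth, the gather call on line 282 just selects, along the last dimension, the Q-value of the action each agent actually took (indexed by u). numpy's take_along_axis is the same operation; a toy sketch with made-up shapes (not COMA's real ones):

```python
import numpy as np

# Toy stand-in for a Q tensor with shape
# (batch=1, step=1, agent=2, action=3); the values are arbitrary.
q = np.array([[[[10., 11., 12.],
                [20., 21., 22.]]]])

# One chosen action index per agent, with a trailing 1 so it can
# index along axis 3 (this mirrors torch.gather's index argument).
u = np.array([[[[2], [0]]]])

# Equivalent of torch.gather(q, dim=3, index=u).squeeze(3):
chosen = np.take_along_axis(q, u, axis=3).squeeze(3)
# agent 0 took action 2 -> 12.; agent 1 took action 0 -> 20.
```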
-
Implement A2C for Atari / Doom games. Inspiration [here](https://www.coursera.org/lecture/practical-rl/advantage-actor-critic-dya16) and [here](https://www.freecodecamp.org/news/an-intro-to-advantage-…
-
Dear Author,
I took a quick look at your code for the actor update. It seems that you have used an advantage soft actor-critic, i.e.,
Advantage: `pol_target = q - v`
loss: `pol_loss = (log_pi * (log_pi /…
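A minimal sketch of the advantage target quoted above (`pol_target = q - v`), assuming q and v are the critic outputs for a batch; the truncated loss line is not reconstructed here, and the actor loss shown is just one common way to use such a target:

```python
import numpy as np

# Hypothetical batch of critic outputs; shapes and values illustrative.
q = np.array([1.2, 0.4, 2.0])   # Q(s, a) for the sampled actions
v = np.array([1.0, 0.5, 1.5])   # state-value baseline V(s)

# The advantage target from the snippet: positive entries mean the
# sampled action beat the baseline.
pol_target = q - v

log_pi = np.array([-0.7, -1.1, -0.3])
# One common actor loss built from this target (NOT the repo's exact
# loss, which is truncated in the original question):
pol_loss = -np.mean(log_pi * pol_target)
```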
-
Hi,
I am trying to train LSTM PPO on Hopper-v3, but it does not learn well.
Although an LSTM policy is harder to train than an FF policy, it seems there are several missing pieces needed to train one.
Cou…
jd730 updated 4 years ago
-
### What is the problem?
I'm working with SAC and want to add a new field to `train_batch` in the same way as it is done in the [PPO advantage example](https://docs.ray.io/en/latest/rllib-concept…
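The pattern in that example is a postprocessing hook: compute a new per-timestep array from the collected trajectory and store it under a new key before training. A framework-agnostic sketch of the idea (a plain dict stands in for the sample batch; the key name and function are hypothetical, not RLlib API):

```python
def postprocess_add_returns(batch, gamma=0.99):
    # `batch` is a plain dict standing in for a trajectory batch.
    # Compute discounted returns backwards, then attach them as a
    # new field alongside the existing ones.
    returns, running = [], 0.0
    for r in reversed(batch["rewards"]):
        running = r + gamma * running
        returns.append(running)
    batch["my_returns"] = list(reversed(returns))  # hypothetical key
    return batch

out = postprocess_add_returns({"rewards": [1.0, 0.0, 2.0]}, gamma=0.5)
# returns: [1 + 0.5*(0 + 0.5*2), 0 + 0.5*2, 2] = [1.5, 1.0, 2.0]
```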
-
Hi,
I have a question about this particular implementation of the Duckietown Critic (Actor-Critic DDPG).
Could you please explain the intuition behind why the Critic gets the **action input** only a…
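For context on the question: in DDPG the critic approximates Q(s, a), so the action must enter the network somewhere; it scores the specific action the actor proposed, not the state alone. A common sketch concatenates the action with the state (the weights here are toy matrices, purely illustrative):

```python
import numpy as np

def critic_forward(state, action, w1, w2):
    # DDPG-style critic: Q(s, a). The action is an *input* because the
    # critic evaluates the particular action, unlike a V(s) network.
    x = np.concatenate([state, action])   # one common way to feed a in
    h = np.maximum(w1 @ x, 0.0)           # ReLU hidden layer
    return w2 @ h                         # scalar Q-value

state = np.array([0.5, -0.5])
action = np.array([1.0])
w1 = np.eye(3)                 # toy untrained weights
w2 = np.ones((1, 3))
q = critic_forward(state, action, w1, w2)
# h = relu([0.5, -0.5, 1.0]) = [0.5, 0.0, 1.0]; q = [1.5]
```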