MorvanZhou/Reinforcement-learning-with-tensorflow
Simple reinforcement learning tutorials (莫烦Python, Chinese AI tutorials)
https://mofanpy.com/tutorials/machine-learning/reinforcement-learning/
MIT License · 8.91k stars · 5.01k forks
Issues
#167 PPO convergence · aliamiri1380 · opened 4 years ago · 0 comments
#166 How does PPO handle episodes of different lengths? · YingxiaoKong · opened 4 years ago · 0 comments
#165 DPPO is written completely wrong: the workers should push gradients, not samples · GIS-PuppetMaster · closed 4 years ago · 3 comments
#164 DDPG: Actor target network is garbage ---> sorry!! misunderstanding · hccho2 · closed 4 years ago · 0 comments
#163 Simple PPO.py · GIS-PuppetMaster · closed 4 years ago · 1 comment
#162 DDPG explores only a very small range · YingxiaoKong · opened 4 years ago · 4 comments
#161 Prioritized_Replay_DQN not working · alirezakazemipour · opened 4 years ago · 0 comments
#160 [revised] .ix to .loc AND [add] .astype() for TypeError: reduction operation .argmax() · ghost · closed 4 years ago · 0 comments
#159 How to constrain the output actions to be non-negative? · YingxiaoKong · closed 4 years ago · 1 comment
#158 DDPG does not converge; cannot reproduce the results from the video · zhangbo2008 · closed 4 years ago · 1 comment
#157 Doesn't the DDPG action need to be normalized? · YingxiaoKong · closed 4 years ago · 1 comment
#156 Fairness question about get_env_feedback in chapter 1 · fengchaohao · opened 4 years ago · 0 comments
#155 Questions about the 5.2_Prioritized_Replay_DQN · seey0u · opened 4 years ago · 0 comments
#154 On constructing the probability density for the actor's multi-dimensional continuous action values · sasforce · opened 4 years ago · 2 comments
#153 v_s_ = 0, when the last step is terminal. · hyc6668378 · closed 5 years ago · 0 comments
#152 Sarsa agent ends up moving only near the start point · virdel · closed 5 years ago · 0 comments
#151 Inverted the meaning of epsilon in the Q-Learning algorithm · douglasrizzo · closed 5 years ago · 1 comment
#150 Mac: how to use tkinter · ZhuYun97 · closed 5 years ago · 1 comment
#149 ValueError: invalid literal for int() with base 10: 'None' when running 'env.render()' · xingyueye · opened 5 years ago · 0 comments
#148 PPO: Multiply Mu * 2? · lhorus · opened 5 years ago · 1 comment
#147 PPO and Reward · yangtianyong · opened 5 years ago · 0 comments
#146 Added RND_PPO.py, RND with PPO (solves MountainCarContinuous-v0). · ChuaCheowHuan · closed 5 years ago · 1 comment
#145 ModuleNotFoundError: No module named 'vnpy.api.ctp.vnctpmd' · xinjiyuan97 · closed 5 years ago · 0 comments
#144 A tf.keras rewrite of DDPG · QiuChenFeng · opened 5 years ago · 1 comment
#143 Training does not converge after downloading the code; what could be the problem? · niniuba123456 · opened 5 years ago · 0 comments
#142 bug_issue: in A3C, the done returned by the environment's step() is overwritten by the check on the following line. · hyc6668378 · opened 5 years ago · 0 comments
#141 States in the Environment. · Kalpan13 · opened 5 years ago · 1 comment
#140 Why does simply_PPO use pi instead of old_pi when interacting with the environment? · Qiyangcao · opened 5 years ago · 3 comments
#139 Why aren't the target net weights locked in Dueling DQN? · Qiyangcao · closed 5 years ago · 0 comments
#138 discrete_DPPO: multi-threaded env.render() display · wangyubin112 · closed 5 years ago · 0 comments
#137 How to solve the problem of action == NaN in PPO? · niu0717 · opened 5 years ago · 2 comments
#136 Fixed a 'runs slowly gradually' problem · Gaoee · closed 5 years ago · 1 comment
#135 Question about tf.GraphKeys · WillysMa · opened 5 years ago · 0 comments
#134 Does Morvan have a tutorial on writing a simulator, i.e. the environment? · DaDaDoDoLee · opened 5 years ago · 1 comment
#133 What exactly does the reward r refer to here? How should the code be modified for one's own problem to obtain r? · liudading · opened 5 years ago · 1 comment
#132 How to plot reward against training episodes? · Curry30h · opened 5 years ago · 1 comment
#131 Save and Reuse of DDPG model · lyjge · opened 5 years ago · 1 comment
#130 How to print Actor and Critic Loss in DDPG update 2? · ghost · closed 5 years ago · 0 comments
#129 Is update_oldpi_op in simply_PPO wrong? · janyChan · closed 5 years ago · 0 comments
#128 What is the rationale for centralizing the discounted reward in REINFORCE? · ZefanW · opened 5 years ago · 0 comments
#127 About Atari · Precola · opened 5 years ago · 3 comments
#126 Found a bug in DDPG.py · jiangyuzhao · closed 4 years ago · 1 comment
#125 Fix a bug in DDPG.py. · jiangyuzhao · closed 4 years ago · 9 comments
#124 Fix a bug in DDPG.py. · jiangyuzhao · closed 5 years ago · 0 comments
#123 On the practical use of PPO · janyChan · opened 5 years ago · 0 comments
#122 Hi, several errors in your simply_ppo code: · clicdl · closed 5 years ago · 1 comment
#121 Questions about A3C · icesit · opened 5 years ago · 3 comments
#120 What is ReUse for? (DDPG) · ghost · closed 5 years ago · 0 comments
#119 game · LexieeWei · closed 5 years ago · 0 comments
#118 Save the model · afcentry · closed 5 years ago · 1 comment