Closed diybl closed 1 year ago
@ChengYen-Tang WIP: PPO
If possible @diybl please support and coordinate with @ChengYen-Tang
@diybl @ChengYen-Tang Deep RL using TorchSharp for Deep Learning
FYI: @NiklasGustafsson
https://github.com/xin-pu/DeepSharp/discussions/10
```mermaid
mindmap
  root((Reinforcement<br/>Learning))
    Definitions
      Interactions
        Environment
        Agent
      Elements
        State
        Action
        Strategy
          Deterministic Policy
          Stochastic Policy
        State transition probability
        Rewards
      Others
        Episodes
        Trial
        Continuing Tasks
    Policy
      Policy based learning
      Value based learning
        Monte Carlo learning
        Temporal Difference Learning
          SARSA<br/>State Action Reward State Action
          QLearning
      Dynamic programming learning
        Policy iteration algorithm
          Policy Evaluation
          Policy Improvement
        Value iteration algorithm
    Markov Decision Process
      Trajectory
      Markov Process
    Objective Functions
```
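The value-based branch of the mindmap can be made concrete with a minimal tabular Q-learning sketch. This is plain illustrative Python on a hypothetical 5-state chain MDP (the environment, rewards, and hyperparameters are assumptions for the example, not code from any of the repositories discussed in this thread):

```python
import random

def run_q_learning(episodes=2000, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    # Toy 5-state chain MDP (illustrative): action 1 moves right,
    # action 0 moves left; reaching state 4 yields reward 1 and
    # terminates the episode.
    rng = random.Random(seed)
    n_states, n_actions, goal = 5, 2, 4
    Q = [[0.0] * n_actions for _ in range(n_states)]

    def greedy(qs):
        # Argmax with random tie-breaking so early episodes explore.
        best = max(qs)
        return rng.choice([a for a, q in enumerate(qs) if q == best])

    for _ in range(episodes):
        s = 0
        for _ in range(100):  # step cap per episode
            # Epsilon-greedy behaviour policy.
            a = rng.randrange(n_actions) if rng.random() < eps else greedy(Q[s])
            s2 = min(s + 1, goal) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == goal else 0.0
            # Off-policy TD(0) target: r + gamma * max_a' Q(s', a').
            target = r if s2 == goal else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            if s2 == goal:
                break
            s = s2
    return Q
```

With the discount gamma = 0.9, the learned Q(0, right) converges toward gamma^3 = 0.729, and the greedy policy moves right in every state.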
@diybl @ChengYen-Tang @NiklasGustafsson <= dotnet team @xin-pu
This is done => now with a WinForms UI and a WinForms chart to monitor training progress. RL_Matrix => NuGet; Gym.NET using TorchSharp
@asieradzk
https://github.com/asieradzk/RL_Matrix/issues/1#issuecomment-1691274872
Next step is to bring it to Godot
Yup. I'm hot on bringing RL Matrix to Godot, particularly to flex on Unity's ML-Agents. Ideally I'd like to have at least DDPG, Rainbow, and Deep MCTS (AlphaZero) on top of DQN and PPO.
Please give me architecture advice on how you'd like environment/agent creation implemented. I am basing my current version on how MATLAB does it, to allow easy swapping of agents/environments.
@asieradzk
Here, we only work with the MIT FOSS license.
Thank you, great to hear. I will see about a license change once I am done with Godot RL agents. Hopefully I'll find some time in the next weeks in between working on my PhD.
@NiklasGustafsson
cc this issue
Related issue
@NiklasGustafsson
There is now a PPO example using TorchSharp.
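For readers landing on this thread, the heart of a PPO update is small enough to sketch in plain Python. The following is an illustrative single-sample PPO-Clip surrogate (function name and signature are mine, not taken from the TorchSharp example or any repo mentioned here):

```python
import math

def ppo_clipped_loss(old_logp, new_logp, advantage, clip_eps=0.2):
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
    ratio = math.exp(new_logp - old_logp)
    # Clip the ratio to [1 - eps, 1 + eps] so the update stays close
    # to the old policy (the PPO trust-region heuristic).
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    # PPO maximizes min(ratio * A, clipped * A); return the negated
    # value so an optimizer can minimize it as a loss.
    return -min(ratio * advantage, clipped * advantage)
```

In a real implementation this is computed over a batch of trajectories with autograd (e.g. TorchSharp tensors) rather than scalar `math` calls, but the clipping logic is the same.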
@diybl -- can this be closed? I'm eager to close out old issues to get a proper sense of the size of our backlog.
Is there a PPO example using TorchSharp?