
AIDynamicAction / rcognita

rcognita is a flexibly configurable framework for agent-environment simulation with a menu of predictive and safe reinforcement learning controllers

MIT License


Implement Monte-Carlo method and pipeline #59

Open osinenkop opened 2 years ago

osinenkop commented 2 years ago

Need:

System: pendulum
Scenario for Monte-Carlo learning
REINFORCE

Visualizer: same as usual (like 3wrobot), except that the upper-left panel shows the pendulum and its trajectory (dotted line, as in 3wrobot)

Monte-Carlo scenario:

loop over policy gradient updates
each such update needs several episodes (formerly called runs), so loop over episodes
each episode is like the current main loop, i.e., it iterates over steps
when all episodes are done, experience is used to update policy parameters
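
A minimal sketch of the nesting described above, for discussion only. The function `run_monte_carlo` and the callbacks `reset`, `step`, `act` and `update` are hypothetical names, not existing rcognita API, and the per-step signal is assumed to be a reward (for a cost, the update would flip the sign).

```python
def run_monte_carlo(n_updates, n_episodes, n_steps, reset, step, act, update):
    """Nested Monte-Carlo loop: policy updates -> episodes -> steps.

    All callbacks are hypothetical placeholders:
      reset()           -> initial observation
      act(obs)          -> action sampled from the current policy
      step(obs, action) -> (next_obs, reward)
      update(batch)     -> changes the policy parameters
    """
    for _ in range(n_updates):
        batch = []  # experience gathered for this single policy update
        for _ in range(n_episodes):
            obs = reset()
            episode = []
            for _ in range(n_steps):
                action = act(obs)
                next_obs, reward = step(obs, action)
                episode.append((obs, action, reward))
                obs = next_obs
            batch.append(episode)
        # Policy parameters change only here, after all episodes are done.
        update(batch)
```

The policy stays fixed within a batch of episodes, so every episode in `batch` is sampled from the same parameters.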

The policy must be a PDF (probability density function). For useful policy parametrizations, see S&B (Sutton and Barto), p. 322. The REINFORCE algorithm can also be found there.
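
One possible starting point, sketched under assumptions: a Gaussian policy whose mean is linear in a feature vector, with a fixed standard deviation, plus a batched REINFORCE-style update. `GaussianPolicy`, `reinforce_update`, the feature vectors and the hyperparameters are all illustrative, not part of rcognita. The update accumulates the gradient over all episodes at fixed parameters, which is a batched variant rather than the per-step update in the book, and it assumes rewards (for costs, subtract instead of add).

```python
import math
import random


class GaussianPolicy:
    """pi(a | s) = N(theta . phi(s), sigma^2), with fixed sigma (an assumption)."""

    def __init__(self, n_features, sigma=0.5, seed=0):
        self.theta = [0.0] * n_features
        self.sigma = sigma
        self.rng = random.Random(seed)

    def mean(self, features):
        return sum(t * f for t, f in zip(self.theta, features))

    def sample(self, features):
        return self.rng.gauss(self.mean(features), self.sigma)

    def pdf(self, action, features):
        z = (action - self.mean(features)) / self.sigma
        return math.exp(-0.5 * z * z) / (self.sigma * math.sqrt(2.0 * math.pi))

    def grad_log_pdf(self, action, features):
        # d/dtheta log N(a; theta . phi, sigma^2) = (a - mu) / sigma^2 * phi
        scale = (action - self.mean(features)) / self.sigma ** 2
        return [scale * f for f in features]


def reinforce_update(policy, episodes, step_size=0.01, gamma=1.0):
    """One batched update from episodes of (features, action, reward) tuples."""
    grad = [0.0] * len(policy.theta)
    for episode in episodes:
        ret = 0.0
        # Walk backwards so ret is the discounted return from step t onward.
        for t in range(len(episode) - 1, -1, -1):
            features, action, reward = episode[t]
            ret = reward + gamma * ret
            g = policy.grad_log_pdf(action, features)
            for i in range(len(grad)):
                grad[i] += (gamma ** t) * ret * g[i]
    n = max(len(episodes), 1)
    # Gradient ascent on expected return, averaged over episodes.
    policy.theta = [th + step_size * gi / n
                    for th, gi in zip(policy.theta, grad)]
```

For the pendulum, the features could be something like the angle and angular velocity, but that choice is left open here.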