AIDynamicAction / rcognita

rcognita is a flexibly configurable framework for agent-environment simulation with a menu of predictive and safe reinforcement learning controllers
MIT License

Implement Monte-Carlo method and pipeline #59

Open osinenkop opened 2 years ago

osinenkop commented 2 years ago

Need:

  1. System: pendulum
  2. Scenario for Monte-Carlo learning
  3. REINFORCE
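A pendulum system for item 1 could be sketched as below. The parameter names and values, the convention that theta = 0 is the downward vertical, and the explicit Euler integrator are all assumptions for illustration, not rcognita's actual system class.

```python
import math

# Hypothetical parameters (not taken from rcognita): mass [kg], length [m], gravity [m/s^2]
M, L, G = 1.0, 1.0, 9.81


def pendulum_rhs(state, torque):
    """Right-hand side of theta'' = -(g/l) sin(theta) + torque / (m l^2).

    state = (theta, omega); theta is measured from the downward vertical.
    """
    theta, omega = state
    dtheta = omega
    domega = -(G / L) * math.sin(theta) + torque / (M * L ** 2)
    return dtheta, domega


def pendulum_step(state, torque, dt=0.01):
    """Advance the state by one explicit Euler step of length dt."""
    dtheta, domega = pendulum_rhs(state, torque)
    theta, omega = state
    return theta + dt * dtheta, omega + dt * domega


# Usage: start hanging at rest and apply a constant torque for 1 s
state = (0.0, 0.0)
for _ in range(100):
    state = pendulum_step(state, torque=1.0)
print(state)
```

Explicit Euler keeps the sketch short; a higher-order integrator would be a more accurate choice for long episodes.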

Visualizer: as usual (like for 3wrobot), but the upper-left panel should show the pendulum and its trajectory (as a dotted line, like for 3wrobot)
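To draw the pendulum's trajectory, the simulated angles have to be mapped to Cartesian positions of the pendulum tip. A minimal sketch, assuming the pivot sits at the origin and theta = 0 is the downward vertical; the function name `tip_trajectory` is hypothetical and not part of rcognita's visualizer.

```python
import math


def tip_trajectory(thetas, length=1.0):
    """Map a sequence of pendulum angles to Cartesian tip positions.

    Assumes the pivot is at (0, 0) and theta = 0 points straight down.
    """
    xs = [length * math.sin(th) for th in thetas]
    ys = [-length * math.cos(th) for th in thetas]
    return xs, ys


xs, ys = tip_trajectory([0.0, 0.1, 0.2, 0.3])
```

With matplotlib, the points could then be drawn as a dotted line on the upper-left axes of `plt.subplots(2, 2)` via `ax.plot(xs, ys, linestyle=":")`.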

Monte-Carlo scenario:

  1. loop over policy gradient updates
  2. each such update needs several episodes (former runs), so loop over episodes
  3. each episode is like the current main loop, i.e., it iterates over steps
  4. when all episodes are done, the collected experience is used to update the policy parameters
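The four steps of the Monte-Carlo scenario map onto two nested loops around the existing step loop. The sketch below assumes hypothetical callables (`policy`, `update_policy`, `env_reset`, `env_step`) in place of rcognita's actual controller and system objects, and a fixed episode length.

```python
import random


def run_episode(policy, env_reset, env_step, n_steps):
    """One episode: like the current main loop, it iterates over time steps."""
    state = env_reset()
    trajectory = []  # list of (state, action, reward) tuples
    for _ in range(n_steps):
        action = policy(state)
        next_state, reward = env_step(state, action)
        trajectory.append((state, action, reward))
        state = next_state
    return trajectory


def monte_carlo_scenario(policy, update_policy, env_reset, env_step,
                         n_iterations, n_episodes, n_steps):
    """Outer loop over policy-gradient updates, inner loop over episodes."""
    history = []
    for _ in range(n_iterations):
        # Collect several episodes with the current, fixed policy
        episodes = [run_episode(policy, env_reset, env_step, n_steps)
                    for _ in range(n_episodes)]
        # Only after all episodes are done is the experience used for the update
        update_policy(episodes)
        history.append(episodes)
    return history


# Usage with a toy 1-D environment (hypothetical, for illustration only)
random.seed(0)
updates = []
history = monte_carlo_scenario(
    policy=lambda s: random.gauss(0.0, 1.0),
    update_policy=lambda eps: updates.append(len(eps)),
    env_reset=lambda: 0.0,
    env_step=lambda s, a: (s + a, -abs(s + a)),
    n_iterations=3, n_episodes=4, n_steps=5,
)
```

Keeping the policy fixed while a batch of episodes is collected is what makes each update a Monte-Carlo estimate: the returns are only known once an episode has ended.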

The policy must be a PDF (probability density function). For useful policy parametrizations, see S&B (Sutton & Barto), p. 322. The REINFORCE algorithm can also be found there.
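As a rough illustration, REINFORCE in S&B updates the parameters per step as theta ← theta + alpha · gamma^t · G_t · ∇ ln pi(A_t | S_t, theta). The sketch below assumes a Gaussian policy (S&B also discuss Gaussian policies for continuous actions) with a mean linear in a hypothetical feature vector and a fixed standard deviation; S&B's version also learns the standard deviation. For this policy, ∇ ln pi(a | s, theta) = (a − thetaᵀx(s)) x(s) / sigma². Averaging the gradient over the episodes of one Monte-Carlo iteration is a batch variant chosen here as an assumption; `features`, `GaussianPolicy`, and `reinforce_update` are illustrative names, not rcognita API.

```python
import random


def features(state):
    """Hypothetical feature vector x(s): a bias term and the state itself."""
    return [1.0, state]


class GaussianPolicy:
    """pi(a | s, theta) = N(a; theta^T x(s), sigma^2), a PDF over actions."""

    def __init__(self, n_features, sigma=0.5):
        self.theta = [0.0] * n_features
        self.sigma = sigma

    def mean(self, state):
        return sum(t * x for t, x in zip(self.theta, features(state)))

    def sample(self, state):
        return random.gauss(self.mean(state), self.sigma)

    def grad_log_prob(self, state, action):
        # d/dtheta ln N(a; theta^T x, sigma^2) = (a - theta^T x) x / sigma^2
        coef = (action - self.mean(state)) / self.sigma ** 2
        return [coef * x for x in features(state)]


def reinforce_update(policy, episodes, alpha=0.01, gamma=1.0):
    """REINFORCE step averaged over episodes; each episode is [(s, a, r), ...]."""
    grad = [0.0] * len(policy.theta)
    for episode in episodes:
        rewards = [r for _, _, r in episode]
        for t, (s, a, _) in enumerate(episode):
            # Discounted return G_t from step t to the end of the episode
            g_t = sum(gamma ** (k - t) * rewards[k] for k in range(t, len(rewards)))
            for i, g in enumerate(policy.grad_log_prob(s, a)):
                grad[i] += gamma ** t * g_t * g
    policy.theta = [th + alpha * g / len(episodes)
                    for th, g in zip(policy.theta, grad)]


# Usage: a hypothetical one-step task whose best action is a = 2
random.seed(0)
policy = GaussianPolicy(n_features=2)
for _ in range(500):
    episodes = []
    for _ in range(10):
        s = 0.0
        a = policy.sample(s)
        episodes.append([(s, a, -(a - 2.0) ** 2)])
    reinforce_update(policy, episodes, alpha=0.01)
print(policy.mean(0.0))  # drifts toward 2
```

Plain REINFORCE has high variance; subtracting a baseline from G_t (REINFORCE with baseline, also in S&B) is a common refinement.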