codezonediitj / reinforce

A framework for reinforcement learning
BSD 3-Clause "New" or "Revised" License

Representation of MDPs #1

Open czgdp1807 opened 4 years ago

czgdp1807 commented 4 years ago

Description of the problem

Since RL aims to solve MDPs (Markov Decision Processes), our first aim should be to decide on their representation. It should be designed so that RL algorithms can easily use it to find optimal or sub-optimal solutions. MDPs have the following elements:

  1. States
  2. Actions
  3. Transition Probabilities
  4. Transition Rewards
  5. Policy
  6. Performance Metric

SMDPs (semi-Markov decision processes) have an additional element, the time of transition.

How can each of the above elements be represented? One idea is to use a class that encapsulates them.
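As one possible starting point for discussion, here is a minimal sketch of a tabular representation in which states and actions are plain indices. All names (`TabularMDP`, `Policy`, `evaluatePolicy`) are hypothetical and not part of this repository, and the choice of expected discounted return as the performance metric is an assumption made only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical tabular MDP: states and actions are indices 0..n-1.
struct TabularMDP {
    using Table = std::vector<std::vector<std::vector<double>>>;

    std::size_t numStates;
    std::size_t numActions;
    Table prob;    // prob[s][a][s2]: probability of s -> s2 under action a
    Table reward;  // reward[s][a][s2]: reward earned on that transition

    TabularMDP(std::size_t nS, std::size_t nA)
        : numStates(nS), numActions(nA),
          prob(nS, std::vector<std::vector<double>>(nA, std::vector<double>(nS, 0.0))),
          reward(nS, std::vector<std::vector<double>>(nA, std::vector<double>(nS, 0.0))) {}

    // True if every (state, action) row of prob sums to 1 within tol.
    bool isValid(double tol = 1e-9) const {
        for (const auto& byAction : prob) {
            for (const auto& row : byAction) {
                double sum = 0.0;
                for (double p : row) sum += p;
                if (std::fabs(sum - 1.0) > tol) return false;
            }
        }
        return true;
    }
};

// Deterministic policy: policy[s] is the action chosen in state s.
using Policy = std::vector<std::size_t>;

// Performance metric (an assumption of this sketch): expected discounted
// return of each state under a fixed policy, via iterative policy evaluation.
std::vector<double> evaluatePolicy(const TabularMDP& mdp, const Policy& policy,
                                   double gamma, int sweeps = 1000) {
    assert(policy.size() == mdp.numStates);
    std::vector<double> value(mdp.numStates, 0.0);
    for (int k = 0; k < sweeps; ++k) {
        std::vector<double> next(mdp.numStates, 0.0);
        for (std::size_t s = 0; s < mdp.numStates; ++s) {
            const std::size_t a = policy[s];
            for (std::size_t s2 = 0; s2 < mdp.numStates; ++s2) {
                next[s] += mdp.prob[s][a][s2] *
                           (mdp.reward[s][a][s2] + gamma * value[s2]);
            }
        }
        value = next;
    }
    return value;
}
```

For example, with two states and one action, where state 0 moves to state 1 with reward 1 and state 1 loops on itself with reward 0, `evaluatePolicy` with `gamma = 0.9` gives a value of 1 for state 0 and 0 for state 1. An SMDP could presumably be covered by adding a third table of transition times with the same shape, though that is not shown here.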

Example of the problem

References/Other comments

czgdp1807 commented 4 years ago

MDPs and their associated concepts may be represented using the following class structure:

#include <string>
#include <unordered_map>
#include <vector>

// An action, identified by a free-form description.
// Objects are created and destroyed only through getObject()/deleteObject().
// Note: the return type of getObject() is assumed to be a pointer.
class Action
{
private:
    std::string description;

    Action(const std::string& description = "");
    ~Action();

public:
    static Action* getObject(const std::string& description = "");
    void deleteObject();
    std::string getDescription() const;
};

// A state, with the actions available in it and the per-action
// transition probabilities and transition rewards.
template <class _type>
class State
{
private:
    std::string description;
    std::vector<Action*> actions;
    std::unordered_map<Action*, _type> transitionProbs;
    std::unordered_map<Action*, _type> iTransitionRewards;

    State(const std::string& description = "");

public:
    static State* getObject(const std::string& description = "");
    void addAction(Action& action);
    void setTransitionProb(Action& action, _type transitionProb);
    void setITransitionRewards(Action& action, _type reward);
};

// The MDP itself: a state space plus a policy mapping states to actions.
template <class _type>
class MarkovDecisionProcess
{
private:
    std::vector<State<_type>*> stateSpace;
    std::unordered_map<State<_type>*, Action*> policy;

    MarkovDecisionProcess();

    friend _type performanceMetric();

public:
    static MarkovDecisionProcess* getObject();
    void addState(State<_type>& state);
    void updatePolicy(State<_type>& state, Action& action);
};
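To make the proposal easier to try out, here is a minimal sketch of how the class structure above might be defined so that it compiles. It assumes `getObject()` returns a heap-allocated pointer, which the proposal does not specify. The accessors `numActions()`, `getTransitionProb()` and `getPolicyAction()` are hypothetical additions for inspection only, and `performanceMetric` is left out because its definition has not been decided.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

class Action {
    std::string description;
    explicit Action(const std::string& d) : description(d) {}
    ~Action() = default;  // private: callers must use deleteObject()

public:
    // Assumption: objects live on the heap and are handed out as pointers.
    static Action* getObject(const std::string& description = "") {
        return new Action(description);
    }
    void deleteObject() { delete this; }
    std::string getDescription() const { return description; }
};

template <class _type>
class State {
    std::string description;
    std::vector<Action*> actions;
    std::unordered_map<Action*, _type> transitionProbs;
    std::unordered_map<Action*, _type> iTransitionRewards;
    explicit State(const std::string& d) : description(d) {}

public:
    static State* getObject(const std::string& description = "") {
        return new State(description);
    }
    void addAction(Action& action) { actions.push_back(&action); }
    void setTransitionProb(Action& action, _type p) { transitionProbs[&action] = p; }
    void setITransitionRewards(Action& action, _type r) { iTransitionRewards[&action] = r; }

    // Hypothetical accessors, not in the proposal.
    std::size_t numActions() const { return actions.size(); }
    _type getTransitionProb(Action& action) const { return transitionProbs.at(&action); }
};

template <class _type>
class MarkovDecisionProcess {
    std::vector<State<_type>*> stateSpace;
    std::unordered_map<State<_type>*, Action*> policy;
    MarkovDecisionProcess() = default;

public:
    static MarkovDecisionProcess* getObject() { return new MarkovDecisionProcess(); }
    void addState(State<_type>& state) { stateSpace.push_back(&state); }
    void updatePolicy(State<_type>& state, Action& action) { policy[&state] = &action; }

    // Hypothetical accessor: the action the policy assigns to state,
    // or nullptr if the policy has no entry for it.
    Action* getPolicyAction(State<_type>& state) const {
        auto it = policy.find(&state);
        return it == policy.end() ? nullptr : it->second;
    }
};
```

With these definitions, a caller could create an action with `Action::getObject("left")`, attach it to a `State<double>` via `addAction`, register the state with `addState`, and record the choice with `updatePolicy`. Ownership is entirely manual in this sketch. Smart pointers may be a better fit, but that is a separate design question.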