VinF / deer

DEEp Reinforcement learning framework
Other
485 stars 126 forks source link

Naming convention - Policy #40

Closed kkshyu closed 8 years ago

kkshyu commented 8 years ago

Environment and Policy both contain act method, but they do quite different things. In my opinion, act is a verb to perform sth. Therefore, in Policy abstract class, it should be a noun action just like bestAction. However, chooseAction and chooseBestAction are good, too.

class Policy(object):
    """Abstract class for all policies, i.e. objects that can take any space as input, and output an action.
    """

    def __init__(self, q_network, n_actions,random_state):
        self.q_network = q_network
        self.n_actions = n_actions
        self.random_state = random_state

        pass

    def bestAction(self, state):
        """ Returns the best Action
        """
        action = self.q_network.chooseBestAction(state)
        V = max(self.q_network.qValues(state))
        return action, V

    def act(self, state):
        """Main method of the Policy class. It can be called by agent.py, given a state,
        and should return a valid action w.r.t. the environment given to the constructor.
        """
        raise NotImplementedError()
VinF commented 8 years ago

I agree that chooseAction may be more appropriate so as to avoid the confusion with the method in Environment(however I don't think chooseBestAction would be appropriate because the policies have to deal with the exploration/exploitation dilemma which is done within that method).

kkshyu commented 8 years ago

Did you mean chooseBestAtction have already dealt with exploration/exploitation dilemma? I thought the dilemma should be implemented in act(chooseAction). In my opinion, the method bestAction in policies is similar with chooseBestAction in q-network. Actually I don't understand the purpose of this method.

VinF commented 8 years ago

Ok, I think I misunderstood your first message. Either we have act--->chooseAction and bestAction--->chooseBestAction or we keep bestAction and we change act--->action. (of course, chooseBestAction or bestAction has already dealt with exploration/exploitation) And indeed bestAction in policies returns currently directly chooseBestAction in q-network. It's an additional encapsulation but if it's not required in the future, the best is probably to remove it, indeed. Thanks for the feedback! Either you can do a PR or I'll do the changes.

kkshyu commented 8 years ago

41