AlphaZeroIncubator / AlphaZero

Our implementation of AlphaZero for simple games such as Tic-Tac-Toe and Connect4.
0 stars 0 forks source link

Skeleton for Game classes #12

Closed guidopetri closed 4 years ago

guidopetri commented 4 years ago

This PR provides a (tentative) skeleton for game classes.

@PhilipEkfeldt I changed the API that you were using in #9 a little bit. Could you look over this and comment? I thought e.g. separating the result from the board game over state was a better idea than having the function return a tuple, but I'm open to discussing this.

PhilipEkfeldt commented 4 years ago

I agree with the function separation. In the mcts I'm currently assuming that Game is static only (has no board state on its own) and is only used for rules logic (i.e. the "world model" for the algorithm) and then I'm storing states in the Node class. To avoid having to create a new game object for each node (since I need to keep track of the state for each node), could we have an additional GameRuleSet class or similar? E.g. something like:

class GameRuleSet():
    __init__(params):
    ## Setup specific settings for a game, e.g. width, height, winning condition, etc. Objects contain no state aside from game settings

    def get_initial_board() - Board:

    def get_legal_move(board) - Actions (tensor/array):

    def is_game_over(board) -> Bool:

    def make_move(board, move) -> board:

    def result(board) -> Int:

This could then be used as input to instantiate a Game instance as well.

guidopetri commented 4 years ago

The only function I particularly don't like in that is the make_move function. What do you use that for in the MCTS? I can take a look later and see if I can come up with a solution.

The main reason why I don't want that is because then we do have a game instance that could be played outside of an actual Game instance.

PhilipEkfeldt commented 4 years ago

The only function I particularly don't like in that is the make_move function. What do you use that for in the MCTS? I can take a look later and see if I can come up with a solution.

The main reason why I don't want that is because then we do have a game instance that could be played outside of an actual Game instance.

I use it to expand a leaf node. So when I reach a leaf, I get all legal actions for that leaf and create new child notes using make_move to get the new states. I also use it during self play to get the new state after sampling.

guidopetri commented 4 years ago

Could this maybe be incorporated in some sort of variation stuff in the Game class? This behavior would then (hopefully) be easily inherited to any subclasses. You'd then have the "mainline", and then can expand however many leaves you want from any point in a game for the MCTS.

PhilipEkfeldt commented 4 years ago

Could this maybe be incorporated in some sort of variation stuff in the Game class? This behavior would then (hopefully) be easily inherited to any subclasses. You'd then have the "mainline", and then can expand however many leaves you want from any point in a game for the MCTS.

If I understand you correctly, this is what the mcts code would look like then?

if not leaf_node.is_terminal:
    # Expand node
    game = Game(leaf_node.state, params)
    child_states, actions = game.expand()
    for i in range (len(child_states)):
        child_node = MCTS_Node(state=child_states[i], 
                               action = actions[i],              
                               prior = leaf_node._policy[i])
        leaf_node.add(child)

vs currently:

if not leaf_node.is_terminal:
    # Expand node
    legal_actions = game.get_legal_actions(leaf_node.state)
    leaf_node.expand(legal_actions)

def expand(self, legal_actions):
    for index in legal_actions.nonzero():
        action = torch.zeros(legal_actions.size())
        action[index] = 1
        prior = self._policy[index].item()
        child = MCTS_Node( state=Game.make_move(self._state, action), 
                           action=action, 
                           prior=prior )
        self.add_child(child)

Some of the expand logic will have to be added to the Game class in that case. While it's certainly possible to do it this way, I feel like it adds unnecessary objects, since every time I want to get a new board state I have to create an object. Since the MCTS doesn't play a game the whole way through at any point, I just feel it adds overhead in terms of objects and also adds code to the Game class which is only used for the MCTS anyway and doesn't have to be in the Game class.

guidopetri commented 4 years ago

No, I don't mean to create a new Game instance at all for the MCTS. It would still run in one single Game instance.

What I mean is something more along the lines of how python-chess does it with its GameNodes. It has an is_mainline method that determines whether that node is in the mainline of a game or not; and it has access to both its parent and whatever children (i.e. sub-variations that you would use in MCTS) that it may have. Specifically, the "parent", "variations", and "is_mainline" methods here.

That way you'd be able to explore whatever variations you might require for the MCTS while still maintaining only a single Game instance, and still not have a separate ruleset class that could theoretically play games by itself. Honestly, having a separate ruleset class looks to me like it would just beg for bugs - DRY principle and all ;)

PhilipEkfeldt commented 4 years ago

No, I don't mean to create a new Game instance at all for the MCTS. It would still run in one single Game instance.

What I mean is something more along the lines of how python-chess does it with its GameNodes. It has an is_mainline method that determines whether that node is in the mainline of a game or not; and it has access to both its parent and whatever children (i.e. sub-variations that you would use in MCTS) that it may have. Specifically, the "parent", "variations", and "is_mainline" methods here.

That way you'd be able to explore whatever variations you might require for the MCTS while still maintaining only a single Game instance, and still not have a separate ruleset class that could theoretically play games by itself. Honestly, having a separate ruleset class looks to me like it would just beg for bugs - DRY principle and all ;)

Ok, I see what you mean. Each iteration would then have to be a Game object, as it start from the top again each iteration. Would the idea then be that I use GameNode instead of MCTS_Node? I'm not sure that I could do that, since I cannot subclass GameNode (since I need the information from the game specific subclass right?) and I have specific logic for MCTS that I store on each node. Alternatively, I have to keep track of a tree in two places I think? If you have any better suggestion I'm open to it, I'm just kinda stuck in the way I was thinking about this. :p

In terms of repeating, I guess I don't see what would be repeated except for maybe method names? The logic would be stored in the GameRuleSet class, and the Game class would simply make use of those static methods. Only GameRuleSet would have to be subclassed and not Game if I'm thinking about this right.

guidopetri commented 4 years ago

I haven't really understood your MCTS code, but maybe we can talk through it on the call later this week?

GameRuleSet class

ah, I see what you mean now. I'm not sure this would work perfectly, I think I could probably come up with an example game that has some weird quirk where a strict Game class that inherits from GameRuleSet wouldn't work. But I'll think about this further - your idea makes a lot more sense to me now, thanks for explaining haha.

If that were the case, would you then use Game in the MCTS implementation? or GameRuleSet? Because I'd still like to keep only one code location where you can actually play through a game fully.

PhilipEkfeldt commented 4 years ago

Sure, it's probably easier to do this in a call. :P I agree it's not a perfect solution either, but I think it's hard to find one in this case. My thoughts are that GameRuleSet would subclass to each game, and then each Game object would be initiated with a GameRuleSet, i.e.

from games import Connect4Rules, Game

new_connect4_game = Game(Connect4Rules)

With this setup I would only need Connect4Rules for the MCTS. The advantage of this is that it separates rules logic from game state. The disadvantage is that it adds more classes and some repetition in terms of method names (but not logic). Both setups could work, it's just determining what works best.

An alternative would be to store both rules and state in the same class, but have both static and instance versions of the required methods (with different names). Don't know if that's better haha