google-deepmind / open_spiel

OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.
Apache License 2.0

[RBC] Various issues encountered during the use of RBC #811

Open acxz opened 2 years ago

acxz commented 2 years ago

Hello, first off thanks for creating this amazing group of environments for board games! It is the only place where I have found an implementation of Reconnaissance Blind Chess (RBC) besides APL's own implementation.

I've been trying to utilize the RBC environment and here are some issues/pain points that I have encountered and thought I'd share:

Note: I am using the RBC environment through a gym-like wrapper created by the ray-project team here

So far my main goal has been to recreate the random, attacker, and trout bots that APL provides here. Here is my progress; I have been able to recreate them with some slight issues and workarounds.

Ultimately, I'd like to have the same API for creating bots as APL does, so that a class defining an APL bot can be used one-to-one as a ray/rllib policy with their open_spiel wrapper. Right now I have a rough skeleton of where the APL bot functions go in the ray/rllib pipeline, but I don't separate them into their own functions.

Basically, I'd like to be able to train an RBC agent using rllib with the open_spiel environment and then deploy it on the APL RBC servers.

Anyway, here are the issues I have had with open_spiel's RBC implementation.

1) Not all sense actions are encoded.

https://github.com/deepmind/open_spiel/blob/b6f88b6f2fddadd7e0f8e01c78d2becb61c51a33/open_spiel/games/rbc.cc#L421-L423

From what I understand of the above code, since the sense grid is 3x3, it makes less sense for an agent to sense at the edges of the board, i.e. ranks 1 and 8 and files a and h. First off, this is different from APL's implementation, which allows you to sense any location. Secondly, the restriction is not implemented properly: cached_legal_actions_ is simply resized, which keeps the first 36 actions (inner_size x inner_size = 6x6) rather than the 36 actions corresponding to the inner board. Effectively, the agent is allowed to sense file a, but not files g or h.

I believe the proper way to fix this is to allow the agent to sense anywhere on the game board, even on the edges, for consistency with APL, but honestly just fixing the logic for cached_legal_actions_ would be good enough.

Related commit: https://github.com/deepmind/open_spiel/commit/d1f6735d3c0ddc1ac70ebe18c33bfff3a06e7d88
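
For concreteness, here is a minimal Python sketch of the mapping I would expect the 36 inner-board sense actions to use (this is not OpenSpiel code; the real fix belongs in rbc.cc, and the helper name and square indexing are my assumptions):

    # Hypothetical helper: map a sense action in [0, 35] to a 0-63 square index
    # on the 6x6 inner board, skipping the edge ranks/files so the 3x3 sense
    # window always fits. The current resize instead yields squares 0-35.
    def inner_sense_square(sense_action, board_size=8, inner_size=6):
        rank = sense_action // inner_size + 1  # skip rank 1
        file = sense_action % inner_size + 1   # skip file a
        return rank * board_size + file

    # e.g. sense_action 0 -> square 9 (b2), sense_action 35 -> square 54 (g7)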

2) Return the previous taken_action in ObservationString/ObservationTensor

RBC is an uncertain game, and the requested action may not be the action that is actually taken, depending on the board state. This information is required for agents to properly keep track of the game state, and it is information that APL's RBC provides. See (6i) for more info on how I currently (incorrectly) work around this issue using state.history(). Essentially, the previous taken_action should be provided to the agents via both the ObservationString and the ObservationTensor. See APL's handle_move_result for more info.

Maybe this information exists, but I have not been able to extract it from ObservationString.

Current rationale:

https://github.com/deepmind/open_spiel/blob/0c265fbaa56c418788333fdf3bd373ff12620d14/open_spiel/games/rbc.h#L93-L94

I believe this should be changed to provide the information explicitly in the ObservationString/ObservationTensor. This could also help speed up learning, as this important information wouldn't have to be inferred.
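
To make the need concrete, this is roughly the shape of the callback an APL-style bot builds on (the signature is reproduced from memory of the reconchess API and the bot class is a hypothetical skeleton); the observation would need to carry enough to reconstruct taken_move:

    from typing import Optional
    import chess  # python-chess

    class MyRbcBot:  # hypothetical agent skeleton
        def __init__(self, color: chess.Color):
            self.color = color
            self.board = chess.Board()  # internal belief of the true board

        # roughly reconchess's handle_move_result (approximate signature)
        def handle_move_result(self, requested_move: Optional[chess.Move],
                               taken_move: Optional[chess.Move],
                               captured_opponent_piece: bool,
                               capture_square: Optional[int]):
            # the *taken* move, not the requested one, keeps the internal
            # board consistent with what actually happened in the game
            if taken_move is not None:
                self.board.push(taken_move)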

3) Return your captured piece and location in ObservationString/ObservationTensor

Previously related issue: https://github.com/deepmind/open_spiel/issues/696

APL's RBC will return your captured piece and its location so you can handle any internal logic based on the opponent's move. This allows you to update, say, your internal board representation based on pieces you have lost. From what I can tell, in open_spiel you are told that one of your pieces was captured, but not which one or where.

https://github.com/deepmind/open_spiel/blob/b6f88b6f2fddadd7e0f8e01c78d2becb61c51a33/open_spiel/games/rbc.cc#L220-L221

This looks to be missing functionality. See handle_opponent_move_result for more info.

Current rationale: https://github.com/deepmind/open_spiel/blob/0c265fbaa56c418788333fdf3bd373ff12620d14/open_spiel/games/rbc.h#L75-L76 I believe this should be changed to provide the information explicitly in the ObservationString/ObservationTensor.

This could also help speed up learning, as this important information wouldn't have to be inferred.
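
For comparison, this is roughly what the APL side hands the bot at the start of its turn (again an approximate signature from memory, continuing the hypothetical bot sketch above), and how the capture square would be used:

    # roughly reconchess's handle_opponent_move_result (approximate signature)
    def handle_opponent_move_result(self, captured_my_piece: bool,
                                    capture_square: Optional[int]):
        if captured_my_piece:
            # knowing *where* the capture happened lets the bot drop exactly
            # that piece from its internal board representation
            self.board.remove_piece_at(capture_square)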

4) Return the previous sense action in ObservationString/ObservationTensor

This is similar to (2): return the previous taken_action. The ObservationString contains the currently observed board and does contain the newly sensed information when calling state.observation_string() during the move phase. However, the open_spiel env should also return which squares were requested for sensing, so that the agent can parse the ObservationString to determine the latest sense results. See handle_sense_result for what APL provides the agents.
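
Again for comparison, this is roughly the sense callback APL provides (approximate, from memory): the bot gets the 3x3 window back as explicit (square, piece) pairs, which is exactly the pairing that is hard to reconstruct from the ObservationString alone without knowing the requested sense square:

    from typing import List, Optional, Tuple

    # roughly reconchess's handle_sense_result (approximate signature)
    def handle_sense_result(self,
                            sense_result: List[Tuple[int, Optional[chess.Piece]]]):
        for square, piece in sense_result:
            # overwrite the belief board with the ground truth just observed;
            # set_piece_at(square, None) clears the square
            self.board.set_piece_at(square, piece)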

5) Pawn promotion to queen by default

Certain moves that are illegal still have an effect on the board, and this case is not captured in open_spiel's implementation. In particular, say my pawn is at d7 and the move my algorithm chooses is d7e8, to capture an opponent piece. Based on the legal moves (for both open_spiel and APL), d7e8 does not exist, but d7e8q (with the queen promotion) does. APL's RBC will convert my d7e8 into d7e8q, but open_spiel's RBC will throw an error saying d7e8 does not exist.
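
A small agent-side workaround is possible with python-chess: normalize the move before handing it to open_spiel, adding the queen promotion the way APL's server would (a sketch under that assumption; ideally the environment itself would do this):

    import chess

    def with_default_queen_promotion(board: chess.Board,
                                     move: chess.Move) -> chess.Move:
        """If a pawn move reaches the back rank without an explicit promotion,
        promote to a queen, mimicking APL's behaviour."""
        piece = board.piece_at(move.from_square)
        reaches_back_rank = chess.square_rank(move.to_square) in (0, 7)
        if (piece is not None and piece.piece_type == chess.PAWN
                and reaches_back_rank and move.promotion is None):
            return chess.Move(move.from_square, move.to_square,
                              promotion=chess.QUEEN)
        return move

    # e.g. with_default_queen_promotion(board, chess.Move.from_uci("d7e8"))
    # returns the move d7e8q when a pawn sits on d7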


Note: The qualms below are only relevant because (2)-(4) are not currently implemented properly. This has made me try to use state.action_to_string() to obtain the missing information in UCI form, which made me realize the limitations of the current action_to_string() method. (6) may not be worth the time if (2)-(4) are properly handled and the information is provided in the ObservationString.

6) Cannot call action_to_string() and string_to_action() across the different action spaces (sense action space, move action space).

https://github.com/deepmind/open_spiel/blob/b6f88b6f2fddadd7e0f8e01c78d2becb61c51a33/open_spiel/games/rbc.cc#L444-L454

This one might be a bit harder to explain, but essentially, when creating an agent that uses the UCI action encoding (for example an agent based on Stockfish), moving between the open_spiel action representation and the UCI representation is important. Moreover, it is important to be able to use the action_to_string method on the move action space while the state is currently in the sense phase, and vice versa. Here are the cases where it is particularly needed.

i) During an agent's sense phase, the agent would like to handle_move_result of the last move that it took. This means (see (2)) that the open_spiel env should provide the agent with the taken_move (note: this is different from the requested move, since RBC is an uncertain game). This taken_move should be provided in the ObservationString, but since it is not, I actually use state.history() to obtain the last requested_move by the player (which is in the open_spiel move action space). For me to convert this into something that my agent can utilize (say, to keep track of its internal board representation), I need to use action_to_string(taken_action). This fails, because action_to_string is being called during the sense phase and not the move phase. To handle this, I keep a deep copy of a state which is in the sense phase and a copy of a state which is in the move phase, so that I can call move_state.action_to_string(taken_action) during the sense phase to get the proper string for the corresponding previous move action.

See the following for how I handle this in detail:

    # Inside the agent's sense-phase handler. Assumptions: `chess` is the
    # python-chess library, self.board is a chess.Board mirroring my view of
    # the game, self.color is chess.WHITE/chess.BLACK, and self.prev_state is
    # a saved copy of an OpenSpiel state that is in the *move* phase (see (6i)).

    # handle_move_result: if a move was executed, apply it to our board
    taken_action_history = state.history()
    color_shift = not self.color  # 0 for white, 1 for black
    # history alternates [white sense, white move, black sense, black move, ...];
    # extract this player's (sense, move) actions for each completed round
    player_taken_action_history = [
        player_taken_action
        for move_count in range(len(taken_action_history) // 4)
        for player_taken_action in (
            taken_action_history[2 * (2 * move_count + color_shift)],
            taken_action_history[2 * (2 * move_count + color_shift) + 1])]
    taken_action = player_taken_action_history[-1]
    # convert the action into a move string via the move-phase state copy
    taken_action_string = self.prev_state.action_to_string(taken_action)
    # if the taken action was a pass then don't push it to the board
    if taken_action_string != 'pass':
        taken_move = chess.Move.from_uci(taken_action_string)
        self.board.push(taken_move)

Note: I am assuming that the previous requested action was the taken action, which is not necessarily the case (see (2)).

There are two other places (see (3)/(4)) where I use action_to_string in this way, but they are only needed because of the lack of functionality elsewhere. Similarly, in case (i) I would not need to resort to state.action_to_string if taken_move were provided to me in the ObservationString.
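
For completeness, the deep-copy workaround from (6i) boils down to something like this (in_move_phase is a hypothetical helper; state.clone() is, to the best of my knowledge, the OpenSpiel deep copy):

    # while stepping through the game, stash a copy of a move-phase state...
    if in_move_phase(state):  # hypothetical: detect the move phase
        self.prev_state = state.clone()

    # ...so that later, during the sense phase, move actions can still be
    # converted to strings using the move action space:
    taken_action_string = self.prev_state.action_to_string(taken_action)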


cc'ing the original implementer and an avid user: @michalsustr and @VitamintK

if you got here then have a cookie: :cookie:

lanctot commented 2 years ago

Thanks @acxz. @michalsustr can you take a look and respond to Akash?

michalsustr commented 2 years ago

Hi @acxz, I will try to have a look on the weekend. If you have an implementation that would fix these I could review the PR :)

acxz commented 2 years ago

Sweet! Sadly I'm a bit busy for ~2 weeks, but after that I can put in the effort to help resolve this issue.

For me, 1) & 5) should be pretty quick.

I think 2) and 4) (recording the previous taken action) are the most important, and I would appreciate it if you could tackle those.

michalsustr commented 2 years ago

Thanks! Sorry, this is not high priority for me at the moment due to the NeurIPS deadline coming up, but I will try to have a look after that :)

lanctot commented 1 year ago

For book-keeping: a PR was opened for issue 1) here: https://github.com/deepmind/open_spiel/pull/817 but was never finished.

lanctot commented 10 months ago

@acxz @michalsustr just circling back to this now.

Wondering what we should do here. Are these issues serious enough to warrant removal? I don't like the idea of keeping buggy games in the code base. Will these ever be fixed?

lanctot commented 10 months ago

Actually, I think I have a better idea for games in this category: https://github.com/google-deepmind/open_spiel/pull/1163

acxz commented 10 months ago

Hello @lanctot!

I definitely want to get back to this, but I can't say when I'll return to RBC-related work. I know I had some work done on this, and I think it would be unfortunate for the RBC implementation to be removed. I really think this should be kept and worked on.

Adding the warning is, I think, a wonderful idea; it still lets others use it and build on top of it!