google-deepmind / open_spiel

OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.
Apache License 2.0

Enhance Bridge State #1117

Closed: ZiggerZZ closed this issue 1 year ago

ZiggerZZ commented 1 year ago

Hello,

I would like to propose an enhancement to the bridge state features to include more tricks from the past play. With 13 tricks, every state.observation_tensor() would represent the full history of the card-playing phase.

Here's my proposal: we introduce two parameters, num_tricks, defaulting to 2 (as in the current implementation), and absolute_order (default true), which we could use like this: GAME = pyspiel.load_game("bridge(use_double_dummy_result=false,num_tricks=13,absolute_order=false)").

absolute_order=true would indicate that the tricks in the observation tensor are stored in absolute order: 1st trick, 2nd trick, ..., current trick, ..., zeros (for future tricks). absolute_order=false would indicate that the tricks are stored in relative order: current trick, previous trick, ..., zeros (for future tricks). I find the relative representation better suited to RL algorithms.
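To make the two layouts concrete, here is a minimal sketch (purely illustrative, not part of the proposal; the TrickOffset helper is hypothetical and the constants mirror kNumPlayers and kNumCards from bridge.cc) of where trick j's block would start in the play-history part of the tensor:

    // Sketch only: starting offset of trick j's block within the play-history
    // section, given `current_trick` completed tricks so far.
    constexpr int kNumPlayers = 4;   // as in bridge.cc
    constexpr int kNumCards = 52;    // as in bridge.cc

    int TrickOffset(int j, int current_trick, bool absolute_order) {
      const int kBlockSize = kNumPlayers * kNumCards;  // 208 entries per trick
      if (absolute_order) {
        // 1st trick, 2nd trick, ..., current trick, then zero padding.
        return j * kBlockSize;
      }
      // Relative order: current trick first, then completed tricks,
      // most recent first.
      return (current_trick - j) * kBlockSize;
    }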

Here's a working implementation of num_tricks=13, absolute_order=true (replace lines 340-360 in bridge.cc):

    // Indexing into history for recent tricks.
    int current_trick = num_cards_played_ / kNumPlayers;
    int this_trick_cards_played = num_cards_played_ % kNumPlayers;
    int this_trick_start = history_.size() - this_trick_cards_played;

    // All previous tricks
    for (int j = 0; j < current_trick; ++j) {
      int leader = tricks_[j].Leader();
      for (int i = 0; i < kNumPlayers; ++i) {
        int card = history_[this_trick_start - kNumPlayers * (current_trick - j) + i].action;
        int relative_player = (i + leader + kNumPlayers - player) % kNumPlayers;
        ptr[relative_player * kNumCards + card] = 1;
      }
      ptr += kNumPlayers * kNumCards;
    }

    // Current trick
    if (phase_ != Phase::kGameOver) {
      int leader = tricks_[current_trick].Leader();
      for (int i = 0; i < this_trick_cards_played; ++i) {
        int card = history_[this_trick_start + i].action;
        int relative_player = (i + leader + kNumPlayers - player) % kNumPlayers;
        ptr[relative_player * kNumCards + card] = 1;
      }
    }
    // Advance the pointer past the current and future trick slots so the
    // tensor has a fixed size.
    ptr += kNumPlayers * kNumCards * (kNumTricks - current_trick);
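For scale: assuming the kNumPlayers = 4 and kNumCards = 52 constants from bridge.cc, each per-trick block takes 4 * 52 = 208 tensor entries, so the play-history section would grow from 2 * 208 = 416 entries with the current default to 13 * 208 = 2704 entries with num_tricks=13.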

Here's a working implementation of num_tricks=13, absolute_order=false (replace lines 340-360 in bridge.cc):

    // Current trick; once the game is over there is no trick in progress,
    // so no slot is reserved for it.
    if (phase_ != Phase::kGameOver) {
      int leader = tricks_[current_trick].Leader();
      for (int i = 0; i < this_trick_cards_played; ++i) {
        int card = history_[this_trick_start + i].action;
        int relative_player = (i + leader + kNumPlayers - player) % kNumPlayers;
        ptr[relative_player * kNumCards + card] = 1;
      }
      ptr += kNumPlayers * kNumCards;
    }

    // Completed tricks, most recent first.
    for (int j = current_trick - 1; j >= 0; --j) {
      int leader = tricks_[j].Leader();
      for (int i = 0; i < kNumPlayers; ++i) {
        int card = history_[this_trick_start - kNumPlayers * (current_trick - j) + i].action;
        int relative_player = (i + leader + kNumPlayers - player) % kNumPlayers;
        ptr[relative_player * kNumCards + card] = 1;
      }
      ptr += kNumPlayers * kNumCards;
    }

    // Advance the pointer past the slots reserved for future tricks so the
    // tensor has a fixed size.
    ptr += kNumPlayers * kNumCards *
           (kNumTricks - current_trick -
            (phase_ != Phase::kGameOver ? 1 : 0));
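For completeness, a rough sketch of how the new parameters could be exercised from C++ once they are registered in bridge.cc's parameter specification (they do not exist yet, so this is only illustrative):

    // Sketch only: loading the game with the proposed parameters and checking
    // the resulting observation tensor size.
    #include <iostream>
    #include <memory>

    #include "open_spiel/spiel.h"

    int main() {
      std::shared_ptr<const open_spiel::Game> game = open_spiel::LoadGame(
          "bridge(use_double_dummy_result=false,num_tricks=13,"
          "absolute_order=false)");
      // With all 13 tricks stored, the tensor is larger than with the
      // current default of 2 tricks.
      std::cout << "Observation tensor size: " << game->ObservationTensorSize()
                << std::endl;
      return 0;
    }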

If you agree with this proposition, I can generalize the above code for any num_tricks=2..13 and open a PR.

lanctot commented 1 year ago

@elkhrt wdyt?

elkhrt commented 1 year ago

This seems reasonable to me. I wouldn't bother with absolute_order - just implement whichever works better.

ZiggerZZ commented 1 year ago

@elkhrt I would rather implement the relative order (absolute_order=false), but it would change the current state representation (and potentially the tests). Is that okay with you?

ZiggerZZ commented 1 year ago

I opened PR #1118 and updated the observation tensors in the tests.

lanctot commented 1 year ago

Fixed by #1118, which has now been merged. Closing as resolved, but feel free to re-open if you would like to keep discussing.