alshedivat / lola

Code release for Learning with Opponent-Learning Awareness and variations.
MIT License

Coin Game #7

Open LUKELIEM opened 6 years ago

LUKELIEM commented 6 years ago

Can you suggest a sample command line to run Coin Game?

I tried running just:

python scripts/run_lola.py --exp_name=CoinGame --no-exact

It seems to be updating parameters and saturating all CPUs, with no indication of progress:

    Logging to logs/CoinGame/seed-0
    values (600000, 240)
    main0/input_proc/Conv/weights:0 (3, 3, 3, 20)
    main0/input_proc/Conv/BatchNorm/beta:0 (20,)
    main0/input_proc/Conv_1/weights:0 (3, 3, 20, 20)
    main0/input_proc/Conv_1/BatchNorm/beta:0 (20,)
    main0/input_proc/fully_connected/weights:0 (240, 1)
    main0/input_proc/fully_connected/biases:0 (1,)
    main0/rnn/wx:0 (240, 128)
    main0/rnn/wh:0 (32, 128)
    main0/rnn/b:0 (128,)
    main0/fully_connected/weights:0 (32, 4)
    main0/fully_connected/biases:0 (4,)
    values (4000, 240)
    [same main0 variable list repeated]
    values (600000, 240)
    [same variable list for main1]
    values (4000, 240)
    [same variable list for main1]
    2018-11-04 16:36:10.603357: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    update params
    update params
    update params
    update params
    ^C
    Aborted!

alshedivat commented 6 years ago

That's expected behavior. CoinGame takes a while to run. The code logs stats every 20 updates.
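For long runs like this, one generic way to keep a record of the periodic stats (not specific to this codebase; the log filename below is arbitrary) is to tee the console output to a file and grep it later:

```shell
# Run the experiment and keep a copy of all console output in a log file.
python scripts/run_lola.py --exp_name=CoinGame --no-exact 2>&1 | tee coin_game_run.log

# Later, count how many "update params" lines have been printed so far:
grep -c "update params" coin_game_run.log
```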

LUKELIEM commented 6 years ago

Does the code make use of the GPU? Since the policy network is an RNN, I suppose it would not help much. How long does it typically run?

Intel® Core™ i7-7700K CPU @ 4.20GHz × 8

LUKELIEM commented 6 years ago

There also seems to be a discrepancy between the reward structure of the Coin Game in your code and the one described in the paper:

Am I reading the code correctly, or am I missing something?

    # Compute rewards
    reward_red, reward_blue = [], []
    for i in range(self.batch_size):
        generate = False
        if self.red_coin[i]:
            # If the coin is red,
            if self._same_pos(self.red_pos[i], self.coin_pos[i]):
                # If red agent grabs the coin (regardless what blue agent does):
                #    red gets +1, blue gets 0
                generate = True
                reward_red.append(1)
                reward_blue.append(0)
            elif self._same_pos(self.blue_pos[i], self.coin_pos[i]):
                # If blue agent grabs the coin, but red agent does not:
                #    blue gets +1, red gets -2
                generate = True
                reward_red.append(-2)
                reward_blue.append(1)
            else:
                # In all other cases
                #    both blue and red get 0
                reward_red.append(0)
                reward_blue.append(0)

        else:
            # If the coin is blue,
            if self._same_pos(self.red_pos[i], self.coin_pos[i]):
                # If red agent grabs the coin (regardless what blue agent does):
                #    red gets +1, blue gets -2
                generate = True
                reward_red.append(1)
                reward_blue.append(-2)
            elif self._same_pos(self.blue_pos[i], self.coin_pos[i]):
                # If blue agent grabs the coin, but red agent does not:
                #    blue gets +1, red gets 0
                generate = True
                reward_red.append(0)
                reward_blue.append(1)
            else:
                # In all other cases
                #    both blue and red get 0
                reward_red.append(0)
                reward_blue.append(0)

        if generate:
            # Regenerate a coin if an agent has grabbed the coin
            self._generate_coin(i)
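For comparison, here is my own sketch of an order-independent version of that rule (an illustration, not the repo's actual fix): checking each agent's position against the coin independently means a simultaneous pickup rewards both agents, instead of silently favoring whichever agent's branch is checked first.

```python
def compute_rewards_unbiased(red_pos, blue_pos, coin_pos, red_coin):
    """Order-independent reward rule for one game in the batch (sketch).

    Each agent is checked against the coin independently, so if both
    agents land on the coin in the same step, both pickups count. The
    per-pickup payoffs mirror the paper: +1 for grabbing any coin, and
    -2 to the coin's owner when the other agent grabs it.
    """
    reward_red, reward_blue = 0, 0
    grabbed = False

    if red_pos == coin_pos:    # red picks up the coin
        grabbed = True
        reward_red += 1
        if not red_coin:       # ...and it was blue's coin
            reward_blue -= 2

    if blue_pos == coin_pos:   # blue picks up the coin (possibly too)
        grabbed = True
        reward_blue += 1
        if red_coin:           # ...and it was red's coin
            reward_red -= 2

    return reward_red, reward_blue, grabbed
```

With this rule, both agents reaching a red coin in the same step yields (-1, +1) for (red, blue), whereas the if/elif version above gives (+1, 0) because the blue branch is never reached.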
alshedivat commented 6 years ago

@LUKELIEM, our original experiments took a few days (someone independently reproduced our results using this codebase during the summer).

Re: your comment about potential bias in rewards, I believe #5 must have fixed it.

LUKELIEM commented 6 years ago

The coin_game.py currently on GitHub is still the same code with the same issue.


alshedivat commented 6 years ago

I see. I believe the fixed version of coin game is in lola_dice/envs/coin_game.py.

We should've reconciled environments in lola_dice/envs and lola/envs from the beginning, but never got around to it. A contribution would be very much welcome!

LUKELIEM commented 6 years ago

Thanks, you are right. It has been fixed in lola_dice/envs.