LUKELIEM opened this issue 6 years ago
That's expected behavior. CoinGame takes a while to run. The code logs stats every 20 updates.
Does the code make use of the GPU? Since the policy network is an RNN, I suppose it would not help much. How long does it typically take to run?
Intel® Core™ i7-7700K CPU @ 4.20GHz × 8
There also seems to be a discrepancy between the reward structure of the Coin Game in your code and the one described in the paper:
Am I reading the code correctly, or am I missing something?
```python
# Compute rewards
reward_red, reward_blue = [], []
for i in range(self.batch_size):
    generate = False
    if self.red_coin[i]:
        # If the coin is red:
        if self._same_pos(self.red_pos[i], self.coin_pos[i]):
            # If the red agent grabs the coin (regardless of what the blue agent does):
            # red gets +1, blue gets 0
            generate = True
            reward_red.append(1)
            reward_blue.append(0)
        elif self._same_pos(self.blue_pos[i], self.coin_pos[i]):
            # If the blue agent grabs the coin, but the red agent does not:
            # blue gets +1, red gets -2
            generate = True
            reward_red.append(-2)
            reward_blue.append(1)
        else:
            # In all other cases, both red and blue get 0
            reward_red.append(0)
            reward_blue.append(0)
    else:
        # If the coin is blue:
        if self._same_pos(self.red_pos[i], self.coin_pos[i]):
            # If the red agent grabs the coin (regardless of what the blue agent does):
            # red gets +1, blue gets -2
            generate = True
            reward_red.append(1)
            reward_blue.append(-2)
        elif self._same_pos(self.blue_pos[i], self.coin_pos[i]):
            # If the blue agent grabs the coin, but the red agent does not:
            # blue gets +1, red gets 0
            generate = True
            reward_red.append(0)
            reward_blue.append(1)
    else:
            # In all other cases, both red and blue get 0
            reward_red.append(0)
            reward_blue.append(0)
    if generate:
        # Regenerate a coin if an agent has grabbed the coin
        self._generate_coin(i)
```
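Concretely, the `if`/`elif` above gives the red agent priority: if both agents land on the coin in the same step, only the red branch fires, so blue gets no reward and red is never penalized. For comparison, here is a rough sketch of the symmetric rule I would expect from the paper, where each agent is checked independently (this is just my reading, not necessarily how the repo handles it; the helper names `_same_pos`, `_generate_coin`, `red_coin`, etc. are borrowed from the snippet above):

```python
# Sketch of a symmetric reward rule (my reading of the paper, not the repo's code):
# each agent is checked independently, so a simultaneous pickup rewards both,
# and the coin's owner is penalized whenever the other agent takes its coin.
reward_red, reward_blue = [], []
for i in range(self.batch_size):
    generate = False
    red_grabs = self._same_pos(self.red_pos[i], self.coin_pos[i])
    blue_grabs = self._same_pos(self.blue_pos[i], self.coin_pos[i])
    r_red, r_blue = 0, 0
    if red_grabs:
        generate = True
        r_red += 1
        if not self.red_coin[i]:
            r_blue -= 2  # red took the blue coin
    if blue_grabs:
        generate = True
        r_blue += 1
        if self.red_coin[i]:
            r_red -= 2   # blue took the red coin
    reward_red.append(r_red)
    reward_blue.append(r_blue)
    if generate:
        # Regenerate the coin once it has been picked up
        self._generate_coin(i)
```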
@LUKELIEM, our original experiments took a few days (someone independently reproduced our results using this codebase during the summer).
Re: your comment about potential bias in rewards, I believe #5 must have fixed it.
Your coin_game.py on GitHub is still the same code, with the same issue.
I see. I believe the fixed version of the coin game is in lola_dice/envs/coin_game.py. We should've reconciled the environments in lola_dice/envs and lola/envs from the beginning, but never got around to it. A contribution would be very much welcome!
Thanks, you are right. It has been fixed in lola_dice/envs.
Can you suggest a sample command line to run Coin Game?
I tried running just:

```
python scripts/run_lola.py --exp_name=CoinGame --no-exact
```

It seems to be updating parameters and using up all the CPUs, but gives no indication of its progress:
```
Logging to logs/CoinGame/seed-0
values (600000, 240)
main0/input_proc/Conv/weights:0 (3, 3, 3, 20)
main0/input_proc/Conv/BatchNorm/beta:0 (20,)
main0/input_proc/Conv_1/weights:0 (3, 3, 20, 20)
main0/input_proc/Conv_1/BatchNorm/beta:0 (20,)
main0/input_proc/fully_connected/weights:0 (240, 1)
main0/input_proc/fully_connected/biases:0 (1,)
main0/rnn/wx:0 (240, 128)
main0/rnn/wh:0 (32, 128)
main0/rnn/b:0 (128,)
main0/fully_connected/weights:0 (32, 4)
main0/fully_connected/biases:0 (4,)
values (4000, 240)
main0/input_proc/Conv/weights:0 (3, 3, 3, 20)
main0/input_proc/Conv/BatchNorm/beta:0 (20,)
main0/input_proc/Conv_1/weights:0 (3, 3, 20, 20)
main0/input_proc/Conv_1/BatchNorm/beta:0 (20,)
main0/input_proc/fully_connected/weights:0 (240, 1)
main0/input_proc/fully_connected/biases:0 (1,)
main0/rnn/wx:0 (240, 128)
main0/rnn/wh:0 (32, 128)
main0/rnn/b:0 (128,)
main0/fully_connected/weights:0 (32, 4)
main0/fully_connected/biases:0 (4,)
values (600000, 240)
main1/input_proc/Conv/weights:0 (3, 3, 3, 20)
main1/input_proc/Conv/BatchNorm/beta:0 (20,)
main1/input_proc/Conv_1/weights:0 (3, 3, 20, 20)
main1/input_proc/Conv_1/BatchNorm/beta:0 (20,)
main1/input_proc/fully_connected/weights:0 (240, 1)
main1/input_proc/fully_connected/biases:0 (1,)
main1/rnn/wx:0 (240, 128)
main1/rnn/wh:0 (32, 128)
main1/rnn/b:0 (128,)
main1/fully_connected/weights:0 (32, 4)
main1/fully_connected/biases:0 (4,)
values (4000, 240)
main1/input_proc/Conv/weights:0 (3, 3, 3, 20)
main1/input_proc/Conv/BatchNorm/beta:0 (20,)
main1/input_proc/Conv_1/weights:0 (3, 3, 20, 20)
main1/input_proc/Conv_1/BatchNorm/beta:0 (20,)
main1/input_proc/fully_connected/weights:0 (240, 1)
main1/input_proc/fully_connected/biases:0 (1,)
main1/rnn/wx:0 (240, 128)
main1/rnn/wh:0 (32, 128)
main1/rnn/b:0 (128,)
main1/fully_connected/weights:0 (32, 4)
main1/fully_connected/biases:0 (4,)
2018-11-04 16:36:10.603357: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
update params
update params
update params
update params
^C
Aborted!
```