Although in this example below, I am sending action 0 -- it's true for any action - tested on Breakout and Zaxxon
import sys
from ale_python_interface import ALEInterface
import numpy as np
ale = ALEInterface()
ale.loadROM("Breakout.bin")
legal_actions = ale.getLegalActionSet()
for episode in range(10):
total_reward = 0.0
while not ale.game_over():
print "acting"
reward = ale.act(0);
total_reward += reward
print("Episode " + str(episode) + " ended with score: " + str(total_reward))
ale.reset_game()
In order to avoid the hanging -- I resorted to performing a random action every 20 (a rough maximum value discovered via trial and error) frames. This bug came up when my Q-learner degenerated to choosing a single action.
Although in this example below, I am sending action 0 -- it's true for any action - tested on Breakout and Zaxxon
In order to avoid the hanging -- I resorted to performing a random action every 20 (a rough maximum value discovered via trial and error) frames. This bug came up when my Q-learner degenerated to choosing a single action.