Changed it so `mean_score > best_score` no longer triggers the 100-trial test by itself: the best score still gets updated then, but the 100-trial evaluation only runs this way:
```python
if (best_score is None) or (mean_score > best_score):
    best_score = mean_score
    print(f'New best score {best_score:.3f} in generation {gen}')
    best_weights = self.agent.get_weight_matrix()
    if mean_score > 0.8*self.solved_avg_reward:
        # It achieved a new best score close to the solved threshold, so
        # test the average score over self.N_eval_trials episodes.
        # If that mean is >= self.solved_avg_reward, it's considered solved.
        eval_trials = []
        for _ in range(self.N_eval_trials):
            eval_trials.append(self.run_episode())
        eval_mean = np.mean(eval_trials)
```
So now it doesn't run the full evaluation on scores that are new bests but still low, which speeds up execution maybe 10x or more.
Also added a timer decorator to the benchmark function and fixed a bug with `np.random.choice`.
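The timer decorator itself isn't shown here; a minimal sketch of the kind of thing it could be (the `timer` name and the print format are illustrative assumptions, not the actual code from this change):

```python
import time
from functools import wraps

def timer(func):
    # Illustrative sketch, not the actual decorator from this change.
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f'{func.__name__} took {time.perf_counter() - start:.3f} s')
        return result
    return wrapper

@timer
def benchmark():
    ...  # benchmark body; runtime is printed on every call
```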
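The `np.random.choice` bug isn't described either; for illustration only, a common pitfall is that it accepts only a 1-D array (or an int), so sampling rows of a 2-D array, e.g. a population of weight vectors, has to go through indices:

```python
import numpy as np

population = np.random.randn(50, 8)  # hypothetical 2-D array of candidates

# np.random.choice(population, size=5) raises "a must be 1-dimensional".
# Choose row indices instead, then index into the array:
idx = np.random.choice(population.shape[0], size=5, replace=False)
sample = population[idx]
```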