Grzego / async-rl

Variation of "Asynchronous Methods for Deep Reinforcement Learning" with multiple processes generating experience for the agent (Keras + Theano + OpenAI Gym) [1-step Q-learning, n-step Q-learning, A3C]
MIT License

Scores Too Low #6

Open impulsecorp opened 6 years ago

impulsecorp commented 6 years ago

I don't get any errors, but when I run play.py for Breakout using your sample weights, it gets scores of only a few points. And if I train the model myself (either from scratch or by resuming training on your saved model), it gets those same low scores. I tried all three of your versions: 1-step Q-learning, n-step Q-learning, and A3C.

Grzego commented 6 years ago

I just tested A3C and it seems to work fine. Here are scores from 10 games:

Game #       1; Reward  193;
Game #       2; Reward  319;
Game #       3; Reward  270;
Game #       4; Reward   75;
Game #       5; Reward  229;
Game #       6; Reward  292;
Game #       7; Reward  152;
Game #       8; Reward  295;
Game #       9; Reward  364;
Game #      10; Reward  361;

Can you post your scores and your versions of gym, keras, and theano?
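
For reference, here is a minimal sketch of one way to print the installed versions (it assumes the usual __version__ attributes; adjust if your setup differs):

# Minimal sketch: print the interpreter and library versions.
# Assumes the standard __version__ attributes exist.
import sys

import gym
import keras
import theano

print("Python:", sys.version.split()[0])
print("gym:   ", gym.__version__)
print("keras: ", keras.__version__)
print("theano:", theano.__version__)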

impulsecorp commented 6 years ago

I don't have the exact results, but the rewards for the 10 games were all between 0 and 5, using your sample weights. Here's what I am using:

gym: 0.9.3 (from git)
keras: 2.1.2
theano: 1.0.1
Python: 3.5.2

Grzego commented 6 years ago

This is strange. Can you check what the input image for the network looks like (see the transform_screen function)? I suspect there might be a problem with the channel layout in the convolutions or with the atari_py screen data.
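
One way to do that check, as a minimal sketch: grab a raw frame from gym and save both the raw screen and a preprocessed version to disk. Note that the grayscale-and-downsample step below is only an illustrative stand-in, not the repo's actual transform_screen, and the env id may differ from the one used by the scripts:

# Minimal sketch for inspecting what the network receives.
# NOTE: the preprocessing here (grayscale + 2x downsample) is only a
# stand-in for the repo's transform_screen function.
import gym
import numpy as np
import matplotlib.pyplot as plt

env = gym.make('Breakout-v0')            # env id assumed; use the one from the scripts
frame = env.reset()                      # raw RGB screen, shape (210, 160, 3)

gray = frame.mean(axis=2)                # collapse channels to grayscale
small = gray[::2, ::2]                   # naive 2x downsample

plt.imsave('raw_frame.png', frame.astype(np.uint8))
plt.imsave('preprocessed_frame.png', small, cmap='gray')
print('raw:', frame.shape, 'preprocessed:', small.shape)

If the saved images look wrong (e.g. channels swapped or the frame scrambled), that would point to the channel layout or the atari_py screen data rather than the weights.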