AjayTalati opened this issue 7 years ago
Hi Aj, Unfortunately this code is only set up for environments where the input is a 1-dimensional vector. It isn't too hard to adapt it to image observations, although getting it to take the last 4 frames as input is a bit of a pain. (Yes, I use an earlier version of TF, but that isn't the issue here, although it may cause problems elsewhere.)
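For anyone reading along: the "last 4 frames" part Will mentions is usually handled with a small wrapper that keeps a rolling buffer of observations. This is a hypothetical sketch for illustration (the `FrameStack` class and its methods are made up here, not code from this repo):

```python
from collections import deque
import numpy as np

class FrameStack:
    """Keep the last k observations and expose them as one stacked array."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)  # old frames fall off automatically

    def reset(self, obs):
        # At episode start, repeat the first frame k times so the
        # stacked shape is valid from the very first step.
        for _ in range(self.k):
            self.frames.append(obs)
        return self.observation()

    def step(self, obs):
        self.frames.append(obs)
        return self.observation()

    def observation(self):
        # Stack along a new last axis -> shape (H, W, k) for grayscale frames,
        # which is what a conv net for Atari typically expects.
        return np.stack(self.frames, axis=-1)
```

You would call `reset` with the first observation of each episode and `step` with every subsequent one, feeding the returned `(H, W, 4)` array to the network in place of the 1-D vector.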
Do the implementations run for you if you run Cartpole-v0? It would be nice to know that this works on other machines.
Also, main.py runs DQN (with some extras), a2c.py runs an advantage actor-critic algorithm (using a replay memory rather than distributed rollouts), and NEC.py runs the NEC agent.
-Will
EDIT: Just remembered that a2c.py should be set up to work with Atari envs. Have a look at that if you want to adapt the others.
Hi Will,
thanks a lot for the help :+1: - I'm just working through your code and the paper now.
Cartpole seems to work well. I haven't checked against my A3C implementations, but from memory I think it looks better:
11:28:21, 450000/500000it | avg_r: 1.000, avg_q: 9.169, avr_ep_r: 113.7, max_ep_r: 127.0, num_eps: 22, epsilon: 0.100, ewc: 0.0
11:28:28, 452500/500000it | avg_r: 1.000, avg_q: 9.258, avr_ep_r: 104.6, max_ep_r: 137.0, num_eps: 24, epsilon: 0.100, ewc: 0.0
11:28:34, 455000/500000it | avg_r: 1.000, avg_q: 9.130, avr_ep_r: 118.7, max_ep_r: 200.0, num_eps: 21, epsilon: 0.100, ewc: 0.0
11:28:40, 457500/500000it | avg_r: 1.000, avg_q: 9.561, avr_ep_r: 110.0, max_ep_r: 200.0, num_eps: 23, epsilon: 0.100, ewc: 0.0
11:28:46, 460000/500000it | avg_r: 1.000, avg_q: 9.550, avr_ep_r: 89.7, max_ep_r: 200.0, num_eps: 28, epsilon: 0.100, ewc: 0.0
11:28:53, 462500/500000it | avg_r: 1.000, avg_q: 9.625, avr_ep_r: 133.0, max_ep_r: 200.0, num_eps: 19, epsilon: 0.100, ewc: 0.0
11:28:59, 465000/500000it | avg_r: 1.000, avg_q: 9.554, avr_ep_r: 113.6, max_ep_r: 149.0, num_eps: 22, epsilon: 0.100, ewc: 0.0
11:29:05, 467500/500000it | avg_r: 1.000, avg_q: 9.576, avr_ep_r: 130.9, max_ep_r: 200.0, num_eps: 19, epsilon: 0.100, ewc: 0.0
11:29:12, 470000/500000it | avg_r: 1.000, avg_q: 9.381, avr_ep_r: 126.8, max_ep_r: 169.0, num_eps: 19, epsilon: 0.100, ewc: 0.0
11:29:18, 472500/500000it | avg_r: 1.000, avg_q: 9.605, avr_ep_r: 137.9, max_ep_r: 200.0, num_eps: 18, epsilon: 0.100, ewc: 0.0
11:29:24, 475000/500000it | avg_r: 1.000, avg_q: 9.462, avr_ep_r: 136.7, max_ep_r: 200.0, num_eps: 19, epsilon: 0.100, ewc: 0.0
11:29:31, 477500/500000it | avg_r: 1.000, avg_q: 9.304, avr_ep_r: 118.1, max_ep_r: 142.0, num_eps: 21, epsilon: 0.100, ewc: 0.0
11:29:37, 480000/500000it | avg_r: 1.000, avg_q: 9.319, avr_ep_r: 98.4, max_ep_r: 121.0, num_eps: 26, epsilon: 0.100, ewc: 0.0
11:29:43, 482500/500000it | avg_r: 1.000, avg_q: 9.044, avr_ep_r: 119.0, max_ep_r: 200.0, num_eps: 21, epsilon: 0.100, ewc: 0.0
11:29:50, 485000/500000it | avg_r: 1.000, avg_q: 9.231, avr_ep_r: 109.2, max_ep_r: 158.0, num_eps: 22, epsilon: 0.100, ewc: 0.0
11:29:56, 487500/500000it | avg_r: 1.000, avg_q: 9.153, avr_ep_r: 113.5, max_ep_r: 188.0, num_eps: 22, epsilon: 0.100, ewc: 0.0
11:30:02, 490000/500000it | avg_r: 1.000, avg_q: 9.372, avr_ep_r: 124.8, max_ep_r: 200.0, num_eps: 20, epsilon: 0.100, ewc: 0.0
11:30:08, 492500/500000it | avg_r: 1.000, avg_q: 9.031, avr_ep_r: 144.8, max_ep_r: 200.0, num_eps: 17, epsilon: 0.100, ewc: 0.0
11:30:15, 495000/500000it | avg_r: 1.000, avg_q: 9.136, avr_ep_r: 98.4, max_ep_r: 200.0, num_eps: 26, epsilon: 0.100, ewc: 0.0
11:30:21, 497500/500000it | avg_r: 1.000, avg_q: 9.100, avr_ep_r: 149.2, max_ep_r: 200.0, num_eps: 17, epsilon: 0.100, ewc: 0.0
100%|████████████████████████| 500000/500000 [20:46<00:00, 401.28it/s]
I'll try to get it working for the Atari envs too :) If you're interested, there's a fairly clean implementation in PyTorch
Looks like a fun project :+1:
All the best - Aj
PS - I've only read the paper quickly, but it seems there's no need for the actor-critic stuff in a2c?
Hi Will,
I was wondering whether you got this working for 2D pixel inputs, i.e. Atari.
If so, did you manage to get anywhere close to DeepMind's published results? (I guess they do a lot of model searching/hyper-parameter tuning.)
All the best, Aj
Hi Aj, I did get it working for an Atari setting (ALE), but I haven't managed to get any good results yet.
The code is a bit of a mess, so I'll probably tidy it up before sharing. -Will
Update: You can find the repo here https://github.com/EndingCredits/Neural-Episodic-Control . Only extra thing you'll need to install is ALE I think.
Great, thanks very much for your work on it :)
I guess if it doesn't perform SOTA on Atari (or you can't tune it as well as DeepMind), you'll find some environments where it is strong - you know the Wolpert and Macready NFL theorem:
We have dubbed the associated results NFL theorems because they demonstrate that if an algorithm performs well on a certain class of problems then it necessarily pays for that with degraded performance on the set of all remaining problems.[1]
Hi @EndingCredits,

this is really cool that you got the NEC working :+1: Have you tried to run your code on the Atari environments in OpenAI gym?

I tried to train on Pong, but I got this error. I guess it might be related to TF v1.0 - does this repo use an earlier version? Thanks a lot for your help,

Aj