PFCM / neural-episodic-control


MFEC agent #2

Open PFCM opened 7 years ago

PFCM commented 7 years ago

Make an agent that does model-free episodic control (MFEC), because that is nice and easy and uses the dictionary.
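As a rough illustration of what such an agent involves: MFEC keeps, for each action, a table mapping states to the highest discounted return ever observed after taking that action in that state. States that are not in the table are estimated by averaging the values of their k nearest stored neighbours. The sketch below is hypothetical and not taken from this repo. The class and method names are made up, states are assumed to already be small tuples of floats (no random projection or learned embedding), and the tables have no capacity limit.

```python
import math
import random


class MFECAgent:
    """Hypothetical minimal MFEC sketch (not this repo's code).

    Assumes states are already low-dimensional tuples of floats and keeps
    one unbounded table per action instead of a capacity-limited dictionary.
    """

    def __init__(self, num_actions, k=3, epsilon=0.1, gamma=0.99):
        self.num_actions = num_actions
        self.k = k              # neighbours averaged for unseen states
        self.epsilon = epsilon  # exploration rate for epsilon-greedy
        self.gamma = gamma      # discount factor for returns
        # One episodic table per action: state tuple -> best return seen.
        self.tables = [dict() for _ in range(num_actions)]

    def estimate(self, state, action):
        table = self.tables[action]
        if state in table:
            return table[state]  # exact match: use the stored return
        if not table:
            return 0.0  # nothing stored yet for this action
        # Average the stored values of the k nearest states.
        nearest = sorted(table, key=lambda key: math.dist(key, state))[: self.k]
        return sum(table[key] for key in nearest) / len(nearest)

    def act(self, state):
        if random.random() < self.epsilon:
            return random.randrange(self.num_actions)
        values = [self.estimate(state, a) for a in range(self.num_actions)]
        return max(range(self.num_actions), key=values.__getitem__)

    def update_episode(self, trajectory):
        """trajectory: list of (state, action, reward) from one finished episode."""
        ret = 0.0
        # Walk backwards so each step sees its discounted return-to-go.
        for state, action, reward in reversed(trajectory):
            ret = reward + self.gamma * ret
            table = self.tables[action]
            # Keep the maximum return ever observed for (state, action).
            table[state] = max(table.get(state, ret), ret)
```

For example, after `agent = MFECAgent(num_actions=2, k=1, epsilon=0.0)` and one call to `agent.update_episode([((0.0, 0.0), 1, 1.0), ((1.0, 0.0), 0, 1.0)])`, `agent.act((0.0, 0.0))` picks action 1, because its stored return is higher than the nearest-neighbour estimate for action 0. Keeping the maximum rather than an average is what makes MFEC latch onto the best trajectory it has seen so far.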

AjayTalati commented 7 years ago

Hi @PFCM,

have you seen this?

https://github.com/miyosuda/episodic_control

Perhaps you could reuse some of that implementation?

EndingCredits commented 7 years ago

@AjayTalati That is what I did: I used their dictionary code to make a very quick implementation of NEC here: https://github.com/EndingCredits/nn_q_learning_tensorflow
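For context on what the dictionary does in NEC: each action has a differentiable neural dictionary (DND) of (key, value) pairs. A lookup with embedding h finds the p nearest keys and returns a kernel-weighted average of their values, using k(h, h_i) = 1 / (||h - h_i||^2 + δ). A write either appends a new pair or, if the key is already present, moves the stored value toward the new N-step target. The sketch below is a hypothetical pure-Python illustration of only those read and write rules. It is not the code from either linked repo, it is not differentiable, and the parameter defaults are placeholders.

```python
import math


class DND:
    """Hypothetical sketch of one action's NEC dictionary (read/write rules only)."""

    def __init__(self, p=50, delta=1e-3, alpha=0.1):
        self.p = p          # number of nearest neighbours used per lookup
        self.delta = delta  # keeps the kernel finite when a key matches exactly
        self.alpha = alpha  # step size when overwriting an existing key
        self.keys = []
        self.values = []

    def _kernel(self, h, key):
        # Inverse squared-distance kernel: 1 / (||h - h_i||^2 + delta)
        return 1.0 / (math.dist(h, key) ** 2 + self.delta)

    def lookup(self, h):
        if not self.keys:
            return 0.0
        # Indices of the p keys closest to the query embedding.
        idx = sorted(range(len(self.keys)),
                     key=lambda i: math.dist(h, self.keys[i]))[: self.p]
        weights = [self._kernel(h, self.keys[i]) for i in idx]
        total = sum(weights)
        # Q(h) = sum_i w_i * v_i, with w_i the normalised kernel weight.
        return sum(w * self.values[i] for w, i in zip(weights, idx)) / total

    def write(self, h, q_target):
        h = tuple(h)
        for i, key in enumerate(self.keys):
            if key == h:
                # Existing key: move its value toward the target.
                self.values[i] += self.alpha * (q_target - self.values[i])
                return
        self.keys.append(h)
        self.values.append(q_target)
```

With two stored keys at equal distance from the query, the weights are equal and the lookup is just the mean of their values. The small δ stops a single exact match from producing an infinite weight.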

AjayTalati commented 7 years ago

Hi @EndingCredits,

ha! Cool :+1: Your implementation is what I'm trying to do :+1:

Have you tried it out yet?

PS - I'm working in PyTorch, mainly adapting this A3C implementation, which I'm using as a baseline. Haven't got it working yet, though.

UPDATE: I just tried out your code:

10:45:53,       0/500000it | avg_r: 0.000, avg_q: 0.000, avr_ep_r:  0.0, max_ep_r:  0.0, num_eps: 0, epsilon: 0.100, ewc:  0.0
10:45:57,    2500/500000it | avg_r: 1.000, avg_q: 1.464, avr_ep_r: 67.3, max_ep_r: 191.0, num_eps: 37, epsilon: 0.100, ewc:  0.0
10:46:03,    5000/500000it | avg_r: 1.000, avg_q: 2.906, avr_ep_r: 27.6, max_ep_r: 109.0, num_eps: 91, epsilon: 0.100, ewc:  0.0
10:46:09,    7500/500000it | avg_r: 1.000, avg_q: 4.530, avr_ep_r: 41.9, max_ep_r: 123.0, num_eps: 59, epsilon: 0.100, ewc:  0.0
10:46:16,   10000/500000it | avg_r: 1.000, avg_q: 5.529, avr_ep_r: 50.5, max_ep_r: 175.0, num_eps: 50, epsilon: 0.100, ewc:  0.0
10:46:22,   12500/500000it | avg_r: 1.000, avg_q: 6.403, avr_ep_r: 57.0, max_ep_r: 118.0, num_eps: 44, epsilon: 0.100, ewc:  0.0
10:46:29,   15000/500000it | avg_r: 1.000, avg_q: 6.588, avr_ep_r: 55.5, max_ep_r: 94.0, num_eps: 44, epsilon: 0.100, ewc:  0.0
10:46:35,   17500/500000it | avg_r: 1.000, avg_q: 5.742, avr_ep_r: 63.2, max_ep_r: 89.0, num_eps: 40, epsilon: 0.100, ewc:  0.0
10:46:42,   20000/500000it | avg_r: 1.000, avg_q: 6.561, avr_ep_r: 74.6, max_ep_r: 92.0, num_eps: 33, epsilon: 0.100, ewc:  0.0
10:46:48,   22500/500000it | avg_r: 1.000, avg_q: 6.869, avr_ep_r: 85.8, max_ep_r: 109.0, num_eps: 29, epsilon: 0.100, ewc:  0.0
10:46:55,   25000/500000it | avg_r: 1.000, avg_q: 6.857, avr_ep_r: 125.2, max_ep_r: 148.0, num_eps: 20, epsilon: 0.100, ewc:  0.0
10:47:01,   27500/500000it | avg_r: 1.000, avg_q: 7.577, avr_ep_r: 79.2, max_ep_r: 173.0, num_eps: 32, epsilon: 0.100, ewc:  0.0
10:47:08,   30000/500000it | avg_r: 1.000, avg_q: 7.934, avr_ep_r: 146.9, max_ep_r: 186.0, num_eps: 17, epsilon: 0.100, ewc:  0.0
10:47:14,   32500/500000it | avg_r: 1.000, avg_q: 8.070, avr_ep_r: 123.5, max_ep_r: 171.0, num_eps: 20, epsilon: 0.100, ewc:  0.0
10:47:21,   35000/500000it | avg_r: 1.000, avg_q: 8.615, avr_ep_r: 107.0, max_ep_r: 160.0, num_eps: 24, epsilon: 0.100, ewc:  0.0
10:47:26,   37500/500000it | avg_r: 1.000, avg_q: 8.743, avr_ep_r: 125.0, max_ep_r: 200.0, num_eps: 20, epsilon: 0.100, ewc:  0.0
10:47:32,   40000/500000it | avg_r: 1.000, avg_q: 8.926, avr_ep_r: 120.3, max_ep_r: 200.0, num_eps: 20, epsilon: 0.100, ewc:  0.0
10:47:38,   42500/500000it | avg_r: 1.000, avg_q: 8.888, avr_ep_r: 135.2, max_ep_r: 200.0, num_eps: 18, epsilon: 0.100, ewc:  0.0
10:47:43,   45000/500000it | avg_r: 1.000, avg_q: 8.778, avr_ep_r: 166.7, max_ep_r: 200.0, num_eps: 15, epsilon: 0.100, ewc:  0.0
10:47:49,   47500/500000it | avg_r: 1.000, avg_q: 8.861, avr_ep_r: 187.9, max_ep_r: 200.0, num_eps: 14, epsilon: 0.100, ewc:  0.0
10:47:55,   50000/500000it | avg_r: 1.000, avg_q: 9.037, avr_ep_r: 185.3, max_ep_r: 200.0, num_eps: 13, epsilon: 0.100, ewc:  0.0
10:48:00,   52500/500000it | avg_r: 1.000, avg_q: 9.270, avr_ep_r: 176.1, max_ep_r: 200.0, num_eps: 14, epsilon: 0.100, ewc:  0.0
10:48:05,   55000/500000it | avg_r: 1.000, avg_q: 9.205, avr_ep_r: 181.9, max_ep_r: 200.0, num_eps: 14, epsilon: 0.100, ewc:  0.0
10:48:12,   57500/500000it | avg_r: 1.000, avg_q: 9.591, avr_ep_r: 188.4, max_ep_r: 200.0, num_eps: 13, epsilon: 0.100, ewc:  0.0
10:48:18,   60000/500000it | avg_r: 1.000, avg_q: 9.349, avr_ep_r: 180.3, max_ep_r: 200.0, num_eps: 14, epsilon: 0.100, ewc:  0.0
10:48:24,   62500/500000it | avg_r: 1.000, avg_q: 9.326, avr_ep_r: 163.4, max_ep_r: 200.0, num_eps: 16, epsilon: 0.100, ewc:  0.0
10:48:31,   65000/500000it | avg_r: 1.000, avg_q: 8.998, avr_ep_r: 113.9, max_ep_r: 200.0, num_eps: 21, epsilon: 0.100, ewc:  0.0
10:48:36,   67500/500000it | avg_r: 1.000, avg_q: 8.917, avr_ep_r: 168.5, max_ep_r: 200.0, num_eps: 15, epsilon: 0.100, ewc:  0.0
10:48:42,   70000/500000it | avg_r: 1.000, avg_q: 8.938, avr_ep_r: 152.7, max_ep_r: 200.0, num_eps: 17, epsilon: 0.100, ewc:  0.0
10:48:48,   72500/500000it | avg_r: 1.000, avg_q: 9.240, avr_ep_r: 166.7, max_ep_r: 200.0, num_eps: 15, epsilon: 0.100, ewc:  0.0
10:48:53,   75000/500000it | avg_r: 1.000, avg_q: 9.313, avr_ep_r: 159.9, max_ep_r: 200.0, num_eps: 15, epsilon: 0.100, ewc:  0.0
10:48:59,   77500/500000it | avg_r: 1.000, avg_q: 9.032, avr_ep_r: 152.5, max_ep_r: 200.0, num_eps: 17, epsilon: 0.100, ewc:  0.0
10:49:06,   80000/500000it | avg_r: 1.000, avg_q: 9.125, avr_ep_r: 138.7, max_ep_r: 200.0, num_eps: 18, epsilon: 0.100, ewc:  0.0
10:49:12,   82500/500000it | avg_r: 1.000, avg_q: 9.395, avr_ep_r: 145.9, max_ep_r: 200.0, num_eps: 17, epsilon: 0.100, ewc:  0.0
10:49:19,   85000/500000it | avg_r: 1.000, avg_q: 9.537, avr_ep_r: 135.2, max_ep_r: 200.0, num_eps: 18, epsilon: 0.100, ewc:  0.0
10:49:25,   87500/500000it | avg_r: 1.000, avg_q: 9.677, avr_ep_r: 191.6, max_ep_r: 200.0, num_eps: 13, epsilon: 0.100, ewc:  0.0
10:49:32,   90000/500000it | avg_r: 1.000, avg_q: 8.723, avr_ep_r: 148.0, max_ep_r: 200.0, num_eps: 17, epsilon: 0.100, ewc:  0.0
10:49:38,   92500/500000it | avg_r: 1.000, avg_q: 8.662, avr_ep_r: 129.3, max_ep_r: 144.0, num_eps: 20, epsilon: 0.100, ewc:  0.0
10:49:45,   95000/500000it | avg_r: 1.000, avg_q: 8.742, avr_ep_r: 110.9, max_ep_r: 133.0, num_eps: 22, epsilon: 0.100, ewc:  0.0
10:49:51,   97500/500000it | avg_r: 1.000, avg_q: 8.765, avr_ep_r: 124.5, max_ep_r: 143.0, num_eps: 20, epsilon: 0.100, ewc:  0.0
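Every progress line above uses the same `name: value` layout, so a run like this can be turned into numbers for plotting or comparison. The parser below is a hypothetical sketch that assumes every line follows exactly this format. It does not interpret what each field means beyond its name.

```python
import re

# Matches "name: value" pairs such as "avr_ep_r: 67.3" or "num_eps: 37".
FIELD = re.compile(r"(\w+):\s+(-?[\d.]+)")
# Matches the "current/total" iteration counter, e.g. "2500/500000it".
STEP = re.compile(r"(\d+)/(\d+)it")


def parse_line(line):
    """Turn one progress line into a dict of floats plus the iteration count."""
    record = {name: float(value) for name, value in FIELD.findall(line)}
    step = STEP.search(line)
    if step:
        record["step"] = int(step.group(1))
    return record
```

The timestamp at the start of each line is skipped because `FIELD` requires whitespace after the colon, which `10:45:57` does not have.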

Good job :+1: