PFCM opened this issue 7 years ago
Hi @PFCM,
have you seen this?
https://github.com/miyosuda/episodic_control
Perhaps you could reuse some of that implementation?
@AjayTalati That's what I did. I used their dictionary code to make a very quick implementation of NEC here: https://github.com/EndingCredits/nn_q_learning_tensorflow
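For context, the dictionary in NEC (the differentiable neural dictionary, or DND) keeps one key/value store per action and answers a query key h with a kernel-weighted average over its k nearest stored keys, using the kernel 1 / (||h - h_i||^2 + delta) described in the Neural Episodic Control paper. Below is a minimal pure-Python sketch of that lookup, not the code from either linked repository. The class name `DND`, the `write`/`lookup` methods, the zero return for an empty dictionary and the FIFO eviction are hypothetical choices for illustration; the sketch also omits gradients, approximate nearest-neighbour search and NEC's rule for updating entries that already exist.

```python
import heapq


class DND:
    """Toy dictionary for one action: kernel-weighted k-nearest-neighbour lookup."""

    def __init__(self, capacity, k=50, delta=1e-3):
        self.capacity = capacity  # maximum number of stored entries
        self.k = k                # neighbours used per lookup
        self.delta = delta        # small constant keeping the kernel finite
        self.keys = []
        self.values = []

    def write(self, key, value):
        # Simplified FIFO eviction (an assumption of this sketch, not NEC's policy).
        if len(self.keys) >= self.capacity:
            self.keys.pop(0)
            self.values.pop(0)
        self.keys.append(list(key))
        self.values.append(value)

    def lookup(self, query):
        if not self.keys:
            return 0.0  # assumption: an empty dictionary returns 0
        # Squared Euclidean distance from the query to every stored key.
        dists = [sum((q - h) ** 2 for q, h in zip(query, key)) for key in self.keys]
        nearest = heapq.nsmallest(self.k, range(len(dists)), key=dists.__getitem__)
        # Inverse-distance kernel, normalised into weights that sum to 1.
        kernels = [1.0 / (dists[i] + self.delta) for i in nearest]
        total = sum(kernels)
        return sum(w / total * self.values[i] for w, i in zip(kernels, nearest))
```

For example, after writing `[0.0, 0.0] -> 1.0` and `[1.0, 1.0] -> 3.0`, a query at `[0.5, 0.5]` is equally far from both keys, so the weights are equal and the lookup returns their mean, 2.0; a query at `[0.0, 0.0]` returns a value very close to 1.0 because the exact match dominates the kernel.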
Hi @EndingCredits,
Ha! Cool :+1: Your implementation is what I'm trying to do :+1:
Have you tried it out yet?
PS: I'm working in PyTorch, mainly adapting this A3C implementation, which I'm using as a baseline. I haven't got it working yet, though.
UPDATE: I just tried out your code:
10:45:53, 0/500000it | avg_r: 0.000, avg_q: 0.000, avr_ep_r: 0.0, max_ep_r: 0.0, num_eps: 0, epsilon: 0.100, ewc: 0.0
10:45:57, 2500/500000it | avg_r: 1.000, avg_q: 1.464, avr_ep_r: 67.3, max_ep_r: 191.0, num_eps: 37, epsilon: 0.100, ewc: 0.0
10:46:03, 5000/500000it | avg_r: 1.000, avg_q: 2.906, avr_ep_r: 27.6, max_ep_r: 109.0, num_eps: 91, epsilon: 0.100, ewc: 0.0
10:46:09, 7500/500000it | avg_r: 1.000, avg_q: 4.530, avr_ep_r: 41.9, max_ep_r: 123.0, num_eps: 59, epsilon: 0.100, ewc: 0.0
10:46:16, 10000/500000it | avg_r: 1.000, avg_q: 5.529, avr_ep_r: 50.5, max_ep_r: 175.0, num_eps: 50, epsilon: 0.100, ewc: 0.0
10:46:22, 12500/500000it | avg_r: 1.000, avg_q: 6.403, avr_ep_r: 57.0, max_ep_r: 118.0, num_eps: 44, epsilon: 0.100, ewc: 0.0
10:46:29, 15000/500000it | avg_r: 1.000, avg_q: 6.588, avr_ep_r: 55.5, max_ep_r: 94.0, num_eps: 44, epsilon: 0.100, ewc: 0.0
10:46:35, 17500/500000it | avg_r: 1.000, avg_q: 5.742, avr_ep_r: 63.2, max_ep_r: 89.0, num_eps: 40, epsilon: 0.100, ewc: 0.0
10:46:42, 20000/500000it | avg_r: 1.000, avg_q: 6.561, avr_ep_r: 74.6, max_ep_r: 92.0, num_eps: 33, epsilon: 0.100, ewc: 0.0
10:46:48, 22500/500000it | avg_r: 1.000, avg_q: 6.869, avr_ep_r: 85.8, max_ep_r: 109.0, num_eps: 29, epsilon: 0.100, ewc: 0.0
10:46:55, 25000/500000it | avg_r: 1.000, avg_q: 6.857, avr_ep_r: 125.2, max_ep_r: 148.0, num_eps: 20, epsilon: 0.100, ewc: 0.0
10:47:01, 27500/500000it | avg_r: 1.000, avg_q: 7.577, avr_ep_r: 79.2, max_ep_r: 173.0, num_eps: 32, epsilon: 0.100, ewc: 0.0
10:47:08, 30000/500000it | avg_r: 1.000, avg_q: 7.934, avr_ep_r: 146.9, max_ep_r: 186.0, num_eps: 17, epsilon: 0.100, ewc: 0.0
10:47:14, 32500/500000it | avg_r: 1.000, avg_q: 8.070, avr_ep_r: 123.5, max_ep_r: 171.0, num_eps: 20, epsilon: 0.100, ewc: 0.0
10:47:21, 35000/500000it | avg_r: 1.000, avg_q: 8.615, avr_ep_r: 107.0, max_ep_r: 160.0, num_eps: 24, epsilon: 0.100, ewc: 0.0
10:47:26, 37500/500000it | avg_r: 1.000, avg_q: 8.743, avr_ep_r: 125.0, max_ep_r: 200.0, num_eps: 20, epsilon: 0.100, ewc: 0.0
10:47:32, 40000/500000it | avg_r: 1.000, avg_q: 8.926, avr_ep_r: 120.3, max_ep_r: 200.0, num_eps: 20, epsilon: 0.100, ewc: 0.0
10:47:38, 42500/500000it | avg_r: 1.000, avg_q: 8.888, avr_ep_r: 135.2, max_ep_r: 200.0, num_eps: 18, epsilon: 0.100, ewc: 0.0
10:47:43, 45000/500000it | avg_r: 1.000, avg_q: 8.778, avr_ep_r: 166.7, max_ep_r: 200.0, num_eps: 15, epsilon: 0.100, ewc: 0.0
10:47:49, 47500/500000it | avg_r: 1.000, avg_q: 8.861, avr_ep_r: 187.9, max_ep_r: 200.0, num_eps: 14, epsilon: 0.100, ewc: 0.0
10:47:55, 50000/500000it | avg_r: 1.000, avg_q: 9.037, avr_ep_r: 185.3, max_ep_r: 200.0, num_eps: 13, epsilon: 0.100, ewc: 0.0
10:48:00, 52500/500000it | avg_r: 1.000, avg_q: 9.270, avr_ep_r: 176.1, max_ep_r: 200.0, num_eps: 14, epsilon: 0.100, ewc: 0.0
10:48:05, 55000/500000it | avg_r: 1.000, avg_q: 9.205, avr_ep_r: 181.9, max_ep_r: 200.0, num_eps: 14, epsilon: 0.100, ewc: 0.0
10:48:12, 57500/500000it | avg_r: 1.000, avg_q: 9.591, avr_ep_r: 188.4, max_ep_r: 200.0, num_eps: 13, epsilon: 0.100, ewc: 0.0
10:48:18, 60000/500000it | avg_r: 1.000, avg_q: 9.349, avr_ep_r: 180.3, max_ep_r: 200.0, num_eps: 14, epsilon: 0.100, ewc: 0.0
10:48:24, 62500/500000it | avg_r: 1.000, avg_q: 9.326, avr_ep_r: 163.4, max_ep_r: 200.0, num_eps: 16, epsilon: 0.100, ewc: 0.0
10:48:31, 65000/500000it | avg_r: 1.000, avg_q: 8.998, avr_ep_r: 113.9, max_ep_r: 200.0, num_eps: 21, epsilon: 0.100, ewc: 0.0
10:48:36, 67500/500000it | avg_r: 1.000, avg_q: 8.917, avr_ep_r: 168.5, max_ep_r: 200.0, num_eps: 15, epsilon: 0.100, ewc: 0.0
10:48:42, 70000/500000it | avg_r: 1.000, avg_q: 8.938, avr_ep_r: 152.7, max_ep_r: 200.0, num_eps: 17, epsilon: 0.100, ewc: 0.0
10:48:48, 72500/500000it | avg_r: 1.000, avg_q: 9.240, avr_ep_r: 166.7, max_ep_r: 200.0, num_eps: 15, epsilon: 0.100, ewc: 0.0
10:48:53, 75000/500000it | avg_r: 1.000, avg_q: 9.313, avr_ep_r: 159.9, max_ep_r: 200.0, num_eps: 15, epsilon: 0.100, ewc: 0.0
10:48:59, 77500/500000it | avg_r: 1.000, avg_q: 9.032, avr_ep_r: 152.5, max_ep_r: 200.0, num_eps: 17, epsilon: 0.100, ewc: 0.0
10:49:06, 80000/500000it | avg_r: 1.000, avg_q: 9.125, avr_ep_r: 138.7, max_ep_r: 200.0, num_eps: 18, epsilon: 0.100, ewc: 0.0
10:49:12, 82500/500000it | avg_r: 1.000, avg_q: 9.395, avr_ep_r: 145.9, max_ep_r: 200.0, num_eps: 17, epsilon: 0.100, ewc: 0.0
10:49:19, 85000/500000it | avg_r: 1.000, avg_q: 9.537, avr_ep_r: 135.2, max_ep_r: 200.0, num_eps: 18, epsilon: 0.100, ewc: 0.0
10:49:25, 87500/500000it | avg_r: 1.000, avg_q: 9.677, avr_ep_r: 191.6, max_ep_r: 200.0, num_eps: 13, epsilon: 0.100, ewc: 0.0
10:49:32, 90000/500000it | avg_r: 1.000, avg_q: 8.723, avr_ep_r: 148.0, max_ep_r: 200.0, num_eps: 17, epsilon: 0.100, ewc: 0.0
10:49:38, 92500/500000it | avg_r: 1.000, avg_q: 8.662, avr_ep_r: 129.3, max_ep_r: 144.0, num_eps: 20, epsilon: 0.100, ewc: 0.0
10:49:45, 95000/500000it | avg_r: 1.000, avg_q: 8.742, avr_ep_r: 110.9, max_ep_r: 133.0, num_eps: 22, epsilon: 0.100, ewc: 0.0
10:49:51, 97500/500000it | avg_r: 1.000, avg_q: 8.765, avr_ep_r: 124.5, max_ep_r: 143.0, num_eps: 20, epsilon: 0.100, ewc: 0.0
Good job :+1:
Make an agent that does model-free episodic control, since that is nice and easy and also uses the dictionary.
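Model-free episodic control, as described in the Model-Free Episodic Control paper, keeps one table per action that maps a state to the highest discounted return seen after taking that action there, and estimates unseen states by averaging the values of the k nearest stored states. The following is a minimal pure-Python sketch under stated assumptions: the class names, the `k`, `gamma` and `epsilon` defaults, the zero estimate for an empty table and the `(state, action, reward)` trajectory format are all hypothetical; states are assumed to be already-embedded numeric tuples, so the paper's random projection of observations is omitted, as is any capacity limit or eviction.

```python
import random


class EpisodicTable:
    """Q^EC buffer for one action: maps a state to the best return seen from it."""

    def __init__(self, k=5):
        self.k = k          # neighbours averaged for unseen states (hypothetical default)
        self.keys = []      # state embeddings, stored as tuples
        self.values = []    # highest discounted return observed for each key

    def estimate(self, state):
        state = tuple(state)
        if not self.keys:
            return 0.0  # assumption: an empty table estimates 0
        if state in self.keys:
            return self.values[self.keys.index(state)]  # exact match
        # Otherwise average the values of the k nearest stored states.
        nearest = sorted(
            range(len(self.keys)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(self.keys[i], state)),
        )[: self.k]
        return sum(self.values[i] for i in nearest) / len(nearest)

    def update(self, state, ret):
        state = tuple(state)
        if state in self.keys:
            i = self.keys.index(state)
            self.values[i] = max(self.values[i], ret)  # keep the best return seen
        else:
            self.keys.append(state)
            self.values.append(ret)


class MFECAgent:
    def __init__(self, num_actions, k=5, gamma=0.99, epsilon=0.1, seed=0):
        self.tables = [EpisodicTable(k) for _ in range(num_actions)]
        self.gamma = gamma
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def act(self, state):
        # Epsilon-greedy over the per-action episodic estimates.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.tables))
        qs = [table.estimate(state) for table in self.tables]
        return max(range(len(qs)), key=qs.__getitem__)

    def end_episode(self, trajectory):
        """trajectory: list of (state, action, reward) tuples in time order."""
        ret = 0.0
        for state, action, reward in reversed(trajectory):
            ret = reward + self.gamma * ret  # discounted return from this step onward
            self.tables[action].update(state, ret)
```

In use, call `act` at every step, collect the `(state, action, reward)` triples, and call `end_episode` once the episode finishes. The tables are only updated at the end of an episode, with Monte Carlo returns rather than bootstrapped targets, which is part of what makes this agent simpler than NEC.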