Open CUN-bjy opened 3 years ago
[TEST 1]
[TEST 2]
[TEST 3]
[TEST 4]
[TEST 5]
[TEST 6]
The double clipped Q update has some problems. [TEST 7]
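For reference, a minimal sketch of what the clipped double-Q target is supposed to compute (function and variable names here are my assumptions, not taken from the repo): the TD target uses the minimum of the two target critics' estimates.

```python
import numpy as np

# Hypothetical sketch of TD3's clipped double-Q target:
# y = r + gamma * (1 - done) * min(Q1'(s', a'), Q2'(s', a'))
def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    # take the element-wise minimum of the two target-critic estimates
    q_min = np.minimum(q1_next, q2_next)
    # mask out the bootstrap term on terminal transitions
    return reward + gamma * (1.0 - done) * q_min

# e.g. reward 1.0, non-terminal, next-state estimates 10.0 and 8.0
y = td3_target(np.array([1.0]), np.array([0.0]),
               np.array([10.0]), np.array([8.0]))
# y[0] == 1.0 + 0.99 * 8.0 == 8.92
```

If the implementation accidentally uses the maximum, or bootstraps through terminal states, the critic target is inflated, which would match the symptoms described here.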
[TEST 8]
[TEST 9]
[TEST 10]
before

```python
a = agent.make_action(obs, t)
action = np.argmax(a) if is_discrete else a
# do a step on gym at timestep t
new_obs, reward, done, info = env.step(action)
# store the results in the buffer
agent.memorize(obs, a, reward, done, new_obs)
# should've memorized the action w/ noise!!
```
after

```python
a = agent.make_action(obs, t)
action = np.argmax(a) if is_discrete else a
# do a step on gym at timestep t
new_obs, reward, done, info = env.step(action)
# store the results in the buffer
agent.memorize(obs, action, reward, done, new_obs)
```
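The point of the fix above, as I read it, is that the replay buffer should store the action that `env.step` actually received, not the raw (pre-argmax) policy output. A minimal sketch of that idea, with hypothetical names not taken from the repo:

```python
import numpy as np

# Hypothetical sketch: the stored transition must contain the action
# that was actually executed, so the critic learns Q(s, a) for
# executed actions.
def select_action(raw_output, is_discrete):
    # raw_output is the agent's (possibly noisy) policy output;
    # for discrete envs, the executed action is its argmax index
    return int(np.argmax(raw_output)) if is_discrete else raw_output

raw = np.array([0.1, 0.7, 0.2])           # noisy policy output
action = select_action(raw, is_discrete=True)
# env.step(action); agent.memorize(obs, action, ...)  <- store `action`, not `raw`
```

Storing `raw` instead of `action` makes the buffered transition inconsistent with the reward and next state the environment returned for `action`.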
But, even with this change, it still doesn't work.
[TEST 11]
This also doesn't work.
[TEST 12]
The first TD3 implementation does not work well, so I have to analyze each part of the differences from DDPG.
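One TD3-vs-DDPG difference worth checking separately from the clipped double-Q update is target policy smoothing (the third difference being delayed policy updates). A minimal sketch under assumed parameter names:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sketch of TD3's target policy smoothing: add clipped
# Gaussian noise to the target policy's action before evaluating the
# target critics, then clip to the action bounds.
def smoothed_target_action(mu_target, noise_std=0.2, noise_clip=0.5,
                           act_limit=1.0):
    # clipped noise keeps the perturbed action close to mu_target
    noise = np.clip(rng.normal(0.0, noise_std, size=mu_target.shape),
                    -noise_clip, noise_clip)
    # final clip keeps the action inside the environment's bounds
    return np.clip(mu_target + noise, -act_limit, act_limit)

a_next = smoothed_target_action(np.array([0.9, -0.4]))
```

DDPG evaluates the target critic at the raw target-policy action, so dropping this noise (or forgetting either clip) is a common place for a TD3 port of DDPG code to diverge.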