CUN-bjy / gym-td3-keras

Keras implementation of TD3 (Twin Delayed DDPG) with a PER (Prioritized Experience Replay) option on the OpenAI Gym framework
GNU General Public License v3.0

td3_implementation analysis #10

Open CUN-bjy opened 3 years ago

CUN-bjy commented 3 years ago

The first TD3 implementation does not work well..

So I have to analyze each part where it differs from DDPG.
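
For reference, here is a rough sketch of the three places where TD3 differs from DDPG (Fujimoto et al., 2018), just as a checklist for the analysis. All names and hyper-parameters below are hypothetical, not the exact code in this repo.

    import tensorflow as tf

    # hypothetical dims / hyper-parameters (not from this repo)
    act_dim, act_limit = 1, 1.0
    gamma, policy_noise, noise_clip, policy_delay = 0.99, 0.2, 0.5, 2

    def mlp(out_dim, out_act=None):
        return tf.keras.Sequential([tf.keras.layers.Dense(64, activation="relu"),
                                    tf.keras.layers.Dense(out_dim, activation=out_act)])

    actor_target = mlp(act_dim, "tanh")
    critic1_target, critic2_target = mlp(1), mlp(1)      # (1) twin critics

    def td3_target(next_obs, reward, done):
        next_a = actor_target(next_obs) * act_limit
        # (2) target policy smoothing: clipped Gaussian noise on the target action
        eps = tf.clip_by_value(tf.random.normal(tf.shape(next_a)) * policy_noise,
                               -noise_clip, noise_clip)
        next_a = tf.clip_by_value(next_a + eps, -act_limit, act_limit)
        # (1) clipped double-Q: take the minimum of the two target critics
        q1 = critic1_target(tf.concat([next_obs, next_a], axis=-1))
        q2 = critic2_target(tf.concat([next_obs, next_a], axis=-1))
        return reward + gamma * (1.0 - done) * tf.minimum(q1, q2)

    # (3) delayed updates: actor and target networks are updated only once every
    # `policy_delay` critic updates (e.g. `if step % policy_delay == 0: ...`)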

CUN-bjy commented 3 years ago

[TEST 1]

[TEST 2]

[TEST 3]

CUN-bjy commented 3 years ago

[TEST 4]

[TEST 5]

(screenshots attached: 2021-01-28 19-19-59, 2021-01-28 19-19-07)

[TEST 6]

:eyes: I think my implementation of the clipped double-Q update has some problems.
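
For comparison, a minimal sketch of how that critic update usually looks, continuing the hypothetical names from the sketch in the earlier comment (`mlp`, `td3_target`). The key point is that both online critics regress to the same shared target `y`, which is built from the target critics only.

    import tensorflow as tf

    critic1, critic2 = mlp(1), mlp(1)                 # online critics
    critic_opt = tf.keras.optimizers.Adam(1e-3)

    def critic_update(obs, act, reward, done, next_obs):
        # shared target y = r + gamma * (1 - d) * min(Q1', Q2'), no gradient through it
        y = tf.stop_gradient(td3_target(next_obs, reward, done))
        sa = tf.concat([obs, act], axis=-1)
        with tf.GradientTape() as tape:
            loss = (tf.reduce_mean(tf.square(critic1(sa) - y)) +
                    tf.reduce_mean(tf.square(critic2(sa) - y)))
        variables = critic1.trainable_variables + critic2.trainable_variables
        critic_opt.apply_gradients(zip(tape.gradient(loss, variables), variables))

    # things worth double-checking if this part misbehaves:
    #  - y must come from the *target* critics, not the online ones
    #  - both critics share the same min-based y (not one target per critic)
    #  - gradients must not flow into the target networks (stop_gradient above)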

CUN-bjy commented 3 years ago

[TEST 7]

[TEST 8]

CUN-bjy commented 3 years ago

[TEST 9]

CUN-bjy commented 3 years ago

[TEST 10]

before

        a = agent.make_action(obs,t)
        action = np.argmax(a) if is_discrete else a

        # do step on gym at t-step
        new_obs, reward, done, info = env.step(action) 

        # store the results to buffer   
        agent.memorize(obs, a, reward, done, new_obs)
        # should've memorized the action w/ noise!!

after

        a = agent.make_action(obs,t)
        action = np.argmax(a) if is_discrete else a

        # do step on gym at t-step
        new_obs, reward, done, info = env.step(action) 

        # store the results to buffer   
        agent.memorize(obs, action, reward, done, new_obs)

but, consequently, it doesn't work.. (note that for continuous actions `a` and `action` are the same value anyway, so this change only matters when `is_discrete` is true)
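
For what it's worth, a minimal sketch of the usual TD3 convention for this part, with hypothetical names (not this repo's `make_action`): Gaussian exploration noise is added inside the action selection, the result is clipped to the action bounds, and that exact noisy action is both stepped in the env and stored in the buffer.

    import numpy as np

    def select_action(actor, obs, act_limit=1.0, expl_noise=0.1):
        a = actor(obs[None, :]).numpy()[0]        # deterministic policy output
        a += np.random.normal(0.0, expl_noise, size=a.shape)   # exploration noise
        return np.clip(a, -act_limit, act_limit)  # keep inside the action bounds

    # usage in the rollout loop (same names as the snippet above):
    #   action = select_action(actor, obs)
    #   new_obs, reward, done, info = env.step(action)
    #   agent.memorize(obs, action, reward, done, new_obs)   # store what was executed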

CUN-bjy commented 3 years ago

[TEST 11]

Also doesn't work.

CUN-bjy commented 3 years ago

[TEST 12]