Huixxi / TensorFlow2.0-for-Deep-Reinforcement-Learning

TensorFlow 2.0 for Deep Reinforcement Learning. 🐙

Agent doesn't learn anything #2

Closed nancyhwr closed 4 years ago

nancyhwr commented 4 years ago

Thank you so much for sharing this code, truly helpful. However, the agent couldn't learn anything when I trained it on "Breakout-ram-v0" and "Pong-ram-v0". I tried different settings, such as:

buffer_size=100000, learning_rate=.0015, epsilon=.99, epsilon_decay=0.9999,
min_epsilon=.1, gamma=.95, batch_size=64, target_update_iter=400, 
train_nums=10000, start_learning=200

The agent network is:

# Inside the tf.keras.Model subclass's __init__:
self.input_layer = tf.keras.layers.InputLayer(input_shape=(num_states,))
self.fc1 = tf.keras.layers.Dense(hidden_units, activation='relu', kernel_initializer='he_uniform')
self.fc2 = tf.keras.layers.Dense(hidden_units, activation='relu', kernel_initializer='he_uniform')
self.output_layer = tf.keras.layers.Dense(num_actions, name='q_values')

The loss function is "mse" and the optimizer is Adam. Could anyone help? I'd really appreciate it!
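
Concretely, the update I'm running looks roughly like this (a simplified sketch with hypothetical names, not my exact code; replay-buffer sampling is omitted):

import tensorflow as tf

# Simplified sketch of the DQN update: MSE loss on bootstrapped targets, Adam optimizer.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.0015)
mse = tf.keras.losses.MeanSquaredError()

def train_step(model, target_model, states, actions, rewards, next_states, dones, gamma=0.95):
    # Bootstrapped target: r + gamma * max_a' Q_target(s', a'), zeroed at terminal states.
    next_q = tf.reduce_max(target_model(next_states), axis=1)
    targets = rewards + gamma * next_q * (1.0 - dones)
    with tf.GradientTape() as tape:
        q_values = model(states)
        # Pick the Q-value of the action actually taken in each transition.
        action_q = tf.reduce_sum(q_values * tf.one_hot(actions, q_values.shape[-1]), axis=1)
        loss = mse(targets, action_q)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss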

nancyhwr commented 4 years ago

It doesn't work even with CartPole: after 5000 training steps, the performance only changed from 9/2000 to 10/2000.

Huixxi commented 4 years ago

Hey, sorry for the late reply. I don't know whether the problem has been solved yet, and I've never tried those two environments myself. It can be really tricky and stochastic for DQN to solve these relatively "easy" environments; it requires careful parameter tuning, or even some luck, to make it work. In my opinion, you should increase the learning rate or try a smaller replay buffer size. You could also email me, and I think we could solve it together.
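
For example, something in this direction (the exact values are just guesses on my part, not tested settings):

# Rough starting point for the two suggestions above (guesses, not tested settings):
buffer_size = 20000        # smaller replay buffer than the original 100000
learning_rate = 0.005      # larger learning rate than the original 0.0015
epsilon = 0.99
epsilon_decay = 0.9999
min_epsilon = 0.1
gamma = 0.95
batch_size = 64
target_update_iter = 400
train_nums = 10000
start_learning = 200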

nancyhwr commented 4 years ago

Thank you for the reply. I also posted the same question on Reddit, and someone there also tried to run DDQN on Breakout and Pong with RAM input; it didn't work with his code either. So I'm starting to suspect that DDQN may not work with the Atari RAM input at all.

For CartPole, I'll definitely try other parameter settings. I later wrote an implementation myself based on yours, and it worked, but it was super slow.

Thank you very much for the help!! If you have any thoughts on training the Atari games with the RAM input version, please let me know. It's so strange to me.

Huixxi commented 4 years ago

OK, I will~ But in general, running the CartPole environment shouldn't be that slow; a relatively small network will handle it. You can also try some policy gradient methods like actor-critic, PPO, etc. And feel free to contact me if you have other reinforcement learning questions; I'll try my best to help.
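
For example, a network about this small is usually enough for CartPole (the layer sizes here are just my rough suggestion, not settings from the repo):

import tensorflow as tf

# A small Q-network for CartPole: 4 state dims in, one Q-value per action out.
def build_cartpole_q_net(num_states=4, num_actions=2, hidden_units=32):
    return tf.keras.Sequential([
        tf.keras.layers.InputLayer(input_shape=(num_states,)),
        tf.keras.layers.Dense(hidden_units, activation='relu'),
        tf.keras.layers.Dense(hidden_units, activation='relu'),
        tf.keras.layers.Dense(num_actions),  # linear output layer for Q-values
    ])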

nancyhwr commented 4 years ago

Thank you so much!!! That means so much!