CUN-bjy closed this issue 3 years ago
experiment specifications
hyperparameter:
network
actor:
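# imports assumed for this excerpt (tf.keras); the critic below uses them too
import numpy as np
from tensorflow.keras.layers import Input, Dense, BatchNormalization, Activation, Lambda, Concatenate
from tensorflow.keras.initializers import GlorotNormal
from tensorflow.keras.regularizers import l2
# input layer (observations)
input_ = Input(shape=self.obs_dim)
# hidden layer 1
h1_ = Dense(24, kernel_initializer=GlorotNormal())(input_)
h1_b = BatchNormalization()(h1_)
h1 = Activation('relu')(h1_b)
# hidden layer 2
h2_ = Dense(16, kernel_initializer=GlorotNormal())(h1)
h2_b = BatchNormalization()(h2_)
h2 = Activation('relu')(h2_b)
# output layer (actions), squashed to [-1, 1] by tanh
output_ = Dense(self.act_dim, kernel_initializer=GlorotNormal())(h2)
output_b = BatchNormalization()(output_)
output = Activation('tanh')(output_b)
# rescale the tanh output to [-act_range, act_range]
scalar = self.act_range * np.ones(self.act_dim)
out = Lambda(lambda i: i * scalar)(output)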
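The last lines of the actor squash the output with tanh and then multiply it by act_range, so each action component stays within [-act_range, act_range]. Here is a minimal pure-Python sketch of that scaling. The helper `scale_action` and the sample values are hypothetical and not part of the original code.

```python
import math

def scale_action(pre_activations, act_range):
    """Mimic the actor's final tanh + scaling step (illustrative helper)."""
    return [act_range * math.tanh(x) for x in pre_activations]

# Hypothetical pre-activation values and action range, for illustration only.
raw = [-50.0, -0.5, 0.0, 0.5, 50.0]
act_range = 2.0
actions = scale_action(raw, act_range)

# Every scaled action lies inside [-act_range, act_range].
print(actions)
```

Large pre-activations saturate at the action limits. This is why exploration noise added after this layer usually has to be clipped to the same range.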
critic:
# input layer(observations and actions)
input_obs = Input(shape=self.obs_dim)
input_act = Input(shape=(self.act_dim,))
inputs = [input_obs,input_act]
concat = Concatenate(axis=-1)(inputs)
# hidden layer 1
h1_ = Dense(24, kernel_initializer=GlorotNormal(), kernel_regularizer=l2(0.01))(concat)
h1_b = BatchNormalization()(h1_)
h1 = Activation('relu')(h1_b)
# hidden layer 2
h2_ = Dense(16, kernel_initializer=GlorotNormal(), kernel_regularizer=l2(0.01))(h1)
h2_b = BatchNormalization()(h2_)
h2 = Activation('relu')(h2_b)
# output layer (Q-value)
output_ = Dense(1, kernel_initializer=GlorotNormal(), kernel_regularizer=l2(0.01))(h2)
output_b = BatchNormalization()(output_)
output = Activation('linear')(output_b)
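To get a feel for how small these two networks are, the trainable parameters can be counted by hand. A Dense layer has n_in * n_out weights plus n_out biases, and each BatchNormalization layer adds two trainable values per feature (gamma and beta). The sketch below assumes a flat observation vector of length 5 and a single action. These dimensions are assumptions for illustration and should be checked against the environment's observation and action spaces. The helper names are hypothetical.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def batchnorm_trainable_params(n_features):
    """Trainable gamma and beta; moving mean/variance are not trainable."""
    return 2 * n_features

def mlp_trainable_params(n_in, layer_sizes):
    """Sum Dense + BatchNormalization parameters for a stack of layers."""
    total = 0
    for n_out in layer_sizes:
        total += dense_params(n_in, n_out) + batchnorm_trainable_params(n_out)
        n_in = n_out
    return total

# Assumed dimensions (not stated in the original notes).
obs_dim, act_dim = 5, 1

actor_params = mlp_trainable_params(obs_dim, [24, 16, act_dim])
critic_params = mlp_trainable_params(obs_dim + act_dim, [24, 16, 1])
print("actor:", actor_params, "critic:", critic_params)
```

Under these assumptions both networks have well under a thousand trainable parameters, so the training time is likely dominated by environment simulation and per-step overhead rather than by the network size.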
training time: about 5 hours (2500 episodes) on an Intel i7 CPU
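As a rough sanity check, the reported figures work out to an average wall-clock cost per episode. The snippet below only does that arithmetic. Both inputs are the approximate values quoted above, so the result is an estimate.

```python
# Approximate figures quoted in the notes.
total_hours = 5
episodes = 2500

seconds_per_episode = total_hours * 3600 / episodes
print(f"about {seconds_per_episode:.1f} s per episode")
```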
reinforcement learning in the RoboschoolInvertedPendulum-v1 environment