Closed Moran232 closed 5 years ago
It depends on which learner you are using. For instance, for WeightSparseLearner, the state is computed in learners/weight_sparsification/rl_helper.py.
Basically, we follow the AMC paper (He et al., ECCV 2018) with a few modifications in defining the state vector.
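For readers unfamiliar with AMC-style state vectors: a minimal sketch of building one per-layer state, with 5 features, is shown below. The function name and exact feature choice here are illustrative assumptions, not the repository's actual definition.

```python
import numpy as np

def build_layer_state(layer_idx, nb_layers, var_shape, prune_ratio_prev):
    """Build an AMC-style per-layer state vector (illustrative only).

    Features: normalized layer index, spatial kernel size, # of input
    channels, # of output channels, and the previous layer's action.
    """
    kh, kw, c_in, c_out = var_shape
    return np.array([
        layer_idx / nb_layers,   # where we are in the network
        kh * kw,                 # spatial kernel size
        c_in,                    # number of input channels
        c_out,                   # number of output channels
        prune_ratio_prev,        # pruning ratio chosen for the previous layer
    ], dtype=np.float32)

# state for the 3rd of 10 layers, a 3x3 conv with 64 -> 128 channels
state = build_layer_state(2, 10, (3, 3, 64, 128), 0.5)
```

In practice each feature would also be normalized to a comparable range before being fed to the DDPG agent.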
Thanks. One more thing: can you explain what maskable_vars (the list of maskable variables) is? In learners/weight_sparsification/rl_helper.py, I guess it refers to the layers?
Which line are you referring to (or, can you post the corresponding code block)?
```python
import numpy as np
import tensorflow as tf

class RLHelper(object):
  """Reinforcement learning helper for the weight sparsification learner."""

  def __init__(self, sess, maskable_vars, skip_head_n_tail):
    """Constructor function.

    Args:
    * sess: TensorFlow session
    * maskable_vars: list of maskable variables
    * skip_head_n_tail: whether to skip the head & tail layers
    """

    # obtain the shape & # of parameters of each maskable variable
    nb_vars = len(maskable_vars)
    var_shapes = []
    self.prune_ratios = np.zeros(nb_vars)
    self.nb_params_full = np.zeros(nb_vars)
    for idx, var in enumerate(maskable_vars):
      var_shape = sess.run(tf.shape(var))
      assert var_shape.size in [2, 4], '# of variable dimensions is %d (invalid)' % var_shape.size
      if var_shape.size == 2:
        var_shape = np.hstack((np.ones(2), var_shape))
      var_shapes += [var_shape]
      self.nb_params_full[idx] = np.prod(var_shape)
```
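The 2-D branch in the constructor pads fully-connected weight shapes to 4-D so they can be handled uniformly with convolution kernels. A small standalone check of that padding step (pure NumPy, no session needed):

```python
import numpy as np

# a fully-connected layer's weight matrix, e.g. 512 inputs x 10 outputs
var_shape = np.array([512, 10])

# pad with two leading 1s so it matches a conv kernel's 4-D layout
if var_shape.size == 2:
    var_shape = np.hstack((np.ones(2), var_shape))

# var_shape is now [1, 1, 512, 10]; the parameter count is unchanged
nb_params = np.prod(var_shape)  # 5120
```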
maskable_vars in weight sparsification
For WeightSparseLearner, maskable_vars refers to the list of weight variables to which a pruning mask can be applied, i.e. the prunable layers' weights.
Hi, I noticed that you calculate the prune_ratios in one roll-out in calc_rlout_actions(self), then pass them to calc_optimal_prune_ratios(self) to calculate the reward.
Does this mean the agent can only get the reward after it has finished pruning all the layers?
The functions above are from weight_sparsification/pr_optimizer.py.
Yes, the DDPG agent can only get the reward after finishing pruning all the layers. The reward depends on the classification accuracy of the model with all layers pruned with certain pruning ratios.
Thanks for the reply. Since the DDPG agent can only get the reward after finishing pruning all the layers, what is the reward before finishing? Is it set to zero?
Reward is not needed during the roll-out. Here, a roll-out refers to the process of determining pruning ratios of all layers.
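To illustrate this episode structure, here is a minimal sketch of a roll-out where only a single terminal reward exists. All names (`agent`, `helper.calc_state`, `evaluate_fn`) are hypothetical stand-ins, not PocketFlow's actual API:

```python
def run_rollout(agent, helper, nb_layers, evaluate_fn):
    """One roll-out: pick a pruning ratio per layer, then score once.

    No intermediate reward is computed; the only reward comes from
    evaluating the model after ALL layers have been assigned a ratio.
    """
    prune_ratios = []
    for idx in range(nb_layers):
        state = helper.calc_state(idx)         # state vector for layer idx
        prune_ratios.append(agent.act(state))  # DDPG action = pruning ratio
    # the single terminal reward, e.g. accuracy of the fully-pruned model
    reward = evaluate_fn(prune_ratios)
    return prune_ratios, reward
```

The DDPG agent's transitions can then all be credited with this one terminal reward when the episode is stored for training.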
Regarding the states of DDPG you mentioned: which part of your code defines the state vector and passes it to the DDPG agent? And why did you choose these 5 factors as the state?