Closed MarkAlive closed 5 years ago
train_grads_pos and train_grads_neg in func compute_unrolled_step are equal because train_loss is unchanged, which ultimately reduces this to a 1st-order implementation.
darts-tensorflow/cnn/train_search.py, line 155 in 7fd74eb:

```python
train_grads_pos = tf.gradients(train_loss, arch_var)
```
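A tiny numeric sketch (plain Python, with a toy loss `L(w, a) = a * w**2` and made-up numbers, not the repo's code) shows why skipping the actual weight perturbation collapses the finite difference to zero, i.e. to a pure first-order update:

```python
# Toy loss L(w, a) = a * w**2, so dL/da = w**2 depends on w.
def grad_alpha(w, a):
    return w ** 2

w, a, R = 2.0, 0.5, 0.01
g_w = 3.0  # stands in for the validation gradient w.r.t. w

# Intended DARTS step: evaluate the arch gradient at w +/- R*g_w.
pos = grad_alpha(w + R * g_w, a)
neg = grad_alpha(w - R * g_w, a)
second_order = (pos - neg) / (2 * R)  # analytic value: 2*w*g_w = 12

# Bug: if w is never actually perturbed, both gradients are computed
# at the same point, so they are identical and the correction vanishes.
pos_bug = grad_alpha(w, a)
neg_bug = grad_alpha(w, a)
print(pos_bug == neg_bug)  # True -> the 2nd-order term is always 0
```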
Thank you for pointing out this problem.
I fixed this using tf.control_dependencies to control the flow:

```python
with tf.control_dependencies([v + R * g for v, g in zip(w_var, valid_grads)]):
    train_grads_pos = tf.gradients(train_loss, arch_var)
```

I tested it on the newly added file "cnn/debug_unrolled_step" and it works.
This change makes the training process a lot slower; the implementation can be improved.
Thanks for the quick response. I updated your code and reran "debug_unrolled_step.py" to recheck. It seems to work (the outputs of arch_grad_after and arch_grad_before are different). However, when I changed the code as follows, the outputs of arch_grad_before and arch_grad_before1 are also different. As shown below, I also checked w_var further, and the result shows that w_var_b and w_var_a are equal. Could you please explain this phenomenon? Thanks a lot!
```python
arch_grad_before = tf.gradients(train_loss, arch_var)
arch_grad_before1 = tf.gradients(train_loss, arch_var)
w_var_b = utils.get_var(tf.trainable_variables(), 'lw')[1]
with tf.control_dependencies([v + R * g for v, g in zip(w_var, valid_grads)]):
    w_var_a = utils.get_var(tf.trainable_variables(), 'lw')[1]
    arch_grad_after = tf.gradients(train_loss, arch_var)

config = tf.ConfigProto()
os.environ["CUDA_VISIBLE_DEVICES"] = str(0)
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
sess.run(tf.global_variables_initializer())
sess.run([train_iter.initializer])
print(sess.run(arch_grad_before)[0])
print(sess.run(arch_grad_before1)[0])
print(sess.run(arch_grad_after)[0])
print(sess.run(w_var_b)[0])
print(sess.run(w_var_a)[0])
```
Thanks for your comment. I have updated my code to fix these problems; please check.
The code uploaded last time still failed to update w_var using

```python
with tf.control_dependencies([v + R * g for v, g in zip(w_var, valid_grads)]):
```

because the operation v + R*g only computes a new tensor and never writes it back to the variable. So I modified it to v.assign(v + R*g):

```python
with tf.control_dependencies([v.assign(v + R * g) for v, g in zip(w_var, valid_grads)]):
```

This finally ensures the update of w_var.
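The difference can be seen without TensorFlow at all; here is a minimal NumPy analogy (toy numbers, not the repo's variables):

```python
import numpy as np

R = 0.01
w_var = [np.array([1.0, 2.0]), np.array([3.0])]
valid_grads = [np.array([0.5, 0.5]), np.array([1.0])]

# v + R*g builds a NEW value; the original array is untouched,
# which is the analogue of the no-op control_dependencies list.
_ = [v + R * g for v, g in zip(w_var, valid_grads)]
print(w_var[0])  # unchanged: [1. 2.]

# The assign-style fix mutates the variable in place,
# mirroring v.assign(v + R*g) in TensorFlow.
for v, g in zip(w_var, valid_grads):
    v += R * g  # in-place update
print(w_var[0])  # updated: [1.005 2.005]
```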
The different results between 'arch_grad_before' and 'arch_grad_before1' come from the automatically iterating input, which means that each time you implicitly evaluate train_loss, the input batch is different. To address this, you can either fix the input to a specific image or evaluate the two operations in the same run, like `sess.run([arch_grad_before, arch_grad_before1])`.
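A plain-Python sketch of this effect (hypothetical batch stream, not the repo's input pipeline):

```python
# Each evaluation pulls a fresh batch, like an auto-advancing iterator.
stream = iter(range(100))  # stands in for train_iter

def loss_grad():
    batch = next(stream)   # implicit input advance on every call
    return batch * 2       # pretend the gradient depends on the batch

before = loss_grad()
before1 = loss_grad()      # different batch -> different value
print(before != before1)   # True

# Evaluating both from ONE batch (the sess.run([...]) analogue) agrees.
batch = next(stream)
g1, g2 = batch * 2, batch * 2
print(g1 == g2)            # True
```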
Besides, you should put sess.run(w_var_a) before sess.run(arch_grad_after), because when the latter is called, w_var will be updated due to the control-dependency condition, so the value of w_var_a will change too.
Please check out debug_unrolled_step.py to test the result.
I have changed the implementation of the 2nd-order gradient to use the TensorFlow optimizer to update w_var. This helps speed up the training process. You can check the commit history for the code discussed before.
I have several other questions about function compute_unrolled_step:

1. Why don't you recompute train_loss after arch_var is updated, just like the PyTorch version? Is there any implicit mechanism?
2. As described in chapter 2.3 of the DARTS paper, w+ = w + R*valid_grads(w'). But in your code, it seems to be w+ = w' + R*valid_grads(w'). Here is the corresponding issue.
3. In your code, leader_grads is computed from the loss on the training data. But in chapter 2.3 of the DARTS paper, it is computed from the validation data.
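For reference, the chapter-2.3 finite difference can be checked numerically on a toy loss (plain-Python sketch with made-up numbers; the perturbation is applied to w, as in the paper, not to w'):

```python
# Toy training loss L(w, a) = 0.5*(w - a)**2, so
# dL/da = -(w - a) and the mixed second derivative d2L/(da dw) = -1.
def grad_a_train(w, a):
    return -(w - a)

w, a, eps = 1.3, 0.4, 1e-3
g_val = 0.7  # stands in for the validation gradient w.r.t. w'

# Finite difference with w (not w') perturbed by +/- eps * g_val:
w_plus, w_minus = w + eps * g_val, w - eps * g_val
fd = (grad_a_train(w_plus, a) - grad_a_train(w_minus, a)) / (2 * eps)

# Exact Hessian-vector product: d2L/(da dw) * g_val = -1 * g_val.
exact = -1.0 * g_val
print(fd, exact)  # the two values agree closely
```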
Thanks!
Thanks for your comment.
They all work, 666!