NeroLoh / darts-tensorflow

Tensorflow code for Differentiable architecture search
73 stars 18 forks

It may be a 1st-order implementation #1

Closed MarkAlive closed 5 years ago

MarkAlive commented 5 years ago

train_grads_pos and train_grads_neg in func compute_unrolled_step are equal because train_loss is unchanged between them, which finally makes this a 1st-order implementation.

https://github.com/NeroLoh/darts-tensorflow/blob/7fd74eb40421a0ebaf86e72d489fdc749e38321a/cnn/train_search.py#L155

NeroLoh commented 5 years ago

train_grads_pos and train_grads_neg in func compute_unrolled_step are equal because train_loss is unchanged between them, which finally makes this a 1st-order implementation.

darts-tensorflow/cnn/train_search.py

Line 155 in 7fd74eb

train_grads_pos=tf.gradients(train_loss,arch_var)

Thank you for pointing out this problem.
I fixed this using tf.control_dependencies to control the execution order:

```python
with tf.control_dependencies([v + R * g for v, g in zip(w_var, valid_grads)]):
    train_grads_pos = tf.gradients(train_loss, arch_var)
```

I tested it in the newly added file "cnn/debug_unrolled_step" and it works. This change makes the training process a lot slower; the implementation can still be improved.

MarkAlive commented 5 years ago

Thanks for the quick response. I pulled your updated code and reran "debug_unrolled_step.py" to recheck. It seems to work (the outputs of arch_grad_after and arch_grad_before are different). However, when I changed the code as follows, the outputs of arch_grad_before and arch_grad_before1 are also different. I further checked w_var as well, and the result shows that w_var_b and w_var_a are equal. Could you please explain this behavior? Thanks a lot!

```python
arch_grad_before = tf.gradients(train_loss, arch_var)
arch_grad_before1 = tf.gradients(train_loss, arch_var)
w_var_b = utils.get_var(tf.trainable_variables(), 'lw')[1]
with tf.control_dependencies([v + R * g for v, g in zip(w_var, valid_grads)]):
    w_var_a = utils.get_var(tf.trainable_variables(), 'lw')[1]
    arch_grad_after = tf.gradients(train_loss, arch_var)

config = tf.ConfigProto()
os.environ["CUDA_VISIBLE_DEVICES"] = str(0)
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

sess.run(tf.global_variables_initializer())
sess.run([train_iter.initializer])
print(sess.run(arch_grad_before)[0])
print(sess.run(arch_grad_before1)[0])
print(sess.run(arch_grad_after)[0])
print(sess.run(w_var_b)[0])
print(sess.run(w_var_a)[0])
```

NeroLoh commented 5 years ago

Thanks for your comment. I have updated my code to fix these problems, please check. The code uploaded last time still failed to update w_var using

```python
with tf.control_dependencies([v + R * g for v, g in zip(w_var, valid_grads)]):
```

because the operation v + R * g only computes a new tensor and never writes it back to the variable. So I modified it to v.assign(v + R * g):

```python
with tf.control_dependencies([v.assign(v + R * g) for v, g in zip(w_var, valid_grads)]):
```

This finally ensures the update of w_var.
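To see why the assign matters, here is a minimal standalone sketch (my own illustration, not the repository's code; it uses tf.compat.v1 so it also runs under TF 2.x, and w, g, R are toy stand-ins for w_var, valid_grads, and R):

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

R = 0.5                 # perturbation scale (toy value)
w = tf.Variable(1.0)    # stands in for one weight in w_var
g = tf.constant(2.0)    # stands in for the matching valid grad

# Buggy version: v + R*g only builds an add op whose result is discarded,
# so the read that depends on it still sees the old w.
with tf.control_dependencies([w + R * g]):
    w_after_noop = tf.identity(w)

# Fixed version: assign actually writes the perturbed value back to w
# before the dependent read runs.
with tf.control_dependencies([w.assign(w + R * g)]):
    w_after_assign = tf.identity(w)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    noop_val = sess.run(w_after_noop)      # 1.0: w unchanged
    assign_val = sess.run(w_after_assign)  # 1.0 + 0.5*2.0 = 2.0
    print(noop_val, assign_val)
```

The same pattern applies per variable inside the list comprehension over zip(w_var, valid_grads).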

The different results between arch_grad_before and arch_grad_before1 come from the automatically iterating input pipeline: each time you implicitly evaluate train_loss, the iterator yields a new batch. To address this, you can either fix the input to a specific image or fetch both tensors in the same call, like sess.run([arch_grad_before, arch_grad_before1]).
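This iterator effect can be reproduced in isolation. In the following sketch (toy values, tf.compat.v1 assumed; the dataset stands in for the training input pipeline), two separate sess.run calls consume two different elements, while fetching both tensors in one call shares a single element:

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# Toy input pipeline: every evaluation of `x` pulls the next element.
dataset = tf.data.Dataset.from_tensor_slices([1.0, 2.0, 3.0, 4.0])
iterator = tf.data.make_one_shot_iterator(dataset)
x = iterator.get_next()

# Two tensors that both depend on the iterator output, like
# arch_grad_before and arch_grad_before1 both depend on train_loss.
y = x * 2.0
z = x * 3.0

with tf.Session() as sess:
    a = sess.run(y)          # consumes 1.0 -> 2.0
    b = sess.run(z)          # consumes 2.0 -> 6.0: a fresh element, not 1.0
    c, d = sess.run([y, z])  # one shared element 3.0 -> 6.0 and 9.0
    print(a, b, c, d)
```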

Besides, you should run sess.run(w_var_a) before sess.run(arch_grad_after), because when the latter is evaluated, w_var gets updated by the control-dependency condition, so the value of w_var_a changes too.
Please check out debug_unrolled_step.py to test the result.

NeroLoh commented 5 years ago

I have changed the implementation code of the 2nd-order gradient so that it uses the TensorFlow optimizer to update w_var. This helps to speed up the training process. You can check the commit history for the code discussed before.
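As a sketch of the idea (my own illustration, not the repository's exact code; toy names, tf.compat.v1 assumed): applying a descent step on the negated gradient with a stock optimizer realizes w <- w + R*g as one fused update op, instead of a chain of per-variable assigns.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

R = 0.5
w = tf.Variable(1.0)              # stands in for one weight in w_var
loss = w * w
(g,) = tf.gradients(loss, [w])    # stands in for the valid grad; equals 2*w

# A descent step on the negated gradient moves w by +R*g:
#   w <- w - R * (-g) = w + R * g
opt = tf.train.GradientDescentOptimizer(learning_rate=R)
perturb = opt.apply_gradients([(-g, w)])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(perturb)             # w: 1.0 + 0.5*2.0 = 2.0
    w_after = sess.run(w)
    print(w_after)
```

The optimizer's fused apply op also avoids materializing the intermediate v + R*g tensors, which is where the speedup comes from.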

MarkAlive commented 5 years ago

I have several other questions about the function compute_unrolled_step:

  1. Why don't you recompute train_loss after arch_var is updated, just like the PyTorch version does? Is there some implicit mechanism?

  2. As described in Section 2.3 of the DARTS paper, w+ = w + R*valid_grads(w'). But in your code, it seems to be w+ = w' + R*valid_grads(w'). Here is the corresponding issue

  3. leader_grads is computed from the loss on the training data in your code, but in Section 2.3 of the DARTS paper it is computed from the validation data.

Thanks!
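For reference, the relevant formulas from Section 2.3 of the DARTS paper (reproduced from memory, so worth double-checking against the paper) are, with w' = w - ξ∇_w L_train(w, α):

```latex
% Second-order architecture gradient via the chain rule:
\nabla_\alpha L_{val}(w', \alpha)
  - \xi\, \nabla^2_{\alpha, w} L_{train}(w, \alpha)\, \nabla_{w'} L_{val}(w', \alpha)

% Finite-difference approximation of the Hessian-vector product,
% with the perturbation taken around the ORIGINAL w, not w':
% w^{\pm} = w \pm \epsilon\, \nabla_{w'} L_{val}(w', \alpha)
\nabla^2_{\alpha, w} L_{train}(w, \alpha)\, \nabla_{w'} L_{val}(w', \alpha)
  \approx \frac{\nabla_\alpha L_{train}(w^{+}, \alpha)
              - \nabla_\alpha L_{train}(w^{-}, \alpha)}{2\epsilon}
```

Note that the perturbation w± is applied to the original w, which is the point of question 2 above.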

NeroLoh commented 5 years ago

Thanks for your comment.

  1. It is not necessary to do that, because train_loss holds references to w_var; once w_var is updated, the next evaluation of train_loss reflects the new values. You can run a quick demo to test this.
  2. I changed the implementation of the unrolled-step computation to fix this, please check.
  3. It was a mistake and I have fixed it in the updated code, please check.

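Point 1 can be verified with a tiny sketch (tf.compat.v1 assumed, toy names): a graph tensor is built once, but it re-reads the variables it references on every sess.run, so no rebuild is needed after an update:

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

w = tf.Variable(2.0)      # stands in for one weight in w_var
loss = 10.0 * w           # stands in for train_loss, built only once
update = w.assign(5.0)    # stands in for the unrolled-step update

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    before = sess.run(loss)  # 20.0: reads the current w
    sess.run(update)
    after = sess.run(loss)   # 50.0: same graph tensor, new w
    print(before, after)
```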
MarkAlive commented 5 years ago

They all work, awesome!