In Algorithm 1, "Learn theta and Fine-tune M with (L^S)_p and L_f" means that the weights from the first layer up to layer L_p are updated w.r.t. (L^S)_p, while the whole model's weights are updated w.r.t. L_f (the final loss). Is that right?
In this line, I can see that `i != index` updates the weights from the first layer up to layer L_p via `self.seg_optimizer[i].step()`, but `i != len(self.pruned_segments) - 1` only updates the weights from the first layer to the end of the second-to-last segment.
According to Algorithm 1, it should be the last segment, i.e. the whole model, so why the second-to-last segment?
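To make my reading of the condition concrete, here is a minimal sketch. The names `pruned_segments`, `seg_optimizer`, and `index` mirror the code I am asking about, but the dummy optimizer and the loop structure are my own assumptions about how the step is organized, not the actual implementation:

```python
class DummyOptimizer:
    """Stand-in for a per-segment optimizer; just counts step() calls."""
    def __init__(self):
        self.steps = 0

    def step(self):
        self.steps += 1

def step_segments(seg_optimizer, pruned_segments, index):
    """My reading of the condition in question: every segment optimizer
    is stepped except the one at `index` and the one for the LAST
    segment, which is what puzzles me."""
    stepped = []
    for i in range(len(pruned_segments)):
        if i != index and i != len(pruned_segments) - 1:
            seg_optimizer[i].step()
            stepped.append(i)
    return stepped

# With 4 segments and index=1, only segments 0 and 2 are stepped;
# segment 3 (the last segment, i.e. the whole model) never is.
opts = [DummyOptimizer() for _ in range(4)]
print(step_segments(opts, ["seg"] * 4, index=1))
```

If this sketch matches the actual behavior, then the last segment's optimizer is never stepped here, which seems to contradict Algorithm 1's fine-tuning of the whole model with L_f.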