Questions about GDAS sampling process

Which Algorithm: GDAS

Describe the Question: This is not a bug report. I just want to understand some details of the GDAS algorithm.

I am curious about the sampling process in each iteration. As I understand it, GDAS samples operations twice per iteration: once to optimize the network weights and once more to optimize the architecture weights, as in the pseudo-code below. I noticed this in this repository's code, but I would like to verify that my understanding is correct.

while not converged:
    // train the network weights
    sample a batch from D_t (training set)
    sample one operation o1
    forward through operation o1
    update the sampled sub-network's weights by gradient descent

    // train the architecture weights
    sample a batch from D_v (validation set)
    sample another operation o2
    forward through operation o2
    update the architecture weights by gradient descent
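To make the "sampled twice" point concrete, here is a minimal sketch of what I mean, assuming the operation is drawn with the Gumbel-max trick from a vector of architecture logits. The helper names `sample_operation` and `count_mismatches` are my own and are not taken from this repository, and the logits are kept fixed for simplicity (in real training they change every step). It only counts how often two independent draws within one iteration pick different operations.

```python
import math
import random


def sample_operation(arch_logits, rng):
    """Draw one operation index via the Gumbel-max trick (hypothetical helper)."""
    perturbed = []
    for logit in arch_logits:
        # rng.random() lies in [0, 1); clamp away from 0 so log(u) is finite.
        u = max(rng.random(), 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        perturbed.append(logit + gumbel_noise)
    # The index with the largest perturbed logit is the sampled operation.
    return max(range(len(perturbed)), key=perturbed.__getitem__)


def count_mismatches(arch_logits, steps, seed=0):
    """Count iterations in which the two independent draws disagree."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(steps):
        o1 = sample_operation(arch_logits, rng)  # draw for the weight step
        o2 = sample_operation(arch_logits, rng)  # independent draw for the architecture step
        if o1 != o2:
            mismatches += 1
    return mismatches


if __name__ == "__main__":
    # Four candidate operations with equal logits (an assumption for illustration).
    print(count_mismatches([0.0, 0.0, 0.0, 0.0], steps=1000))
```

With equal logits the two draws disagree in most iterations, while with strongly peaked logits they almost always agree, so the difference between the two schemes should matter most early in the search.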

Additionally, if the description above is correct, I would like to ask about the reasoning behind this approach. After training the weights of the sampled operation, it seems more reasonable to me to adjust the architecture weights for that same operation, rather than sampling another operation and adjusting the architecture weights based on it. In other words, the same operation o1 would be used for both the network-weight update and the architecture-weight update within a single iteration. I would appreciate any insights or clarifications on this.
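The two variants I am comparing can be sketched as follows. This is only an illustration under my own assumptions: `simulate` and its `share_sample` flag are hypothetical names, the architecture logits are held fixed, and the actual gradient updates are omitted. Operations are drawn directly from softmax(logits), which gives the same distribution as Gumbel-max sampling.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(value - m) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def simulate(arch_logits, steps, share_sample, seed=0):
    """Return the (o1, o2) pair used in each iteration (hypothetical helper).

    share_sample=False: o2 is drawn independently, as in the pseudo-code above.
    share_sample=True:  o2 reuses o1, which is the alternative I am asking about.
    """
    rng = random.Random(seed)
    probs = softmax(arch_logits)
    ops = list(range(len(arch_logits)))
    pairs = []
    for _ in range(steps):
        o1 = rng.choices(ops, weights=probs)[0]  # operation used for the weight step
        if share_sample:
            o2 = o1  # architecture step evaluates the operation just trained
        else:
            o2 = rng.choices(ops, weights=probs)[0]  # fresh draw for the architecture step
        pairs.append((o1, o2))
    return pairs


if __name__ == "__main__":
    logits = [0.0, 0.0, 0.0, 0.0]  # assumed: four equally likely operations
    for flag in (False, True):
        pairs = simulate(logits, steps=500, share_sample=flag)
        differing = sum(1 for o1, o2 in pairs if o1 != o2)
        print(f"share_sample={flag}: {differing} iterations with o1 != o2")
```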

Thanks for your time and consideration.

D-X-Y / AutoDL-Projects
