D-X-Y / AutoDL-Projects

Automated deep learning algorithms implemented in PyTorch.
MIT License

Why not keep the gumbel softmax trick during the retraining stage? #91

Closed d12306 closed 3 years ago

d12306 commented 3 years ago

Hello @D-X-Y, thanks for your implementation. I noticed that NAS papers that employ the Gumbel-softmax trick in the search stage do not keep the same sampling procedure in the evaluation stage. Why do you keep such an inconsistency between training and evaluation?

Would keeping the Gumbel sampling at evaluation time hurt performance?

Thanks,

D-X-Y commented 3 years ago

Would you mind providing more details? What do you mean by "adding Gumbel sampling in the evaluation stage", and which paper are you referring to?

d12306 commented 3 years ago

@D-X-Y, I am referring to papers such as GDAS. By "NAS papers that employ the Gumbel-softmax trick in the search stage do not keep the same sampling procedure in the evaluation stage", I mean the following: during the search, you sample from the Gumbel-softmax distribution to obtain the weights for the different operations. Once the search is finished, you use the latest op weights to derive the architecture (finding the two most probable ops for each edge). However, Gumbel-softmax is influenced by random noise, so the op weights are not deterministic given fixed logits.

That being said, the derived architecture can be quite different if we repeatedly sample from the Gumbel-softmax distribution, so what is the point of always using just one such sample (which amounts to a point estimate)?

Thanks and please correct me if anything is wrong.
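
To illustrate the point, here is a minimal PyTorch sketch (the logits are made up): with fixed logits, repeated hard Gumbel-softmax draws can select different ops, while the argmax of the logits themselves never changes.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.tensor([1.0, 0.5, 0.2])  # fixed (made-up) architecture logits

# Each hard Gumbel-softmax draw adds fresh Gumbel noise to the same logits,
# so the selected op can change from draw to draw.
for _ in range(5):
    one_hot = F.gumbel_softmax(logits, tau=1.0, hard=True)
    print("sampled op:", one_hot.argmax().item())

# The logits themselves are deterministic, so their argmax never changes.
print("argmax of logits:", logits.argmax().item())
```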

D-X-Y commented 3 years ago

@d12306 Sorry for the late response, and thanks for the clarification. The goal of differentiable NAS is to learn a distribution over architectures (defined by a set of variables $\alpha$). GDAS uses Gumbel-softmax to update this distribution. After searching, since this distribution has been optimized, we regard the op with the highest probability as the best.

Although the op weights produced by Gumbel sampling are influenced by random noise, the raw logits themselves are deterministic.
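
A minimal sketch of this two-phase use of $\alpha$, assuming a single edge with five candidate ops (the names and shapes are illustrative, not the repository's actual code):

```python
import torch
import torch.nn.functional as F

# Hypothetical architecture parameters: one edge with five candidate ops.
alpha = torch.zeros(5, requires_grad=True)

# Search phase (GDAS-style): sample a hard one-hot op selection with the
# straight-through Gumbel-softmax, so gradients still flow back into alpha.
weights = F.gumbel_softmax(alpha, tau=1.0, hard=True)

# Derivation phase: no sampling. The learned logits are deterministic,
# so the most probable op on each edge is kept.
best_op = alpha.argmax(dim=-1)
```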

tehseenmayar commented 3 years ago

Hi, I am sorry for asking such a question; I am new to NAS and have been trying to perform a grid search over the NAS-Bench-201 search space, but I couldn't find a way so far. Can you give me a head start? I would appreciate any help.

D-X-Y commented 3 years ago

@tehseenmayar Thanks for your interest.

We have extended NAS-Bench-201 to NATS-Bench, which has more architecture information and a more efficient and robust API. I would recommend using NATS-Bench instead of NAS-Bench-201. If you want to start with
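
As a starting point, an exhaustive ("grid") sweep over the benchmark is just a loop over the API. Below is a minimal sketch, assuming the nats_bench package and its create / get_more_info interface; the dataset and hp values are illustrative:

```python
from nats_bench import create

# Build the topology search-space ('tss') API; fast_mode loads results lazily.
api = create(None, 'tss', fast_mode=True, verbose=False)

best_index, best_acc = -1, -1.0
for index in range(len(api)):  # exhaustive sweep over every architecture
    info = api.get_more_info(index, 'cifar100', hp='200', is_random=False)
    if info['test-accuracy'] > best_acc:
        best_index, best_acc = index, info['test-accuracy']

print('best index:', best_index, 'test accuracy:', best_acc)
```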

tehseenmayar commented 3 years ago

Hi, thanks for your great work. I ran the following command and got the results: `python ./exps/NATS-algos/random_wo_share.py --dataset cifar100 --search_space sss`. Can you tell me how to get the final discovered architecture? I would appreciate your help. Thanks.

D-X-Y commented 3 years ago

[screenshot: excerpt from a saved log file]

Can you see something like this in the saved log file: "The best arch is xxx", where "xxx" is the final discovered architecture?
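
If it helps, a quick way to pull that line out of a saved log (the path below is hypothetical; use wherever your run wrote its logs):

```python
# Scan a saved search log for the final-architecture line.
with open('output/search/seed-0.log') as f:  # hypothetical log path
    for line in f:
        if 'The best arch is' in line:
            print(line.strip())
```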

tehseenmayar commented 3 years ago
[screenshot of the run's log output]

Thank you for your response. This is the kind of information I am seeing; how do I get the information you mentioned above? I need to train the found architecture to validate its accuracy.

tehseenmayar commented 3 years ago

Sorry to bother you again, but can you also tell me how to retrain the final discovered architecture to obtain its validation accuracy?

Thank you

D-X-Y commented 3 years ago

If you are using our NAS-Bench-201 or NATS-Bench (https://xuanyidong.com/assets/projects/NATS-Bench), you do not need to re-train the model. You can directly query the performance of your discovered architecture via our API.
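
For example, a minimal sketch assuming the nats_bench package; the architecture string and the hp tag are illustrative (the run above used the size search space, 'sss', whose longest training schedule is hp='90'):

```python
from nats_bench import create

# Hypothetical "best arch" string from a size-search-space run:
# five channel counts, one per stage.
arch = '64:64:32:16:8'

api = create(None, 'sss', fast_mode=True, verbose=False)
index = api.query_index_by_arch(arch)  # map the arch string to its benchmark index
info = api.get_more_info(index, 'cifar100', hp='90', is_random=False)
print('test accuracy:', info['test-accuracy'])
```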