D-X-Y / AutoDL-Projects

Automated deep learning algorithms implemented in PyTorch.

Question about Gumbel-softmax in GDAS #54

Closed · xingyueye closed this issue 4 years ago

xingyueye commented 4 years ago

@D-X-Y Hi, I noticed the implementation of Gumbel-softmax in GDAS:

```python
gumbels = -torch.empty_like(xins).exponential_().log()
logits = (xins.log_softmax(dim=1) + gumbels) / self.tau
```

It seems to differ from the original definition of Gumbel-max sampling:

```
x_out = argmax(log(x_in) + G),  G = -log(-log(U)),  U ~ Uniform(0, 1)
```

Can you tell me the purpose of this change? Thanks a lot.

D-X-Y commented 4 years ago

I think they are the same, just with different notations? If U ~ Uniform(0, 1), then -log(U) follows Exponential(1), so `-torch.empty_like(xins).exponential_().log()` draws exactly the Gumbel noise G = -log(-log(U)). And `xins.log_softmax(dim=1)` equals log(softmax(xins)), i.e., the log(x_in) term once x_in is read as the categorical probabilities.
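A minimal, self-contained check of this equivalence (not code from the repo; `n_samples` is just an illustrative name) is to compare the moments of the two noise routes against the known Gumbel(0, 1) statistics:

```python
import math
import torch

torch.manual_seed(0)
n_samples = 1_000_000

# Route 1 (as in the GDAS snippet): -log(E) with E ~ Exponential(1).
g1 = -torch.empty(n_samples).exponential_().log()

# Route 2 (textbook Gumbel-max): -log(-log(U)) with U ~ Uniform(0, 1).
g2 = -torch.log(-torch.log(torch.rand(n_samples)))

# Gumbel(0, 1) has mean = Euler-Mascheroni constant (~0.5772)
# and variance = pi^2 / 6 (~1.6449); both routes should match.
print("expected mean/var :", 0.5772, math.pi ** 2 / 6)
print("exponential route :", g1.mean().item(), g1.var().item())
print("uniform route     :", g2.mean().item(), g2.var().item())
```

Both printed lines should agree to within Monte-Carlo error, since -log(U) and E are identically distributed.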

xingyueye commented 4 years ago

@D-X-Y Thanks for your reply. I have run some experiments, and both methods produce reasonable sample sequences. By the way, a fixed reduction cell is mentioned in the GDAS paper, but I did not find it in this project. What is the reason for that? Is the result poor when a fixed reduction cell is used? In my own implementation, the network performance is lower than the original DARTS.

D-X-Y commented 4 years ago

Thanks for your question. I just uploaded the implementation of the reduction cell here: https://github.com/D-X-Y/AutoDL-Projects/blob/master/lib/models/cell_operations.py#L240. I did not include it in the experiments of this repo, because to purely compare searching algorithms we need to fix everything except the searching algorithm itself, and I hope to make an apples-to-apples comparison in this repo.

Regarding the performance, would you mind sharing your log and the structure of the discovered architecture? There might be two possible reasons.

xingyueye commented 4 years ago

@D-X-Y Thanks for your reply. In fact, I also want to make an apples-to-apples comparison. In the current experiment, I only added a fixed reduction cell (I think it would also be helpful to provide a good fixed reduction cell). I then trained the following three configurations: the original DARTS_V2, the DARTS_V2 normal cell + fixed reduction cell, and a newly searched architecture using the fixed reduction cell. The results on CIFAR-10 show that accuracy decreases from the first to the last, with a drop of about 0.4 points (which is already relatively large). The newly searched structure is:

```python
DARTS_V3 = Genotype(
    normal=[('max_pool_3x3', 0), ('avg_pool_3x3', 1),
            ('sep_conv_3x3', 0), ('dil_conv_3x3', 1),
            ('dil_conv_3x3', 1), ('sep_conv_3x3', 3),
            ('sep_conv_3x3', 4), ('dil_conv_5x5', 0)],
    normal_concat=range(2, 6),
    reduce=[('avg_pool_3x3', 1), ('avg_pool_3x3', 0),
            ('sep_conv_5x5', 2), ('skip_connect', 0),
            ('skip_connect', 1), ('max_pool_3x3', 3),
            ('sep_conv_3x3', 3), ('avg_pool_3x3', 1)],
    reduce_concat=range(2, 6))
```

Limited by GPU resources, I have not run a set of four experiments with different seeds, but the large accuracy gap above should already indicate that my current experiment has failed. Can you give me some advice? Thanks.
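For readers following along: `Genotype` above is the namedtuple convention from the DARTS codebase (genotypes.py). A minimal sketch of its definition, under that assumption:

```python
from collections import namedtuple

# DARTS-style genotype: each cell is a list of
# (operation_name, input_node_index) pairs, two per intermediate node,
# plus the node indices whose outputs are concatenated to form the
# cell output.
Genotype = namedtuple('Genotype', 'normal normal_concat reduce reduce_concat')
```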

xingyueye commented 4 years ago

@D-X-Y I just read your implementation of the reduction cell, and I noticed it differs from the architecture described in your paper:

```python
X2 = self.ops2[0](X0 + X1)
X3 = self.ops2[1](s1)
```

I'm confused about that.

D-X-Y commented 4 years ago

> @D-X-Y I just read your implementation of the reduction cell, and I noticed it differs from the architecture described in your paper: `X2 = self.ops2[0](X0 + X1)`, `X3 = self.ops2[1](s1)`. I'm confused about that.

Thanks for pointing out this issue. I have fixed it in https://github.com/D-X-Y/AutoDL-Projects/blob/master/lib/models/cell_operations.py#L288. I tried several reduction cells with minor differences and had picked up an old one... Sorry for the confusion.
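For anyone skimming the thread, the dataflow under discussion can be sketched as a tiny module. This is not the repo's implementation; the attribute names (`ops1`, `ops2`) follow the snippet quoted above, and the concrete operations are placeholders:

```python
import torch
import torch.nn as nn

class FixedReductionCellSketch(nn.Module):
    """Illustrative dataflow of a fixed (hand-designed) reduction cell.

    s0 and s1 are the outputs of the two previous cells (assumed to have
    the same spatial size and channel count); every op here is a
    placeholder standing in for the real choices.
    """
    def __init__(self, channels: int):
        super().__init__()
        # First stage: one stride-2 op per input state, halving resolution.
        self.ops1 = nn.ModuleList([
            nn.MaxPool2d(3, stride=2, padding=1),
            nn.MaxPool2d(3, stride=2, padding=1),
        ])
        # Second stage, matching the two lines quoted in this thread:
        #   X2 = ops2[0](X0 + X1)   and   X3 = ops2[1](s1)
        self.ops2 = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, stride=1, padding=1, bias=False),
            nn.MaxPool2d(3, stride=2, padding=1),
        ])

    def forward(self, s0: torch.Tensor, s1: torch.Tensor) -> torch.Tensor:
        x0 = self.ops1[0](s0)
        x1 = self.ops1[1](s1)
        x2 = self.ops2[0](x0 + x1)        # combine the two reduced states
        x3 = self.ops2[1](s1)             # second branch taken directly from s1
        return torch.cat([x2, x3], dim=1) # cell output doubles the channels
```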

D-X-Y commented 4 years ago

@xingyueye Thanks for your results. May I ask whether DARTS_V3 = GDAS?

xingyueye commented 4 years ago

> @xingyueye Thanks for your results. May I ask whether DARTS_V3 = GDAS?

No, DARTS_V3 is the newly searched architecture with the fixed reduction cell (ignore the `reduce` part; it is not used). The search process still uses the softmax-weighted sum of the original DARTS rather than Gumbel-softmax sampling. I want to isolate the effect of a fixed reduction cell on search.
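For concreteness, the two edge-mixing rules being contrasted can be sketched like this (illustrative names, not code from either repo):

```python
import torch
import torch.nn.functional as F

def darts_mix(x, ops, alpha):
    """DARTS-style edge: run every candidate op and blend the outputs
    with softmax weights over the architecture logits alpha."""
    weights = F.softmax(alpha, dim=-1)
    return sum(w * op(x) for w, op in zip(weights, ops))

def gdas_mix(x, ops, alpha, tau=10.0):
    """GDAS-style edge: draw a hard one-hot sample via Gumbel-softmax.
    The straight-through trick (one_hot - probs.detach() + probs) keeps
    gradients flowing to all logits even though the forward value is
    one-hot. (The real implementation evaluates only the sampled op;
    this sketch multiplies every op by its one-hot weight for clarity.)"""
    gumbels = -torch.empty_like(alpha).exponential_().log()
    logits = (alpha.log_softmax(dim=-1) + gumbels) / tau
    probs = logits.softmax(dim=-1)
    index = probs.argmax(dim=-1, keepdim=True)
    one_hot = torch.zeros_like(probs).scatter_(-1, index, 1.0)
    hard = one_hot - probs.detach() + probs
    return sum(h * op(x) for h, op in zip(hard, ops))
```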

D-X-Y commented 4 years ago

I see. When I experiment with reduction cells, I would rather replace the searched one with the reduction cell from another network and check the retraining performance, instead of searching for it.

xingyueye commented 4 years ago

@D-X-Y Um... I don't quite understand what you mean. Do you mean keeping the normal cell unchanged, changing only the reduction cell, and training the network to compare performance across different reduction cells? Regarding the use of a fixed reduction cell during search, I want to know: does it affect the stability of search? Or can we get a better architecture (with higher accuracy) because it reduces the difficulty of searching?

D-X-Y commented 4 years ago

Yes. Using a fixed reduction cell should make searching easier, but it might not help find an architecture with higher accuracy; I would expect accuracy similar to searching both kinds of cells. The fixed reduction cell in GDAS aims to reduce the number of parameters of the searched network, because in our experiments the searched reduction cells are sometimes unnecessarily large.

xingyueye commented 4 years ago

Yes, I also feel that such a large reduction cell is unnecessary, especially since most of its operations are pooling and skip_connect. Did your experiments show an accuracy difference after replacing the searched reduction cell with the fixed cell mentioned in the paper? Also, thank you very much for your quick replies.

D-X-Y commented 4 years ago

You can have a look at Table 5 in my paper, which shows the results of replacing it.

D-X-Y commented 4 years ago

Feel free to reopen it.