Hi,
Thanks for bringing this to our attention! First, please make sure the no-op operation (`none` in `spaces.py`, line 81) is also among the candidate operations during search. Eliminating the `none` operation from the candidates is a modification that SDARTS and DARTS-PT made for some reason, and it usually results in deep architectures without any skip-connections, which perform badly.

We used epsilon=0.0001 and lambda=0.25 for the first regularization term ($\Lambda$) with a batch size of 64 in the original DARTS search space. This setting gives you 2 to 3 skip-connections on average, with modest depth. For the other regularization term ($\Lambda_\pm$) we used epsilon=0.0001 and lambda=0.125 with the same batch size, which gives 1 to 2 skip-connections on average, with deeper architectures.

You can also do a few other things, not mentioned in the paper, to reduce the variance of the discovered architectures during search:

1. Reduce the architecture learning rate to 0.0001.
2. If you have more GPU memory to spare, increase the batch size to 96 and increase epsilon to 0.001 or 0.01.
3. Couple $\Lambda$-DARTS with SDARTS-RS, i.e. add some noise to the architecture parameters (see the sketch below).
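For item 3, here is a minimal sketch of the kind of random smoothing SDARTS-RS applies, assuming a standard PyTorch search loop; the helper names, the epsilon value, and the `model.arch_parameters()` accessor are illustrative assumptions in the style of DARTS codebases, not taken verbatim from our code:

```python
import torch

def perturb_alphas(alphas, epsilon=1e-3):
    # Add uniform noise in [-epsilon, epsilon] to each architecture-parameter
    # tensor (SDARTS-RS-style random smoothing). The noises are returned so
    # they can be removed again after the weight update.
    noises = []
    for alpha in alphas:
        noise = torch.empty_like(alpha).uniform_(-epsilon, epsilon)
        alpha.data.add_(noise)
        noises.append(noise)
    return noises

def restore_alphas(alphas, noises):
    # Undo the perturbation so the architecture step sees the
    # unperturbed parameters.
    for alpha, noise in zip(alphas, noises):
        alpha.data.sub_(noise)

# Per search step (sketch):
# noises = perturb_alphas(model.arch_parameters())   # perturb alphas
# ... update the network weights w on the training batch ...
# restore_alphas(model.arch_parameters(), noises)    # remove the noise
# ... update the architecture parameters as usual ...
```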
Please let us know if you face any more issues.
Hi,
Thanks for the question again. Since it's been a while and nothing else appears to be coming up, I'm closing this issue. Feel free to reopen or start another issue if you have further questions/comments.
Hello authors. I am reproducing the results from Table 2 on the DARTS search space. Which epsilon value did you use? Is it 0.0001? With 0.0001, the discovered cell usually contains more than two max_pool_3x3 operations (across 8 independent runs), and the test accuracy is low (96.xx).