dylanrandle closed this issue 4 years ago.
It appears that simply changing the learning rate from 0.025 to a more sensible 0.001 fixes the problem.
My conjecture is that, with such a high learning rate, the model was jumping around in parameter space: it always ended up in local minima that did well on the training set, but only sometimes in ones that generalized and performed well on the validation set.
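To make the "jumping around" intuition concrete, here is a toy sketch. It is not DARTS and not code from this project. It runs plain gradient descent on a 1-D quadratic f(x) = (k/2)·x², where each step multiplies x by (1 - lr·k), so once lr·k exceeds 2 the iterates flip sign and grow instead of settling. The curvature k = 100 is an arbitrary assumption, chosen so that 0.025 lands in the unstable regime and 0.001 in the stable one. It only illustrates the step-size effect, not the claim about which minima generalize.

```python
def gradient_descent(lr, curvature=100.0, x0=1.0, steps=20):
    """Plain gradient descent on f(x) = (curvature / 2) * x**2.

    The gradient is curvature * x, so each update is
    x <- (1 - lr * curvature) * x. Returns all iterates, starting with x0.
    The default curvature is an illustrative assumption, not a DARTS value.
    """
    xs = [x0]
    x = x0
    for _ in range(steps):
        x = x - lr * curvature * x  # one gradient step
        xs.append(x)
    return xs


for lr in (0.025, 0.001):
    xs = gradient_descent(lr)
    factor = 1 - lr * 100.0  # per-step multiplier for the assumed curvature
    print(f"lr={lr}: per-step factor {factor:+.2f}, |x| after 20 steps = {abs(xs[-1]):.3g}")
```

With the larger step the iterate overshoots the minimum on every update; with the smaller one it contracts toward it. Real loss surfaces are far from quadratic, so this is only a cartoon of why a lower learning rate can make training more stable.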
Figure out which parameters to tune to make DARTS work (base dataset: MNIST/FashionMNIST)