[ ] Cutkosky, Ashok. 2020. “Better Full-Matrix Regret via Parameter-Free Online Learning.” Advances in Neural Information Processing Systems 33.
Parameter-free Optimization for Deep Learning
[X] Johnson, Tyler, Pulkit Agrawal, Haijie Gu, and Carlos Guestrin. 2020. “AdaScale SGD: A User-Friendly Algorithm for Distributed Training.” In International Conference on Machine Learning, 4911–20. PMLR.
[ ] Cutkosky, Ashok. 2020. “Parameter-Free, Dynamic, and Strongly-Adaptive Online Learning.” In Proceedings of the 37th International Conference on Machine Learning, edited by Hal Daumé III and Aarti Singh, 119:2250–59. Proceedings of Machine Learning Research. Virtual: PMLR.
[ ] A. Cutkosky and T. Sarlos. “Matrix-Free Preconditioning in Online Learning”. In: Proc. of International Conference on Machine Learning. 2019
[X] F. Orabona and T. Tommasi. “Training Deep Networks without Learning Rates Through Coin Betting”. In: Advances in Neural Information Processing Systems 30. 2017
[ ] A. Cutkosky and K. A. Boahen. “Online Convex Optimization with Unconstrained Domains and Losses”. In: Advances in Neural Information Processing Systems 29. 2016, pp. 748–756
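Note: as a concrete anchor for this section, here is a rough sketch of the per-coordinate coin-betting update behind COCOB from Orabona and Tommasi (2017) above, as I understand it: each coordinate is a bet whose size scales with the "winnings" accumulated by moving against past gradients, so no learning rate is needed. The stabilizer `alpha` and the exact bookkeeping reflect my reading of the paper, not a verified reimplementation.

```python
# COCOB-style parameter-free update, per coordinate (sketch; assumes float arrays).
import numpy as np

class CocobSketch:
    def __init__(self, w0, alpha=100.0, eps=1e-8):
        self.w0 = w0.copy()                        # reference point w_1
        self.w = w0.copy()
        self.L = np.full_like(w0, eps, dtype=float)  # running max |g|
        self.G = np.zeros_like(w0)                   # running sum |g|
        self.reward = np.zeros_like(w0)              # accumulated "winnings"
        self.theta = np.zeros_like(w0)               # running sum of gradients
        self.alpha = alpha

    def step(self, g):
        self.L = np.maximum(self.L, np.abs(g))
        self.G += np.abs(g)
        # Winnings grow when the current iterate has moved against past gradients.
        self.reward = np.maximum(self.reward - g * (self.w - self.w0), 0.0)
        self.theta += g
        # Bet a fraction of (initial wealth + winnings) against the sign of theta.
        denom = self.L * np.maximum(self.G + self.L, self.alpha * self.L)
        self.w = self.w0 - self.theta / denom * (self.L + self.reward)
        return self.w
```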
Parameter-free Learning with Experts
[ ] N. J. A. Harvey, C. Liaw, E. Perkins, and S. Randhawa. “Optimal anytime regret with two experts”. In: arXiv:2002.08994. 2020
[ ] T. Koren and R. Livni. “Affine-Invariant Online Optimization and the Low-rank Experts Problem”. In: Advances in Neural Information Processing Systems 30. Curran Associates, Inc., 2017, pp. 4747–4755
[ ] K.-S. Jun, F. Orabona, S. Wright, and R. Willett. “Online Learning for Changing Environments Using Coin Betting”. In: Electron. J. Statist. 11.2 (2017), pp. 5282–5310
[ ] D. J. Foster, A. Rakhlin, and K. Sridharan. “Adaptive Online Learning”. In: Advances in Neural Information Processing Systems 28. Curran Associates, Inc., 2015, pp. 3375–3383
[ ] W. M. Koolen and T. van Erven. “Second-order Quantile Methods for Experts and Combinatorial Games”. In: Proc. of COLT. 2015, pp. 1155–1175
[X] H. Luo and R. E. Schapire. “Achieving All with No Parameters: AdaNormalHedge”. In: Proc. of COLT. 2015, pp. 1286–1304
[ ] H. Luo and R. E. Schapire. “A Drifting-Games Analysis for Online Learning and Applications to Boosting”. In: Advances in Neural Information Processing Systems. 2014
[ ] A. Chernov and V. Vovk. “Prediction with Advice of Unknown Number of Experts”. In: Proc. of the 26th Conf. on Uncertainty in Artificial Intelligence. AUAI Press, 2010
[ ] K. Chaudhuri, Y. Freund, and D. J. Hsu. “A Parameter-Free Hedging Algorithm”. In: Advances in Neural Information Processing Systems. 2009, pp. 297–305
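Note: for the experts setting, the cleanest example in this list is AdaNormalHedge (Luo and Schapire, 2015): expert weights come from a potential on cumulative regret, with no learning rate and no knowledge of the horizon. The sketch below follows my reading of the potential Phi(R, C) = exp([R]_+^2 / (3C)) and assumes a uniform prior; treat it as illustrative rather than the paper's exact pseudocode.

```python
# AdaNormalHedge-style experts algorithm (sketch, uniform prior).
import numpy as np

def _phi(R, C):
    # Phi(R, C) = exp([R]_+^2 / (3C)), with the convention Phi(., 0) = 1.
    out = np.ones_like(R, dtype=float)
    mask = C > 0
    out[mask] = np.exp(np.maximum(R[mask], 0.0) ** 2 / (3.0 * C[mask]))
    return out

def _weight(R, C):
    return 0.5 * (_phi(R + 1.0, C + 1.0) - _phi(R - 1.0, C + 1.0))

def ada_normal_hedge(loss_stream, n_experts):
    R = np.zeros(n_experts)   # cumulative regret to each expert
    C = np.zeros(n_experts)   # cumulative |instantaneous regret|
    for losses in loss_stream:                    # per-expert losses in [0, 1]
        w = _weight(R, C)
        p = w / w.sum() if w.sum() > 0 else np.full(n_experts, 1.0 / n_experts)
        learner_loss = p @ losses
        r = learner_loss - losses                 # instantaneous regret vector
        R += r
        C += np.abs(r)
        yield p, learner_loss
```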
Optimization Heuristics Related to Parameter-free Algorithms
[X] Hoffer, Elad, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, and Daniel Soudry. 2020. “Augment Your Batch: Improving Generalization Through Instance Repetition.” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8129–38.
[X] You, Yang, Yuhui Wang, Huan Zhang, Zhao Zhang, James Demmel, and Cho-Jui Hsieh. 2020. “The Limit of the Batch Size.” arXiv [cs.LG]. arXiv. http://arxiv.org/abs/2006.08517.
[X] J. Bernstein, A. Vahdat, Y. Yue, and M.-Y. Liu. “On the distance between two neural networks and the stability of learning”. In: arXiv:2002.03432. 2020
[X] Y. You, Z. Zhang, C.-J. Hsieh, J. Demmel, and K. Keutzer. “ImageNet training in minutes”. In: Proc. of the 47th International Conference on Parallel Processing. 2018
[X] Y. You, I. Gitman, and B. Ginsburg. “Scaling SGD batch size to 32K for ImageNet training”. Technical Report UCB/EECS-2017-156, University of California, Berkeley, 2017
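Note: the large-batch recipes above (You et al., 2017; 2018) revolve around layer-wise adaptive rate scaling (LARS): each layer's step is rescaled by ||w|| / ||g||, so a single global learning rate cannot blow away layers whose weights are small relative to their gradients. A minimal sketch, where the trust coefficient and weight-decay handling are my assumptions rather than the reports' exact settings:

```python
# LARS-style layer-wise learning-rate scaling (sketch).
import numpy as np

def lars_step(weights, grads, lr=0.1, trust=0.001, weight_decay=5e-4):
    """weights, grads: lists of per-layer arrays; returns updated per-layer weights."""
    new_weights = []
    for w, g in zip(weights, grads):
        w_norm = np.linalg.norm(w)
        g_norm = np.linalg.norm(g)
        if w_norm > 0 and g_norm > 0:
            # Local rate keeps the relative step size per layer roughly constant.
            local_lr = trust * w_norm / (g_norm + weight_decay * w_norm)
        else:
            local_lr = 1.0
        new_weights.append(w - lr * local_lr * (g + weight_decay * w))
    return new_weights
```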
Stochastic Optimization on Riemannian Manifolds
[ ] Sato, Hiroyuki, Hiroyuki Kasai, and Bamdev Mishra. 2019. “Riemannian Stochastic Variance Reduced Gradient Algorithm with Retraction and Vector Transport.” SIAM Journal on Optimization 29 (2): 1444–72.
[ ] Fong, Robert Simon, and Peter Tino. 2019. “Extended Stochastic Derivative-Free Optimization on Riemannian Manifolds.” In Proceedings of the Genetic and Evolutionary Computation Conference Companion, 257–58. GECCO ’19. New York, NY, USA: Association for Computing Machinery.
[ ] Zhou, Pan, Xiaotong Yuan, Shuicheng Yan, and Jiashi Feng. 2019. “Faster First-Order Methods for Stochastic Non-Convex Optimization on Riemannian Manifolds.” IEEE Transactions on Pattern Analysis and Machine Intelligence PP (August). https://doi.org/10.1109/TPAMI.2019.2933841.
[ ] Zhang, Jingzhao, Hongyi Zhang, and Suvrit Sra. 2018. “R-SPIDER: A Fast Riemannian Stochastic Optimization Algorithm with Curvature Independent Rate.” arXiv [math.OC]. arXiv. http://arxiv.org/abs/1811.04194.
[ ] Tripuraneni, Nilesh, Nicolas Flammarion, Francis Bach, and Michael I. Jordan. 2018. “Averaging Stochastic Gradient Descent on Riemannian Manifolds.” arXiv [cs.LG]. arXiv. http://arxiv.org/abs/1802.09128.
[ ] Liu, Yuanyuan, Fanhua Shang, James Cheng, Hong Cheng, and Licheng Jiao. 2017. “Accelerated First-Order Methods for Geodesically Convex Optimization on Riemannian Manifolds.” In Advances in Neural Information Processing Systems, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 30:4868–77. Curran Associates, Inc.
[X] Zhang, Hongyi, Sashank J. Reddi, and Suvrit Sra. 2016. “Riemannian SVRG: Fast Stochastic Optimization on Riemannian Manifolds.” In Advances in Neural Information Processing Systems, edited by D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett, 29:4592–4600. Curran Associates, Inc.
[ ] Zhang, Hongyi, and Suvrit Sra. 2016. “First-Order Methods for Geodesically Convex Optimization.” In Conference on Learning Theory, 1617–38. PMLR.
[ ] Udriste, C. 2013. Convex Functions and Optimization Methods on Riemannian Manifolds. Springer Science & Business Media.
[X] Bonnabel, S. 2013. “Stochastic Gradient Descent on Riemannian Manifolds.” IEEE Transactions on Automatic Control 58 (9): 2217–29.
[ ] Absil, Pierre-Antoine, Robert Mahony, and Rodolphe Sepulchre. 2009. Optimization Algorithms on Matrix Manifolds. Princeton University Press.
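Note: the common template behind most of these papers is Bonnabel's (2013) Riemannian SGD: take the (projected) gradient in the tangent space at the current point, step along it, and retract back onto the manifold. A minimal sketch on the unit sphere, with the normalization retraction and the eigenvector example chosen purely for illustration:

```python
# Riemannian (stochastic) gradient descent on the unit sphere (sketch).
import numpy as np

def riemannian_sgd_sphere(grad_fn, w0, lr=0.1, n_steps=100):
    w = w0 / np.linalg.norm(w0)              # start on the unit sphere
    for _ in range(n_steps):
        g = grad_fn(w)                        # Euclidean (ambient) gradient
        g_tan = g - np.dot(g, w) * w          # project onto the tangent space at w
        w = w - lr * g_tan                    # step in the tangent direction
        w = w / np.linalg.norm(w)             # retraction: renormalize onto the sphere
    return w

# Usage: leading eigenvector of A by minimizing -w^T A w over the sphere.
A = np.diag([3.0, 1.0, 0.5])
w_star = riemannian_sgd_sphere(lambda w: -2.0 * A @ w, np.ones(3))
```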
Meta-Algorithm for Stochastic Optimization
[X] Diakonikolas, Ilias, Gautam Kamath, Daniel Kane, Jerry Li, Jacob Steinhardt, and Alistair Stewart. 2019. “Sever: A Robust Meta-Algorithm for Stochastic Optimization.” In Proceedings of the 36th International Conference on Machine Learning, edited by Kamalika Chaudhuri and Ruslan Salakhutdinov, 97:1596–1606. Proceedings of Machine Learning Research. Long Beach, California, USA: PMLR.
[ ] Eftimov, Tome, and Peter Korošec. 2019. “Identifying Practical Significance through Statistical Comparison of Meta-Heuristic Stochastic Optimization Algorithms.” Applied Soft Computing 85 (December): 105862.
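Note: for orientation, SEVER (Diakonikolas et al., 2019) wraps any base learner in a filtering loop: fit, inspect per-sample gradients at the fitted point, drop the samples that align most strongly with the top singular direction of the centered gradient matrix, and repeat. A rough sketch, where the drop fraction and round count are assumptions for illustration:

```python
# SEVER-style robust meta-algorithm (sketch).
import numpy as np

def sever(X, y, base_learner, grad_fn, n_rounds=4, drop_frac=0.05):
    """base_learner(X, y) -> params; grad_fn(params, X, y) -> (n_samples, dim) gradients."""
    active = np.arange(len(y))
    for _ in range(n_rounds):
        params = base_learner(X[active], y[active])
        G = grad_fn(params, X[active], y[active])
        G_centered = G - G.mean(axis=0)
        # Outlier score: squared projection onto the top right singular vector.
        _, _, Vt = np.linalg.svd(G_centered, full_matrices=False)
        scores = (G_centered @ Vt[0]) ** 2
        # Keep the points with the smallest outlier scores.
        keep = np.argsort(scores)[: int(np.ceil((1 - drop_frac) * len(active)))]
        active = active[keep]
    return base_learner(X[active], y[active])
```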
Optimizers for Deep Neural Networks
[X] Shah, Vatsal, Xiaoxia Wu, and Sujay Sanghavi. 2020. “Choosing the Sample with Lowest Loss Makes SGD Robust.” arXiv [stat.ML]. arXiv. http://arxiv.org/abs/2001.03316.
[X] Li, Mingchen, Mahdi Soltanolkotabi, and Samet Oymak. 2020. “Gradient Descent with Early Stopping Is Provably Robust to Label Noise for Overparameterized Neural Networks.” In Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, edited by Silvia Chiappa and Roberto Calandra, 108:4313–24. Proceedings of Machine Learning Research. Online: PMLR.
[X] Zhuang, Juntang, Tommy Tang, Yifan Ding, Sekhar C. Tatikonda, Nicha Dvornek, Xenophon Papademetris, and James Duncan. 2020. “AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients.” Advances in Neural Information Processing Systems 33. https://papers.nips.cc/paper/2020/file/d9d4f495e875a2e075a1a4a6e1b9770f-Paper.pdf.
[X] Qian, Qian, and Xiaoyuan Qian. 2019. “The Implicit Bias of AdaGrad on Separable Data.” Advances in Neural Information Processing Systems 32: 7761–69.
[X] Zou, Fangyu, Li Shen, Zequn Jie, Weizhong Zhang, and Wei Liu. 2019. “A Sufficient Condition for Convergences of Adam and RMSProp.” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 11127–35.
[ ] Zhang, Z. 2018. “Improved Adam Optimizer for Deep Neural Networks.” In 2018 IEEE/ACM 26th International Symposium on Quality of Service (IWQoS), 1–2.
[ ] Reddi, Sashank J., Satyen Kale, and Sanjiv Kumar. 2018. “On the Convergence of Adam and Beyond.” In International Conference on Learning Representations.
[ ] Zhou, Dongruo, Yiqi Tang, Ziyan Yang, Yuan Cao, and Quanquan Gu. 2018. “On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization.” arXiv preprint arXiv:1808.05671.
[ ] Kingma, Diederik P., and Jimmy Ba. 2015. “Adam: A Method for Stochastic Optimization.” In Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015).
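Note: since several of these papers revolve around Adam and its variants, here is a side-by-side sketch of the Adam update (Kingma and Ba, 2015) and the AdaBelief twist (Zhuang et al., 2020), which replaces the second moment of g_t with the second moment of the "surprise" g_t - m_t. Bias correction and epsilon placement follow the usual conventions and should be read as assumptions, not the papers' exact pseudocode.

```python
# Adam vs. AdaBelief update step (sketch).
import numpy as np

def adam_like_step(w, g, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, belief=False):
    m, v, t = state["m"], state["v"], state["t"] + 1
    m = b1 * m + (1 - b1) * g
    second = (g - m) ** 2 if belief else g ** 2    # AdaBelief vs. Adam second moment
    v = b2 * v + (1 - b2) * second
    m_hat = m / (1 - b1 ** t)                      # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)                      # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    state.update(m=m, v=v, t=t)
    return w

# Usage: state = dict(m=np.zeros_like(w), v=np.zeros_like(w), t=0), then call per batch.
```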