cvxgrp / aa

Anderson Acceleration
MIT License

required memory #1

Open ldv1 opened 3 years ago

ldv1 commented 3 years ago

Hi,

If I use, as in the README,

aa_wrk = aa.AndersonAccelerator(dim, mem, type1, eta)

how much memory will be consumed? dim*mem? Twice that number?

Thanks for any help.

junziz commented 3 years ago

The memory is O(dim*mem), as you suggested. Concretely, the implementation stores x, f, g, g_prev, y, s, d (each of size dim), Y, S, D, M (each of size dim*mem), and work, ipiv (each of size mem), plus the scalars type1, mem, dim, iter, verbosity, and regularization (each of size 1). So the total memory consumption is 7*dim + 4*dim*mem + 2*mem + 6.
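As a sanity check, the buffer counts above can be tallied in a few lines. The helper below is hypothetical (not part of the aa package); it just encodes the breakdown from the previous paragraph:

```python
def aa_memory_floats(dim: int, mem: int) -> int:
    """Count the values stored by the AA workspace, per the breakdown above."""
    vectors = 7 * dim          # x, f, g, g_prev, y, s, d
    matrices = 4 * dim * mem   # Y, S, D, M
    workspaces = 2 * mem       # work, ipiv
    scalars = 6                # type1, mem, dim, iter, verbosity, regularization
    return vectors + matrices + workspaces + scalars

# Example: dim = 10**6, mem = 10
print(aa_memory_floats(10**6, 10))  # prints 47000026, i.e. 47*dim + 26
```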

ldv1 commented 3 years ago

Thanks for the prompt answer. I assume this is for Type-1 AA. So it only works for small-to-medium problems.

junziz commented 3 years ago

This memory consumption applies to both Type-1 and Type-2 AA, and it works for relatively large problems as well (10^6 ~ 10^7 variables at least, on a laptop). You can check this by playing with SCS (which calls aa.c in this repo). In fact, this memory consumption is not much larger than that of unaccelerated solvers. For example, SCS without any acceleration already consumes at least around 15*dim (or more). And typically mem = 5 ~ 10 already works, which gives a memory consumption of 27*dim + 16 to 47*dim + 26, i.e., only about 2-3 times the original (unaccelerated) memory consumption.

And it's not difficult to further reduce the memory consumption by removing some intermediate quantities (like M and/or D), using a least-squares formulation (for Type-2 AA; see, e.g., this Python implementation) and rank-one updates (for Type-1 AA; see, e.g., this Matlab implementation). We are currently incorporating these improved implementations (in C) into this repo.

So in general, AA can handle very large-scale problems (although some really large ones may need the improved implementations mentioned above), especially compared to 1.5-order and second-order methods. In particular, compared to classical limited-memory quasi-Newton and Newton methods, AA achieves similar and often better acceleration of convergence with much lower memory consumption. Hope these explanations help, and feel free to let us know if you have any additional questions :)