Closed andrewwmao closed 1 year ago
I made those changes to remove allocations. From my end, this should be good to go. Thanks, Andrew!
It's great to see this implemented with "in place" operations. We have a version here that does not have those memory savings: https://github.com/JeffFessler/MIRT.jl/blob/main/src/algorithm/general/pogm_restart.jl (It might be wise to check that the two versions give the same results.) A former student @cecroc started an in-place version but I think it is still a WIP.
In terms of memory, there is an alternative to POGM that just appeared earlier this week (!) that has the same rate: http://arxiv.org/abs/2305.15704 The math looks slightly simpler, but I have not studied it carefully enough to see if it reduces the number of auxiliary variables that one must store.
Very interesting reference! At a quick glance it looks like it would require the same # of intermediate variables. But is it correct to say that this guarantees a 2x better worst-case rate than FISTA, whereas POGM does not? I may have a look to see if I can implement this as well.
2x better worst-case rate than FISTA, whereas POGM does not?
POGM has the same 2x factor by a "computer assisted" numerical proof. OptISTA has an analytical proof. In both cases the 2x is "worst case" bound so it can be interesting to compare them for any given practical problem.
I implemented OptISTA as well and in my experience so far in our 2D/3D radial data, the performance is virtually identical to that of POGM! I did manage to reduce 1 additional auxiliary variable, and the code is indeed simpler compared to POGM.
Terrific!
I did manage to reduce 1 additional auxiliary variable Did that reduction apply to both POGM and OptISTA or just one of them?
That is with respect to POGM, so still 2 more compared to FISTA. The extra variable in my POGM implementation comes from the gradient restart part, which isn't used in OptFISTA, so they are actually comparable.
One thing worth considering is if we want to create two separate struct
for FISTA with and w/o gradient restart (same for POGM). This would save some memory in the latter case, but make the code a bit more ugly.
This all looks very good. Thank you for the contribution @andrewwmao!
I will merge this now and more optimization can be made in new PR. @andrewwmao: If you don't have commit rights I will grant you that.
One thing worth considering is if we want to create two separate struct for FISTA with and w/o gradient restart (same for POGM). This would save some memory in the latter case, but make the code a bit more ugly.
If possible it is better to keep one struct. It is no problem to initialize unused vectors with length 0 which will then have 0 overhead. An alternative is to use a parametric type and make tmp::Union{Nothing, Vector{T}}
.
I don't believe I currently have commit rights -- thanks! I will look into these further optimizations when I have a chance.
Hi @tknopp @migrosser @JakobAsslaender @JeffFessler, I implemented the POGM method originally described here and the adaptive restart with decaying momentum version described here. I put these references in the comments.