JuliaFirstOrder / ProximalOperators.jl

Proximal operators for nonsmooth optimization in Julia

Nonnegative least squares example #119

Open caseykneale opened 3 years ago

caseykneale commented 3 years ago

Whipped this together earlier. Not sure on the stopping criterion, but it seems reasonable...

using ProximalOperators
using LinearAlgebra # for norm

# ADMM solver for NNLS: minimize 0.5*||Ax - b||^2 subject to x >= 0
function NNLS_admm(A, b, x; tol = 1e-9, maxit = 50000, α = 1.0)
    u = zero(x)
    #Constraint dictates that x = z
    z = @view x[:]
    lastx = deepcopy(z)
    f = LeastSquares(A, b)
    g = IndNonnegative()
    for it = 1:maxit
        # perform f-update step
        prox!(x, f, x - z + u, α)
        # perform g-update step
        prox!(z, g, x + u, α)
        # stopping criterion
        if norm(lastx .- x, 2) <= tol
            println(it)
            break
        end
        # dual update
        u .+= x - z
        lastx[:] = z[:]
    end
    return z
end

m, n, k, sig = 200, 50, 100, 1e-3
A = rand(m, n)
x_true = rand(n)
b = A*x_true

x_admm = NNLS_admm(A, b, zeros(n))
println("ADMM")
println("      nnz(x)    = $(norm(x_admm, 0))")
println("      obj value = $(0.5*norm(A*x_admm-b)^2 + lam*norm(x_admm, 1))")

(x_admm .- x_true)

using Plots # assuming Plots.jl for the visual check
plot(x_admm)
plot!(x_true)
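A couple of quick sanity checks on the solution (just suggestions, not needed for the example):

minimum(x_admm) >= 0.0          # feasibility: every entry is nonnegative
norm(A*x_admm - b) / norm(b)    # relative residual, should be small since b = A*x_true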
mfalt commented 3 years ago

Just took a quick look, and it seems like a reasonable example; not sure what we currently have. However, this line z = @view x[:] doesn't make much sense: @view x[:] lowers to view(x, :), so z just aliases x, and both prox! calls end up writing into the same array. If a separate variable was intended, z = copy(x) is what you want.
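A quick REPL check of what @view x[:] actually does:

x = [1.0, 2.0, 3.0]
z = @view x[:]   # lowers to view(x, :), so z shares storage with x
x[1] = 42.0
z[1] == 42.0     # true: writes to x are visible through z

zc = copy(x)     # an independent buffer, which is what the z-update needs
x[1] = 0.0
zc[1] == 42.0    # true: the copy is unaffected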

If we want to illustrate optimized code, there are a few possible optimizations, like allocating a temp variable and writing tmp .= x .- z .+ u to reduce allocations (the same buffer can be reused for x + u). At the very least one should change u .+= x - z to u .+= x .- z. The only remaining allocation would then be norm(lastx .- x), which could be solved by a tiny normdiff implementation. Also, lastx .= z is generally neater than lastx[:] = z[:].

We might not care about optimizing the code though, and in that case I think the temp variable and normdiff can be skipped. For reference, an untested sketch of what the optimized version could look like is below.
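Here normdiff is just a helper name, not something the package provides:

using ProximalOperators
using LinearAlgebra

# Helper: ||a - b||_2 without allocating a temporary
function normdiff(a, b)
    s = zero(promote_type(eltype(a), eltype(b)))
    @inbounds for i in eachindex(a, b)
        s += abs2(a[i] - b[i])
    end
    return sqrt(s)
end

function NNLS_admm_opt(A, b, x; tol = 1e-9, maxit = 50000, α = 1.0)
    u = zero(x)
    z = copy(x)            # independent of x, unlike the @view version
    lastx = copy(x)
    tmp = similar(x)       # scratch buffer shared by both prox inputs
    f = LeastSquares(A, b)
    g = IndNonnegative()
    for it = 1:maxit
        tmp .= x .- z .+ u
        prox!(x, f, tmp, α)    # f-update step
        tmp .= x .+ u
        prox!(z, g, tmp, α)    # g-update step
        if normdiff(lastx, x) <= tol
            break
        end
        u .+= x .- z           # dual update, fused to avoid a temporary
        lastx .= z
    end
    return z
end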

Edit: the cost printed doesn't seem to correspond to the algorithm.

caseykneale commented 3 years ago

Yea, forgot to delete/edit the cost printout.

So the material I've read typically denotes X and Z as separate items for the sanctity of ADMM, but, as you point out, the view doesn't actually give you that; I think I was playing with it and left the [:] in, sorry. I'm not 100% sure on the stopping conditions either, but given that X and Z should converge to each other, it's either some normdiff on the weights or on the ADMM residuals, I guess? A sketch of the usual residual test is below.
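Something like the standard test from Boyd et al.'s ADMM paper, where both the primal residual x - z and the dual residual are checked (rough sketch with names matching the snippet above; if I have it right, ρ corresponds to 1/α in the prox parameterization):

using LinearAlgebra

# Rough sketch of the Boyd et al. stopping test for the splitting x = z,
# with scaled dual variable u and penalty parameter ρ.
function admm_converged(x, z, z_prev, u; ρ = 1.0, abstol = 1e-9, reltol = 1e-6)
    r = norm(x .- z)             # primal residual
    s = ρ * norm(z .- z_prev)    # dual residual
    ϵ_pri  = sqrt(length(x)) * abstol + reltol * max(norm(x), norm(z))
    ϵ_dual = sqrt(length(x)) * abstol + reltol * ρ * norm(u)
    return r <= ϵ_pri && s <= ϵ_dual
end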

Feel free to change it up, use it, or don't use it :). I just wrote it real quick while trying something much more involved and learning the package :). Figured I'd offer it because it's a nice complement to the LASSO example.

caseykneale commented 3 years ago

Probably better to use an underdetermined example to show efficacy... Something like...

m, n, k, sig = 200, 300, 100, 1e-3
A = randn(m, n)
x_true = randn(n)
b = A*x_true

x_admm = NNLS_admm(A, b, zeros(n))
x_ls = A \ b # minimum-norm least-squares solution; inv(A'A) fails here since A'A is singular for m < n

pos(x) = (x > 0.0) ? x : 0.0 # projection onto the nonnegative orthant

sum(abs, x_admm .- x_true)     # ADMM (nonnegative) solution error
sum(abs, x_ls .- x_true)       # unconstrained least-squares error
sum(abs, pos.(x_ls) .- x_true) # clipped least-squares error
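Though note that x_true = randn(n) has negative entries, so no nonnegative solver can recover it exactly; for NNLS to really shine, the ground truth should itself probably be nonnegative, something like:

x_true = max.(randn(n), 0.0) # nonnegative ground truth
b = A*x_true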