SciML / LinearSolve.jl

LinearSolve.jl: High-Performance Unified Interface for Linear Solvers in Julia. Easily switch between factorization and Krylov methods, add preconditioners, and all in one interface.
https://docs.sciml.ai/LinearSolve/stable/

Extra memory usage that cannot be released from GC when reuse_symbolic=true #233

Open ytdHuang opened 1 year ago

ytdHuang commented 1 year ago

Hi,

I am designing a package that needs to solve A \ b many times, but with different As. My package benefits a lot from LinearSolve.jl; thanks for the hard work of the development team.

However, I ran into a memory issue when using the caching interface. Let me simplify my problem and give an example below.

I have a function called DOS which takes a sparse matrix $M$, two vectors ( $b_+$ and $b_-$ ), and a list of $\omega$ values. The linear problem is to solve for the $x$ that satisfies $$\left(M \pm i \omega I \right)x = b_\pm$$ for each $\omega$. Here, $I$ is the identity matrix.
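For a single $\omega$, this reduces to one direct sparse solve. A minimal sketch (the dimension, density, and the diagonal shift `10I`, added only to keep the random example matrix well-conditioned, are all assumptions, not part of my actual problem):

```julia
using LinearSolve, SparseArrays, LinearAlgebra

dim = 50
M  = sprand(ComplexF64, dim, dim, 0.1) + 10I  # shift keeps the example nonsingular
b  = rand(ComplexF64, dim)
ω  = 1.0
II = sparse(I, dim, dim)

# solve (M + iωI) x = b with LinearSolve's default algorithm
x = solve(LinearProblem(M + 1im * ω * II, b)).u
println(norm((M + 1im * ω * II) * x - b))  # residual should be tiny
```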

The definition of DOS is given below:

```julia
using LinearSolve, SparseArrays, LinearAlgebra
import LinearSolve: set_A

function DOS(
        M::SparseMatrixCSC{ComplexF64, Int64},
        bp::Vector{ComplexF64},
        bm::Vector{ComplexF64},
        ωlist;
        solver,
        SOLVEROptions...
    )

    dim, = size(M)
    II = sparse(I, dim, dim)  # identity matrix
    result = Vector{Float64}(undef, length(ωlist))  # stores the results

    # solve for the first ω
    j = 1
    iωI = 1im * ωlist[j] * II

    ## (M - iωI) \ bm
    prob_m = init(LinearProblem(M - iωI, bm), solver; SOLVEROptions...)
    sol_m = solve(prob_m)

    ## (M + iωI) \ bp
    prob_p = init(LinearProblem(M + iωI, bp), solver; SOLVEROptions...)
    sol_p = solve(prob_p)

    result[j] = real(sum(sol_p.u) - sum(sol_m.u))

    # solve for the remaining ω
    @inbounds for ω in ωlist[2:end]
        j += 1
        iωI = 1im * ω * II

        # solve (M - iωI) \ bm
        # if the sparsity pattern is unchanged, reuse the cache from the previous
        # solution; otherwise, initialize a new linear problem
        try
            sol_m = solve(set_A(sol_m.cache, M - iωI))
        catch e
            if isa(e, ArgumentError)
                prob_m = init(LinearProblem(M - iωI, bm), solver; SOLVEROptions...)
                sol_m = solve(prob_m)
            else
                rethrow(e)
            end
        end

        # solve (M + iωI) \ bp, with the same cache-reuse logic
        try
            sol_p = solve(set_A(sol_p.cache, M + iωI))
        catch e
            if isa(e, ArgumentError)
                prob_p = init(LinearProblem(M + iωI, bp), solver; SOLVEROptions...)
                sol_p = solve(prob_p)
            else
                rethrow(e)
            end
        end

        result[j] = real(sum(sol_p.u) - sum(sol_m.u))
    end

    return result
end
```

I use the exception handling because the sparsity pattern may change as $\omega$ changes (especially at $\omega = 0$).
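To illustrate why the pattern can change, here is a minimal sketch (dimension and density are arbitrary): for $\omega \neq 0$, forming $M - i\omega I$ stores extra diagonal entries that a random sparse $M$ almost surely lacks, so the stored pattern differs from the unshifted matrix and a symbolic factorization computed for one pattern cannot be reused for the other.

```julia
using SparseArrays, LinearAlgebra

dim = 100
M  = sprand(ComplexF64, dim, dim, 0.01)
II = sparse(I, dim, dim)

# M - iωI (ω ≠ 0) stores up to dim additional diagonal entries,
# so its number of stored entries exceeds that of M itself.
println(nnz(M))                     # pattern of M alone
println(nnz(M - 1im * 1.0 * II))    # pattern with the diagonal shift
```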

Now, the memory issue shows up in the following test:

```julia
julia> dim = 10000;
julia> density = 0.005;
julia> M  = sprand(ComplexF64, dim, dim, density);
julia> bp = rand(ComplexF64, dim);
julia> bm = rand(ComplexF64, dim);
julia> ωlist = -2:1:2; # [-2, -1, 0, 1, 2];

julia> # memory usage: 631 MB
julia> @time result = DOS(M, bp, bm, ωlist; solver=UMFPACKFactorization(;reuse_symbolic=false));
211.988220 seconds (5.71 M allocations: 62.507 GiB, 0.70% gc time, 2.70% compilation time)
julia> # memory usage: 13.02 GB
julia> GC.gc()
julia> # memory usage: 862 MB
julia> @time result = DOS(M, bp, bm, ωlist; solver=UMFPACKFactorization(;reuse_symbolic=true));
188.753707 seconds (876 allocations: 61.881 GiB, 0.11% gc time)
julia> # memory usage: 7.97 GB
julia> GC.gc()
julia> # memory usage: 3.29 GB
```

The first time I execute DOS, I set reuse_symbolic=false. The second time, I set reuse_symbolic=true: it does speed things up a little, and the number of allocations decreases a lot. However, when I run the garbage collector after the second execution, there is some extra garbage (approximately 2.4 GB) that cannot be released, and this extra memory usage grows dramatically when I increase the dimension of the matrix.
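As a side note, retained heap memory can also be measured from inside Julia rather than by watching the process externally; a minimal sketch using `Base.gc_live_bytes` (available in recent Julia versions; the allocation size here is arbitrary):

```julia
# measure how many heap bytes survive a full collection
GC.gc(true)                      # full collection first
baseline = Base.gc_live_bytes()  # bytes the GC currently considers live

tmp = rand(ComplexF64, 10^6)     # allocate ~16 MB
tmp = nothing                    # drop the only reference
GC.gc(true)                      # collect again

# difference ≈ memory retained despite the collection
println(Base.gc_live_bytes() - baseline)
```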

Did I miss anything when using the caching interface, or is there a better way to release memory inside the function so that it doesn't produce so much un-releasable memory?

ytdHuang commented 1 year ago

Additionally, if I increase the length of ωlist, the amount of extra garbage (un-releasable memory) also increases.

ytdHuang commented 1 year ago

@Wimmerer Any idea on this issue ?

rayegun commented 1 year ago

Sorry, I'll be able to look at this a bit later in the week. Wednesday.

ytdHuang commented 1 year ago

@Wimmerer I just did a few more tests on different versions of Julia. The previous test was done on Julia 1.8.0. I ran the same test on Julia 1.8.4: the computation speeds up a little, but the memory issue still exists. I also ran the same test on Julia 1.9.0-beta2:

The memory issue is finally resolved there, i.e., the un-releasable memory can mostly be released by GC.gc() on Julia 1.9.

However, the computation time becomes much longer because UMFPACK does not use multithreading by default on Julia 1.9.