JuliaSparse / SparseArrays.jl

SparseArrays.jl is a Julia stdlib
https://sparsearrays.juliasparse.org/

CHOLMOD default ordering options: METIS vs AMD #548

Closed gerw closed 3 months ago

gerw commented 3 months ago

Consider the following snippet

using SparseArrays
using LinearAlgebra

function setup(n)
    # 1D Laplacian: tridiagonal [-1, 2, -1] stencil
    e = ones(n)
    e2 = -ones(n-1)
    K = spdiagm(n, n, 0 => 2.0 * e, 1 => e2, -1 => e2)
    Id = sparse(1.0*I, n, n)
    # 2D Laplacian on an n×n grid via the Kronecker sum
    return kron(K, Id) + kron(Id, K)
end

function solve(n)
    A = setup(n)
    F = ones(n^2)
    @time A \ F
end

It just builds a finite-difference discretization of a PDE and solves the resulting linear system. Now let us execute

solve(1156);
solve(1157);

Using julia 1.9.3 I get

  2.095628 seconds (66 allocations: 1.339 GiB, 0.08% gc time)
  2.134111 seconds (66 allocations: 1.339 GiB, 0.08% gc time)

and all is fine. With julia 1.10.4 instead I get

  1.966967 seconds (69 allocations: 1.339 GiB, 0.17% gc time)
  7.775528 seconds (1.99 M allocations: 9.856 GiB, 2.97% gc time)

Thus, the second solve, on a slightly larger system, is suddenly much slower and allocates far more memory. Profiling shows that cholmod_l_analyze runs much longer. The same happens with the RC of 1.11.

Digging a little bit deeper with

A = setup(1157);
B = SparseArrays.CHOLMOD.Sparse(A);
@time F = SparseArrays.CHOLMOD.symbolic(B, perm=nothing);
@time cholesky!(F, A; shift = 1.0, check = true);

and looking at the output of top, one can see that symbolic runs only on a single core and allocates memory, while cholesky! seems to be fine (and appears to run in parallel).

I also tried to fiddle around with

SparseArrays.CHOLMOD.getcommon()[].nthreads_max

(which is 1 after startup), but changing it does not seem to have any effect.

ViralBShah commented 3 months ago

It seems the first order of business should be to see what is driving those allocations and fix that.

gerw commented 3 months ago

Can you reproduce this behaviour?

If I see it correctly, symbolic just boils down to a ccall to cholmod_l_analyze...

ViralBShah commented 3 months ago

Yes, I see the same thing with 1.10.4. The allocations are all in

julia> S = CHOLMOD.Sparse(A);

julia> @time CHOLMOD.cholmod_l_analyze(S, g)
  6.854387 seconds (1.99 M allocations: 9.025 GiB, 0.04% gc time)
Ptr{SparseArrays.LibSuiteSparse.cholmod_factor_struct} @0x0000600003638010
ViralBShah commented 3 months ago

@KristofferC Any ideas?

KristofferC commented 3 months ago

Not immediately, but I will do a bisection and start from there.

KristofferC commented 3 months ago

Bisected to https://github.com/JuliaLang/julia/pull/48977 on the julia side.

KristofferC commented 3 months ago

Reverting to SuiteSparse_jll 5.10.1 seems to fix it (basically reverting the libsuitesparse.version change in https://github.com/JuliaLang/julia/pull/48977):

  2.333998 seconds (4.16 k allocations: 1.618 GiB, 2.01% gc time, 0.66% compilation time)
  1.968816 seconds (140 allocations: 1.698 GiB, 1.90% gc time)

cc @rayegun

ViralBShah commented 3 months ago

But the offending line is just a one-liner that shows 2M Julia allocations - that suggests the issue is not in SuiteSparse.

KristofferC commented 3 months ago

that suggests the issue is not in SuiteSparse.

This is not accurate because we register the CHOLMOD allocation functions with our GC:

https://github.com/JuliaSparse/SparseArrays.jl/blob/e61663ad0a79a48906b0b12d53506e731a614ab8/src/solvers/cholmod.jl#L242-L246

so even allocations within CHOLMOD are counted in @time. Doing a profile

julia> @profile CHOLMOD.cholmod_l_analyze(S, g)

julia> Profile.print(noisefloor=2, C=true)

most of the time seems to be spent inside CHOLMOD:

3646 …c/solvers/lib/aarch64-apple-darwin20.jl:1286; cholmod_l_analyze(arg1::SparseArrays.CHOLMOD.Sparse{Float64, Int64}…
 3646 …arwin14/lib/julia/libcholmod.4.2.1.dylib:?; cholmod_l_analyze_p2
  211  …rwin14/lib/julia/libcholmod.4.2.1.dylib:?; cholmod_l_amd
191   ╎    194  …e.darwin14/lib/julia/libamd.3.2.1.dylib:?; amd_l2
      ╎   3313 …rwin14/lib/julia/libcholmod.4.2.1.dylib:?; cholmod_l_metis
  1   ╎    3292 …rwin14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_METIS_NodeND
      ╎     3268 …rwin14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNestedDissection
      ╎    ╎ 2787 …win14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNestedDissection
      ╎    ╎  2339 …win14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNestedDissection
      ╎    ╎   1896 …in14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNestedDissection
      ╎    ╎    1475 …in14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNestedDissection
      ╎    ╎     1057 …in14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNestedDissection
      ╎    ╎    ╎ 682  …n14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNestedDissection
      ╎    ╎    ╎  386  …n14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNestedDissection
      ╎    ╎    ╎   194  …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNestedDissection
  1   ╎    ╎    ╎    71   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNestedDissection
      ╎    ╎    ╎     23   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNestedDissection
      ╎    ╎    ╎    ╎ 15   …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNodeBisectionL2
      ╎    ╎    ╎    ╎  9    …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__InitSeparator
      ╎    ╎    ╎    ╎   8    …/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__GrowBisection
  6   ╎    ╎    ╎    ╎    6    …/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__FM_2WayCutRefine
      ╎    ╎    ╎     25   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNodeBisectionL2
      ╎    ╎    ╎    ╎ 14   …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__CoarsenGraph
  4   ╎    ╎    ╎    ╎  10   …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__Match_SHEM
      ╎    ╎    ╎    ╎ 11   …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__InitSeparator
  1   ╎    ╎    ╎    ╎  10   …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__GrowBisection
  6   ╎    ╎    ╎    ╎   7    …/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__FM_2WayCutRefine
      ╎    ╎    ╎    71   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNodeBisectionL2
      ╎    ╎    ╎     26   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__CoarsenGraph
  8   ╎    ╎    ╎    ╎ 17   …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__Match_SHEM
      ╎    ╎    ╎     44   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__InitSeparator
  1   ╎    ╎    ╎    ╎ 33   …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__GrowBisection
 29   ╎    ╎    ╎    ╎  30   …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__FM_2WayCutRefine
      ╎    ╎    ╎    39   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__Refine2WayNode
 32   ╎    ╎    ╎     32   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__FM_2WayNodeRefine1Sided
      ╎    ╎    ╎   119  …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNodeBisectionL2
  2   ╎    ╎    ╎    59   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__CoarsenGraph
  6   ╎    ╎    ╎     18   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__Match_RM
  9   ╎    ╎    ╎    ╎ 9    …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__CreateCoarseGraphNoMask
 25   ╎    ╎    ╎     38   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__Match_SHEM
 12   ╎    ╎    ╎    ╎ 12   …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__CreateCoarseGraphNoMask
      ╎    ╎    ╎    56   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__InitSeparator
  3   ╎    ╎    ╎     40   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__GrowBisection
 29   ╎    ╎    ╎    ╎ 33   …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__FM_2WayCutRefine
      ╎    ╎    ╎   59   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__Refine2WayNode
 49   ╎    ╎    ╎    51   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__FM_2WayNodeRefine1Sided
      ╎    ╎    ╎  184  …n14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNodeBisectionL2
      ╎    ╎    ╎   78   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__CoarsenGraph
 10   ╎    ╎    ╎    26   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__Match_RM
 11   ╎    ╎    ╎     11   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__CreateCoarseGraphNoMask
 22   ╎    ╎    ╎    52   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__Match_SHEM
KristofferC commented 3 months ago

I don't know whether the lack of multithreading is coupled to the increase in allocations, but both performance and allocations seem to be worse when going from v5 -> v7.

ViralBShah commented 3 months ago

Ok - that's good to know. So should we report the issue upstream then?

KristofferC commented 3 months ago

On v5, I only see the following in the profile:

julia> @time CHOLMOD.cholmod_l_analyze(S, g)
  0.549651 seconds (26 allocations: 373.199 MiB, 9.98% gc time)

julia> @profile CHOLMOD.cholmod_l_analyze(S, g);

julia> Profile.print(noisefloor=2, C=true)
...
   ╎  321 ...src/solvers/lib/aarch64-apple-darwin20.jl:1084; cholmod_l_analyze(arg1::SparseArrays.CHOLMOD.Sparse{Float64}, a...
  3╎   321 ...apple.darwin14/lib/julia/libcholmod.dylib:?; cholmod_l_analyze_p2
   ╎    215 ...apple.darwin14/lib/julia/libcholmod.dylib:?; cholmod_l_amd
192╎   ╎ 195 ...ple.darwin14/lib/julia/libamd.2.4.6.dylib:?; amd_l2
   ╎    50  ...apple.darwin14/lib/julia/libcholmod.dylib:?; cholmod_l_analyze_ordering
 18╎   ╎ 18  ...apple.darwin14/lib/julia/libcholmod.dylib:?; cholmod_l_rowcolcounts
   ╎   ╎ 20  ...apple.darwin14/lib/julia/libcholmod.dylib:?; permute_matrices
   ╎   ╎  20  ...pple.darwin14/lib/julia/libcholmod.dylib:?; cholmod_l_ptranspose
 18╎   ╎   18  ...pple.darwin14/lib/julia/libcholmod.dylib:?; cholmod_l_transpose_sym
ViralBShah commented 3 months ago

SuiteSparse 7 comes with METIS built in for partitioning, which is where it seems most of the time is being spent. Is this kind of thing expected from METIS? Perhaps @fredrikekre has some experience with it.

KristofferC commented 3 months ago

So should we report the issue upstream then?

Maybe, assuming it isn't our fault due to different build options etc. The following is a decent Julia reproducer at least:

using SparseArrays, LinearAlgebra
using SparseArrays: CHOLMOD, LibSuiteSparse

function setup(n)
    e = ones(n)
    e2 = -ones(n-1)
    K = spdiagm(n, n, 0 => 2.0 * e, 1 => e2, -1 => e2)
    Id = sparse(1.0*I, n, n)
    return kron(K, Id) + kron(Id, K)
end

n = 1157;
A = setup(n);
F = ones(n^2);

S = CHOLMOD.Sparse(A);
g = CHOLMOD.getcommon()

using .LibSuiteSparse: libcholmod, cholmod_l_analyze, cholmod_sparse, cholmod_common, cholmod_factor
@time @ccall libcholmod.cholmod_l_analyze(S::Ptr{cholmod_sparse},
                                    g::Ptr{cholmod_common})::Ptr{cholmod_factor}

1.9:

  0.560903 seconds (125 allocations: 454.909 MiB, 6.46% gc time, 1.14% compilation time)

1.10:

  5.704131 seconds (1.99 M allocations: 9.105 GiB, 1.86% gc time, 0.13% compilation time)
ViralBShah commented 3 months ago

@DrTimothyAldenDavis Is this kind of slowdown expected when METIS kicks in?

DrTimothyAldenDavis commented 3 months ago

METIS produces good orderings but it takes a very long time, particularly for single analyze/factorize/solves.

The default ordering strategy is to try AMD first. If AMD produces a good ordering, I use that; otherwise, I try METIS. You can disable METIS.

See page 104 of the user guide for CHOLMOD 5.2.1. Or the cholmod.h include file here:

https://github.com/DrTimothyAldenDavis/SuiteSparse/blob/26ababc7f3af725c5fb9168a1b94850eab74b666/CHOLMOD/Include/cholmod.h#L555-L574
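Assuming `nmethods` behaves as described in those cholmod.h comments (with `nmethods = 2`, only the given permutation and AMD are tried, so METIS is skipped), the internal `SparseArrays.CHOLMOD.@cholmod_param` macro can scope that choice to a single factorization. A minimal sketch; the test matrix is made up, and `@cholmod_param` is internal API that may change between SparseArrays versions:

```julia
using SparseArrays, LinearAlgebra
using SparseArrays: CHOLMOD

# Small SPD test matrix: symmetric sparse plus a dominant diagonal.
A = sparse(Symmetric(sprand(200, 200, 0.02))) + 100.0I
b = ones(200)

# Restrict CHOLMOD's ordering trials so METIS is never attempted,
# just for the factorization inside this block.
x = CHOLMOD.@cholmod_param nmethods = 2 begin
    cholesky(A) \ b
end
```

For one-off `A \ b` solves this avoids the expensive nested-dissection ordering entirely.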

ViralBShah commented 3 months ago

@DrTimothyAldenDavis I assume METIS can also be disabled in SuiteSparse 7.

DrTimothyAldenDavis commented 3 months ago

Yes, CHOLMOD 5.2.1 is the most recent stable release of CHOLMOD, in SuiteSparse 7.7.0.

ViralBShah commented 3 months ago

Ah thank you. I misread CHOLMOD 5 to mean SuiteSparse 5.

gerw commented 3 months ago

Thank you for all the discussion. Running

@SparseArrays.CHOLMOD.cholmod_param nmethods = 2 solve(1157);

gives back the behaviour from julia 1.9.3.

Now, I am wondering whether this should be used automatically if the user just wants to solve a single system?

DrTimothyAldenDavis commented 3 months ago

Now, I am wondering whether this should be used automatically if the user just wants to solve a single system?

That might be a good idea. It would be the least disruptive to the performance seen in julia.

DrTimothyAldenDavis commented 3 months ago

p.s. The title of this issue could be revised, to something like "CHOLMOD default ordering options: METIS vs AMD".

gerw commented 3 months ago

Now, I am wondering whether this should be used automatically if the user just wants to solve a single system?

That might be a good idea. It would be the least disruptive to the performance seen in julia.

And somewhere in the documentation, one should point to the possibility of increasing nmethods when the factorization is reused for multiple solves.

This being said, I cannot find any reference to the Cholesky decomposition from SparseArrays in the documentation.
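The amortization argument behind a larger `nmethods` can be sketched as follows (illustrative matrix and solve count, not from the thread): a more expensive, lower-fill ordering pays off precisely when the factorization object is reused.

```julia
using SparseArrays, LinearAlgebra

# Assumed SPD model problem, for illustration only.
A = sparse(Symmetric(sprand(500, 500, 0.01))) + 50.0I
b = ones(500)

# Analysis + ordering (possibly METIS) happens once, inside cholesky().
F = cholesky(A)

# Each subsequent solve reuses the factorization, so a costlier ordering
# can pay for itself over many right-hand sides.
xs = [F \ b for _ in 1:100]
```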

ViralBShah commented 3 months ago

The documentation for the solvers has been, uh, sparse. But I don't know why it doesn't show up at all. We should at least have the solvers show up in Julia docs like they do in SparseArrays docs.

I suspect that only index.md from SparseArrays gets picked up, and none of the other files, when the Julia stdlib docs are built. So perhaps we should just integrate solvers.md into index.md and have one big file for all SparseArrays.jl documentation.

Based on the discussion and findings here, METIS should certainly be opt-in and documented. As for threads, I suppose we should enable threading by default. Would you be up for making a PR?

gerw commented 3 months ago

Based on the discussion and findings here, METIS should certainly opt-in and documented. As for threads, I suppose we should use threading by default. Would you be up for making a PR?

Uh, I don't know; rather not. I have not set up the infrastructure to build Julia (or even just SparseArrays)...

ViralBShah commented 3 months ago

I'll get around to it in the next few days, unless someone beats me to it.

ViralBShah commented 3 months ago

@DrTimothyAldenDavis Is it possible to change the number of threads for symbolic() and other phases after CHOLMOD is already loaded - or between solves and such?

DrTimothyAldenDavis commented 3 months ago

Yes, just change Common->nthreads_max on the "Common" object passed as the last parameter to all CHOLMOD methods.
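On the Julia side, the same internal `@cholmod_param` machinery used earlier in this thread should be able to set that field, since it writes into the task-local `cholmod_common` struct that every CHOLMOD ccall receives. A hedged sketch; treating `nthreads_max` as an ordinary settable field here is an assumption based on this answer:

```julia
using SparseArrays, LinearAlgebra
using SparseArrays: CHOLMOD

# Assumed SPD test matrix, for illustration only.
A = sparse(Symmetric(sprand(300, 300, 0.02))) + 100.0I
b = ones(300)

# Raise CHOLMOD's thread cap just for this factorize-and-solve; the value
# is restored when the block exits.
x = CHOLMOD.@cholmod_param nthreads_max = 4 begin
    cholesky(A) \ b
end
```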