It seems the first order of business should be to see what is driving those allocations and fix that.
Can you reproduce this behaviour?
If I see it correctly, symbolic just boils down to a ccall to cholmod_l_analyze:
...
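Roughly, the binding amounts to the following (a sketch consistent with the @ccall reproducer further down in this thread; this wrapper is illustrative, not the exact generated source):

using SparseArrays.LibSuiteSparse: libcholmod, cholmod_sparse, cholmod_common, cholmod_factor

# Symbolic analysis only: picks a fill-reducing ordering and builds the
# elimination tree, returning a pointer to an unfactorized cholmod_factor.
cholmod_l_analyze(A::Ptr{cholmod_sparse}, cm::Ptr{cholmod_common}) =
    @ccall libcholmod.cholmod_l_analyze(A::Ptr{cholmod_sparse},
                                        cm::Ptr{cholmod_common})::Ptr{cholmod_factor}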
Yes, I see the same thing with 1.10.4. The allocations are all in
julia> S = CHOLMOD.Sparse(A);
julia> @time CHOLMOD.cholmod_l_analyze(S, g)
6.854387 seconds (1.99 M allocations: 9.025 GiB, 0.04% gc time)
Ptr{SparseArrays.LibSuiteSparse.cholmod_factor_struct} @0x0000600003638010
@KristofferC Any ideas?
Not immediately, but I will do a bisection and start from there.
Bisected to https://github.com/JuliaLang/julia/pull/48977 on the julia side.
Reverting to SuiteSparse_jll 5.10.1 seems to fix it (basically reverting the libsuitesparse.version
change in https://github.com/JuliaLang/julia/pull/48977):
2.333998 seconds (4.16 k allocations: 1.618 GiB, 2.01% gc time, 0.66% compilation time)
1.968816 seconds (140 allocations: 1.698 GiB, 1.90% gc time)
cc @rayegun
But the offending line is just a 1-liner that is showing 2M Julia allocations - that suggests the issue is not in SuiteSparse.
> that suggests the issue is not in SuiteSparse.
This is not accurate, because we register the CHOLMOD allocation functions with our GC, so even allocations within CHOLMOD are counted in @time.
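The registration is essentially of this shape (a sketch, not the actual source; the real calls live in SparseArrays' CHOLMOD initialization, and the SuiteSparse_config setter names are taken from SuiteSparse 7's API):

using SparseArrays.LibSuiteSparse: libsuitesparseconfig

# Route SuiteSparse's allocators through Julia's GC-tracked jl_malloc/jl_free,
# so that C-side CHOLMOD allocations show up in Julia's allocation accounting.
ccall((:SuiteSparse_config_malloc_func_set, libsuitesparseconfig),
      Cvoid, (Ptr{Cvoid},), cglobal(:jl_malloc))
ccall((:SuiteSparse_config_free_func_set, libsuitesparseconfig),
      Cvoid, (Ptr{Cvoid},), cglobal(:jl_free))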
Doing a profile
julia> @profile CHOLMOD.cholmod_l_analyze(S, g)
julia> Profile.print(noisefloor=2, C=true)
a lot of the time seems to be spent inside CHOLMOD:
3646 …c/solvers/lib/aarch64-apple-darwin20.jl:1286; cholmod_l_analyze(arg1::SparseArrays.CHOLMOD.Sparse{Float64, Int64}…
3646 …arwin14/lib/julia/libcholmod.4.2.1.dylib:?; cholmod_l_analyze_p2
211 …rwin14/lib/julia/libcholmod.4.2.1.dylib:?; cholmod_l_amd
191 ╎ 194 …e.darwin14/lib/julia/libamd.3.2.1.dylib:?; amd_l2
╎ 3313 …rwin14/lib/julia/libcholmod.4.2.1.dylib:?; cholmod_l_metis
1 ╎ 3292 …rwin14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_METIS_NodeND
╎ 3268 …rwin14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNestedDissection
╎ ╎ 2787 …win14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNestedDissection
╎ ╎ 2339 …win14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNestedDissection
╎ ╎ 1896 …in14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNestedDissection
╎ ╎ 1475 …in14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNestedDissection
╎ ╎ 1057 …in14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNestedDissection
╎ ╎ ╎ 682 …n14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNestedDissection
╎ ╎ ╎ 386 …n14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNestedDissection
╎ ╎ ╎ 194 …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNestedDissection
1 ╎ ╎ ╎ 71 …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNestedDissection
╎ ╎ ╎ 23 …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNestedDissection
╎ ╎ ╎ ╎ 15 …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNodeBisectionL2
╎ ╎ ╎ ╎ 9 …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__InitSeparator
╎ ╎ ╎ ╎ 8 …/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__GrowBisection
6 ╎ ╎ ╎ ╎ 6 …/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__FM_2WayCutRefine
╎ ╎ ╎ 25 …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNodeBisectionL2
╎ ╎ ╎ ╎ 14 …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__CoarsenGraph
4 ╎ ╎ ╎ ╎ 10 …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__Match_SHEM
╎ ╎ ╎ ╎ 11 …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__InitSeparator
1 ╎ ╎ ╎ ╎ 10 …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__GrowBisection
6 ╎ ╎ ╎ ╎ 7 …/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__FM_2WayCutRefine
╎ ╎ ╎ 71 …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNodeBisectionL2
╎ ╎ ╎ 26 …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__CoarsenGraph
8 ╎ ╎ ╎ ╎ 17 …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__Match_SHEM
╎ ╎ ╎ 44 …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__InitSeparator
1 ╎ ╎ ╎ ╎ 33 …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__GrowBisection
29 ╎ ╎ ╎ ╎ 30 …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__FM_2WayCutRefine
╎ ╎ ╎ 39 …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__Refine2WayNode
32 ╎ ╎ ╎ 32 …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__FM_2WayNodeRefine1Sided
╎ ╎ ╎ 119 …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNodeBisectionL2
2 ╎ ╎ ╎ 59 …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__CoarsenGraph
6 ╎ ╎ ╎ 18 …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__Match_RM
9 ╎ ╎ ╎ ╎ 9 …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__CreateCoarseGraphNoMask
25 ╎ ╎ ╎ 38 …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__Match_SHEM
12 ╎ ╎ ╎ ╎ 12 …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__CreateCoarseGraphNoMask
╎ ╎ ╎ 56 …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__InitSeparator
3 ╎ ╎ ╎ 40 …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__GrowBisection
29 ╎ ╎ ╎ ╎ 33 …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__FM_2WayCutRefine
╎ ╎ ╎ 59 …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__Refine2WayNode
49 ╎ ╎ ╎ 51 …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__FM_2WayNodeRefine1Sided
╎ ╎ ╎ 184 …n14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNodeBisectionL2
╎ ╎ ╎ 78 …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__CoarsenGraph
10 ╎ ╎ ╎ 26 …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__Match_RM
11 ╎ ╎ ╎ 11 …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__CreateCoarseGraphNoMask
22 ╎ ╎ ╎ 52 …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__Match_SHEM
I don't know if the lack of multithreading is coupled to the increase in allocations, but both performance and allocations seem to be worse when going from v5 to v7.
Ok - that's good to know. So should we report the issue upstream then?
On v5, I only see the following in the profile:
julia> @time CHOLMOD.cholmod_l_analyze(S, g)
0.549651 seconds (26 allocations: 373.199 MiB, 9.98% gc time)
julia> @profile CHOLMOD.cholmod_l_analyze(S, g);
julia> Profile.print(noisefloor=2, C=true)
...
╎ 321 ...src/solvers/lib/aarch64-apple-darwin20.jl:1084; cholmod_l_analyze(arg1::SparseArrays.CHOLMOD.Sparse{Float64}, a...
3╎ 321 ...apple.darwin14/lib/julia/libcholmod.dylib:?; cholmod_l_analyze_p2
╎ 215 ...apple.darwin14/lib/julia/libcholmod.dylib:?; cholmod_l_amd
192╎ ╎ 195 ...ple.darwin14/lib/julia/libamd.2.4.6.dylib:?; amd_l2
╎ 50 ...apple.darwin14/lib/julia/libcholmod.dylib:?; cholmod_l_analyze_ordering
18╎ ╎ 18 ...apple.darwin14/lib/julia/libcholmod.dylib:?; cholmod_l_rowcolcounts
╎ ╎ 20 ...apple.darwin14/lib/julia/libcholmod.dylib:?; permute_matrices
╎ ╎ 20 ...pple.darwin14/lib/julia/libcholmod.dylib:?; cholmod_l_ptranspose
18╎ ╎ 18 ...pple.darwin14/lib/julia/libcholmod.dylib:?; cholmod_l_transpose_sym
SuiteSparse 7 comes with METIS built in for partitioning, which is where it seems most of the time is being spent. Is this kind of thing expected from METIS? Perhaps @fredrikekre has some experience with it.
> So should we report the issue upstream then?
Maybe, assuming it isn't our fault due to different build options etc. The following is a decent Julia reproducer at least:
using SparseArrays, LinearAlgebra
using SparseArrays: CHOLMOD, LibSuiteSparse

function setup(n)
    e = ones(n)
    e2 = -ones(n - 1)
    K = spdiagm(n, n, 0 => 2.0 * e, 1 => e2, -1 => e2)
    Id = sparse(1.0 * I, n, n)
    return kron(K, Id) + kron(Id, K)
end

n = 1157;
A = setup(n);
F = ones(n^2);
S = CHOLMOD.Sparse(A);
g = CHOLMOD.getcommon()

using .LibSuiteSparse: libcholmod, cholmod_l_analyze, cholmod_sparse, cholmod_common, cholmod_factor

@time @ccall libcholmod.cholmod_l_analyze(S::Ptr{cholmod_sparse},
                                          g::Ptr{cholmod_common})::Ptr{cholmod_factor}
1.9:
0.560903 seconds (125 allocations: 454.909 MiB, 6.46% gc time, 1.14% compilation time)
1.10:
5.704131 seconds (1.99 M allocations: 9.105 GiB, 1.86% gc time, 0.13% compilation time)
@DrTimothyAldenDavis Is this kind of slowdown expected when METIS kicks in?
METIS produces good orderings but it takes a very long time, particularly for single analyze/factorize/solves.
The default ordering strategy is to try AMD first. If AMD produces a good ordering, I use that; otherwise, I try METIS. You can disable METIS.
See page 104 of the user guide for CHOLMOD 5.2.1. Or the cholmod.h include file here:
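In the Julia wrapper this corresponds to limiting nmethods, e.g. (a sketch using SparseArrays' @cholmod_param scope macro; per CHOLMOD's default method list, nmethods = 2 stops after the user-given and AMD orderings, so METIS is never tried):

using SparseArrays, LinearAlgebra
using SparseArrays: CHOLMOD

A = setup(1157)  # the test matrix from the reproducer above

# Restrict CHOLMOD to its first two default orderings (user-given, which is
# skipped when no permutation is supplied, and AMD) for this block only:
CHOLMOD.@cholmod_param nmethods = 2 begin
    cholesky(A)
end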
@DrTimothyAldenDavis I assume METIS can also be disabled in SuiteSparse 7.
Yes, CHOLMOD 5.2.1 is the most recent stable release of CHOLMOD, in SuiteSparse 7.7.0.
Ah thank you. I misread CHOLMOD 5 to mean SuiteSparse 5.
Thank you for all the discussion. Running
@SparseArrays.CHOLMOD.cholmod_param nmethods = 2 solve(1157);
gives back the behaviour from julia 1.9.3.
Now, I am wondering whether this should be used automatically if the user just wants to solve a single system?
> Now, I am wondering whether this should be used automatically if the user just wants to solve a single system?
That might be a good idea. It would be the least disruptive to the performance seen in julia.
p.s. The title of this issue could be revised, to something like "CHOLMOD default ordering options: METIS vs AMD".
And somewhere in the documentation, one should point to the possibility of increasing nmethods
when the factorization is reused for multiple solves.
That being said, I cannot find any reference to the Cholesky decomposition from SparseArrays in the documentation?
The documentation for the solvers has been, uh, sparse. But I don't know why it doesn't show up at all. We should at least have the solvers show up in Julia docs like they do in SparseArrays docs.
I suspect the index.md from SparseArrays just gets picked up and none of the other files do when Julia stdlib docs are built. So perhaps we should just integrate solvers.md in to index.md and have one big file for all SparseArrays.jl documentation.
Based on the discussion and findings here, METIS should certainly be opt-in and documented. As for threads, I suppose we should use threading by default. Would you be up for making a PR?
> Based on the discussion and findings here, METIS should certainly be opt-in and documented. As for threads, I suppose we should use threading by default. Would you be up for making a PR?
Uh, I don't know, rather not. I have not set up the infrastructure to build Julia (or at least SparseArrays)...
I'll get around to it in the next few days, unless someone beats me to it.
@DrTimothyAldenDavis Is it possible to change the number of threads for symbolic() and other phases after CHOLMOD is already loaded - or between solves and such?
Yes, just change the Common->nthreads_max, for the "Common" object passed in as the last parameter to all CHOLMOD methods.
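From the Julia side that might look like the following (a sketch; it assumes the generated cholmod_common struct in LibSuiteSparse exposes the nthreads_max field, so the same @cholmod_param machinery used above can set it):

using SparseArrays, LinearAlgebra
using SparseArrays: CHOLMOD

# Temporarily cap CHOLMOD (and its bundled METIS) at 4 threads:
CHOLMOD.@cholmod_param nthreads_max = 4 begin
    cholesky(setup(1157))  # setup(n) as defined in the reproducer above
end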
Consider the following snippet:

...

It just builds a discretization of a PDE and solves it. Now let us execute it. Using julia 1.9.3 I get

...

and all is fine. With julia 1.10.4 instead I get

...

Thus, the second solve on a slightly larger system is suddenly much slower and allocates memory. The profiling shows that cholmod_l_analyze runs much longer. The same happens with the RC of 1.11.

Digging a little bit deeper with ... and looking at the output of top, one can see that symbolic runs only on a single core and allocates memory, while cholesky! seems to be fine (and seems to run in parallel).

I also tried to fiddle around with ... (which is 1) after startup, but it does not seem to change anything.
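For completeness, here is a plausible reconstruction of the elided snippet, pieced together from the setup() reproducer and the solve(1157) call quoted earlier in the thread (the solve driver and the two problem sizes are guesses, not the original code):

using SparseArrays, LinearAlgebra

function setup(n)
    e = ones(n)
    e2 = -ones(n - 1)
    K = spdiagm(n, n, 0 => 2.0 * e, 1 => e2, -1 => e2)
    Id = sparse(1.0 * I, n, n)
    return kron(K, Id) + kron(Id, K)  # 2D Laplacian-type discretization
end

# Hypothetical driver matching the solve(1157) call quoted above:
function solve(n)
    A = setup(n)
    F = ones(n^2)
    return cholesky(A) \ F
end

@time solve(1156);  # first solve (the exact size in the original is elided)
@time solve(1157);  # the "slightly larger" second solve that regressed on 1.10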