JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Sparse Matrix Solve Index Type #12664

Closed JaredCrean2 closed 6 years ago

JaredCrean2 commented 9 years ago

It looks like there has been a regression with sparse matrix solves. A SparseMatrixCSC{Float64, Int64} works, but a SparseMatrixCSC{Float64, Int32} doesn't anymore.

Here is a little example I cooked up:

m = n = 3
a = rand(m,n)
b = sparse(a)
rhs = rand(m)

colptr_small = zeros(Int32, length(b.colptr))
rowval_small = zeros(Int32, length(b.rowval))

# convert to Int32 indices
# is there a better syntax/function for doing this? (see the aside after the error output below)
for i=1:length(b.colptr)
  colptr_small[i] = b.colptr[i]
end

for i=1:length(b.rowval)
  rowval_small[i] = b.rowval[i]
end

c = SparseMatrixCSC(m, n, colptr_small, rowval_small, b.nzval)

x1 = a\rhs
println("x1 = ", x1)
x2 = b\rhs
println("x2 = ", x2)
x3 = c\rhs   # error
println("x3 = ", x3)

The error is:

ERROR: LoadError: MethodError: `convert` has no method matching convert(::Type{Base.SparseMatrix.CHOLMOD.Sparse{Tv<:Union{Float64,Complex{Float64}}}}, ::Base.SparseMatrix.SparseMatrixCSC{Float64,Int32})
This may have arisen from a call to the constructor Base.SparseMatrix.CHOLMOD.Sparse{Tv<:Union{Float64,Complex{Float64}}}(...),
since type constructors fall back to convert methods.
Closest candidates are:
  call{T}(::Type{T}, ::Any)
  convert{T}(::Type{T}, !Matched::T)
  Base.SparseMatrix.CHOLMOD.Sparse{Tv<:Union{Float64,Complex{Float64}}}(!Matched::Ptr{Base.SparseMatrix.CHOLMOD.C_Sparse{Tv<:Union{Float64,Complex{Float64}}}})
  ...
 in factorize at sparse/linalg.jl:642
 in \ at linalg/generic.jl:314
 in include at ./boot.jl:254
 in include_from_node1 at ./loading.jl:187
 in process_options at ./client.jl:308
 in _start at ./client.jl:411
while loading /home/jared/repos/julia_tests/colptr/main.jl, in expression starting on line 24
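
As an aside, on the in-code question about a cleaner conversion: convert can build the Int32 index vectors in one call each, so the loops above could be replaced with the following (same behavior, 0.4-era syntax):

# one-call element-type conversions (each allocates a new Int32 vector)
colptr_small = convert(Vector{Int32}, b.colptr)
rowval_small = convert(Vector{Int32}, b.rowval)
c = SparseMatrixCSC(m, n, colptr_small, rowval_small, b.nzval)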

It worked with:

Julia Version 0.4.0-dev+5149
Commit 317a4d1* (2015-06-01 18:58 UTC)
Platform Info:
  System: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7 CPU       Q 820  @ 1.73GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Nehalem)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

But does not with:

Julia Version 0.4.0-dev+6095
Commit c75f4d4* (2015-07-20 14:38 UTC)
Platform Info:
  System: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7 CPU       Q 820  @ 1.73GHz
  WORD_SIZE: 64
  BLAS: libopenblas (NO_LAPACK NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Nehalem)
  LAPACK: liblapack.so.3
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

KristofferC commented 9 years ago

See #11710 and #11877.

Specifically: "I don't think it is allowed to mix 32 and 64 bit integers in the same CHOLMOD session and since we only can have one CHOLMOD session we'll have to decide on a size on startup. Right now I've chosen SuiteSparse_long."

JaredCrean2 commented 9 years ago

What happens if you mix integer sizes during a session? CHOLMOD provides different functions for the different integer sizes, so I would be surprised if mixing them caused a problem.

KristofferC commented 9 years ago

To switch the integer type, you need to call cholmod_finish and then reinitialize the pointers with cholmod_start.

And:

Do not call this routine [cholmod_start] after any other CHOLMOD routine (except cholmod_finish, to start a new CHOLMOD session), or a memory leak will occur.

So, from my understanding, you can only switch integer types if you clear all memory first, and the choice (currently) was to support only Int64.
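
For illustration, here is a rough sketch of what that constraint looks like at the CHOLMOD C API level, driven from Julia via ccall. This is not how Base wires it up, and the workspace size is a guess (real wrappers determine the exact sizeof(cholmod_common) for the library they link against), so treat it as a sketch of the protocol only:

# cholmod_start / cholmod_l_start initialize a cholmod_common workspace;
# a session must be torn down with the matching *_finish before the other
# integer width may be used.
common = zeros(UInt8, 4096)   # opaque cholmod_common buffer (size is a guess)

ccall((:cholmod_start, :libcholmod), Cint, (Ptr{Void},), common)     # Int32 session
# ... only the plain cholmod_* functions may be called here ...
ccall((:cholmod_finish, :libcholmod), Cint, (Ptr{Void},), common)    # end session

ccall((:cholmod_l_start, :libcholmod), Cint, (Ptr{Void},), common)   # Int64 session
# ... only the cholmod_l_* functions may be called here ...
ccall((:cholmod_l_finish, :libcholmod), Cint, (Ptr{Void},), common)  # end session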

CC @andreasnoack because he is the expert.

tkelman commented 9 years ago

There's a lot of nasty global state in cholmod that was causing intermittent problems and hard-to-fix memory leaks. We should probably fix the conversion error though, even if we only support one set of integer sizes directly in cholmod.

andreasnoack commented 9 years ago

@KristofferC Thanks for explaining this. I don't have much to add. @JaredCrean2 I think it would cause memory leaks like those discussed in the issues @KristofferC linked to. The example in CHOLMOD also has separate files for the two integer types. Do you have code examples where these are mixed and where you are sure there are no memory leaks?

@tkelman It sounds reasonable to have a method that promotes the index type, but would some users prefer an error instead of surprisingly large memory consumption?

JaredCrean2 commented 9 years ago

No, I have never mixed integer sizes with CHOLMOD; my surprise was based purely on the interface design. I guess my remaining questions are: 1) is supporting only one integer size permanent or temporary, and 2) should the default size be user-configurable? I have a bunch of code that uses 32-bit integers to save space and would like to continue doing so somehow.

ViralBShah commented 9 years ago

Is it ok to promote to 64-bit Ints for the purpose of the solve? I am wondering how much space is saved if you are already using Float64 as the nonzero type. Also, since the factorizations will fill in and will internally use compressed representations, I am curious how much difference in memory there will be.

andreasnoack commented 9 years ago

I don't think it is feasible to support more than one integer type for our CHOLMOD wrappers. One day we might have our own sparse solvers implemented in Julia and then there shouldn't be any restrictions on the integer types at all.

ViralBShah commented 9 years ago

Yes, I agree. Right now, this is not realistic to support.

JaredCrean2 commented 9 years ago

For storing the matrix, it is at least a 25% saving, because rowval is the same length as nzval. For a small CFD problem I am running, the matrix has ~48,000 rows (and is square). Using 32-bit integers, it takes 1.1 GB of memory to store, versus 1.5 GB with 64-bit integers. When factoring, the 32-bit integer matrix expands to 5.0 GB, and the 64-bit version to 7.5 GB.

The problem with promoting is that it would require copying. Having both the 32 and 64 bit versions of rowvals and colptr in memory at the same time would be worse than using 64 bits from the beginning.

If both integer sizes cannot be supported at the same time, then letting the user choose which size to use would be the next best thing, so that the memory cost can be reduced in cases where 32-bit integers are enough.
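
For concreteness, the per-stored-entry arithmetic behind the ~25% figure (a sketch assuming Float64 values; colptr is ignored since it has only n + 1 entries):

# CSC stores one value plus one row index per nonzero entry
bytes_per_nz_64 = sizeof(Float64) + sizeof(Int64)   # 16 bytes
bytes_per_nz_32 = sizeof(Float64) + sizeof(Int32)   # 12 bytes
savings = 1 - bytes_per_nz_32 / bytes_per_nz_64     # 0.25, i.e. 25%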

andreasnoack commented 9 years ago

Right now it would require a rebuild of SuiteSparse. In theory, the integer size could be a global or a command-line flag specifying whether the _l_ versions of the symbols in SuiteSparse should be used or not.

KristofferC commented 9 years ago

Is SuiteSparse just a bit unfortunately designed, in the sense that it has this global state? Pardiso and DSS provide a simple handle for all the pointers that you can easily pass along in the factorization object.

andreasnoack commented 9 years ago

Might be, but I'm not familiar enough with Pardiso and DSS to really know.

tkelman commented 9 years ago

Pardiso should have a more sane and thread-safe API without the global state that cholmod has, when you have Pardiso or MKL available.

I don't think this should require re-compiling SuiteSparse, since it already builds both integer sizes of the API IIRC, but it would likely require adjusting some of the Julia wrapper code in Base and re-building the system image to switch between them safely right now.

KristofferC commented 9 years ago

Maybe we should typealias the SuiteSparse_long that is used in the function calls so it is easy to swap: just change it in one place and recompile Julia.

andreasnoack commented 9 years ago

@tkelman Yes, both versions are available; the long versions have an extra _l_ in the function names. Therefore, it would be fairly easy to switch by using a macro when looking up the symbol name. However, if the need for 32-bit integer support is more urgent, it might be easier to recompile SuiteSparse with SuiteSparse_long=int.
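
A minimal sketch of the typealias-plus-macro idea in 0.4-era syntax (all names here are hypothetical; Base's actual wrappers are more involved):

# One central alias decides the CHOLMOD index type; a macro picks the
# matching symbol prefix at wrapper-definition time (the long API adds _l_).
typealias CHOLMODIndex Int64        # change to Int32 and rebuild Julia to swap

macro cholmod_name(f)
    prefix = CHOLMODIndex === Int64 ? "cholmod_l_" : "cholmod_"
    return string(prefix, f)        # expands to a string literal
end

@cholmod_name(start)                # "cholmod_l_start"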

JaredCrean2 commented 9 years ago

I have an older v0.4 Julia built locally, so it's not an urgent problem. It just means the tests fail when Travis runs them.

What should we do with this issue? 32-bit integer support is important, but it also doesn't seem likely without some major changes. Leave it open or close it?

andreasnoack commented 9 years ago

Let's keep this one open until we either have a way of supporting 32-bit integers or have moved SuiteSparse out of Base. If the latter happens, the issue should just be moved to the SuiteSparse package.

JaredCrean2 commented 9 years ago

For PETSc, we ended up building three versions of the library to support the different data types. Could we do something similar for SuiteSparse? That would allow us to have two SuiteSparse sessions for the different integer types.

andreasnoack commented 9 years ago

I'm not sure it is worth the trouble and extra build time. On Mac and Linux, you can easily build a 32-bit version if needed, but there is a problem on Windows, where building your own version is not really feasible. I think the right solution is to have something like a command-line flag and then set the symbols in the wrappers accordingly. That is possible with a single build of SuiteSparse, and it only requires that someone goes through the SuiteSparse wrappers and adjusts them.

tkelman commented 9 years ago

Could we delay initialization of the global SuiteSparse state until it's actually first used, and add an API for manually initializing it (and maybe also shutting it down?) with a non-default index type?

andreasnoack commented 9 years ago

I'm not sure, but I don't think that is possible. At the least, it would require some changes to how things are set up now, because we initialize the library in the __init__ function, and I think the modules are loaded when Julia is launched.
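
For what it's worth, a rough sketch of the lazy-start idea on the Julia side (everything here is hypothetical, with a stub standing in for the real cholmod_start call):

# Defer starting the CHOLMOD session until the first factorization, so the
# index type can still be chosen (or changed) before any CHOLMOD call is made.
start_cholmod(T) = println("starting CHOLMOD session with index type ", T)  # stub

const _session_started = Ref(false)
_index_type = Int64     # an API could overwrite this before first use

function ensure_session()
    if !_session_started[]
        start_cholmod(_index_type)
        _session_started[] = true
    end
    return nothing
end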

JaredCrean2 commented 8 years ago

The example I posted above now causes a stack overflow error on the second to last line:

ERROR: LoadError: StackOverflowError:
 in copy at ./array.jl:100
 in float at sparse/sparsematrix.jl:234
 in call at essentials.jl:57 (repeats 255 times)
while loading /tmp/tmp3.jl, in expression starting on line 26

using Julia version

Julia Version 0.4.3-pre+6
Commit adffe19* (2015-12-11 00:38 UTC)
Platform Info:
  System: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7 CPU       Q 820  @ 1.73GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Nehalem)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

andreasnoack commented 8 years ago

This is fixed in 0.5, but I think the changes to sparse \ are too many to backport.

JaredCrean2 commented 8 years ago

It seems questionable to leave a bug like this in the release version, but I'm not familiar with the relevant code so I can't say for sure.

On to the next bug!

Doing sparse direct solves is allocating much more memory in the release version of 0.4 than with commit 317a4d1:

Using 317a4d1:

Newton iteration: 1
step_fac = 1.0
calculating sparse complex step jacobian
   7.405 seconds      (31343 k allocations: 1407 MB, 4.02% gc time)
complex step jacobian calculate @time printed above
   8.105 seconds      (335 k allocations: 290 MB, 0.20% gc time)
matrix solve @time printed above
step_norm = 1.9694307239102875e-8
transfering field to sub mesh

Copying Field solution_field to solution_field
writeVtuFile into buffers: 0.021317 seconds
writeVtuFile buffers to disk: 0.013556 seconds
vtk files solution_1 written in 0.048307 seconds
residual norm = 4.132172524285687e-10
relative residual 1.1048253319168017e-11
printed to convergence.dat

Newton iteration: 2
step_fac = 1.0
calculating sparse complex step jacobian
   7.834 seconds      (31300 k allocations: 1405 MB, 3.93% gc time)
complex step jacobian calculate @time printed above
   7.780 seconds      (35 allocations: 276 MB, 0.06% gc time)
matrix solve @time printed above
step_norm = 2.2561965362574253e-13
residual norm = 1.0724287139874633e-13
relative residual 2.867369169425296e-15
printed to convergence.dat
Newton iteration converged with residual norm 1.0724287139874633e-13
Newton iteration converged with relative residual norm 2.867369169425296e-15
  35.445 seconds      (67023 k allocations: 3930 MB, 2.29% gc time)
total solution time printed above
writing final solution

Using 0.4.3-pre+6

Newton iteration: 1
step_fac = 1.0
calculating sparse complex step jacobian
  5.734098 seconds (31.35 M allocations: 1.374 GB, 3.90% gc time)
complex step jacobian calculate @time printed above
  8.182827 seconds (321.44 k allocations: 2.081 GB, 0.91% gc time)
matrix solve @time printed above
step_norm = 2.0132899638987104e-8
transfering field to sub mesh

Copying Field solution_field to solution_field
writeVtuFile into buffers: 0.018180 seconds
writeVtuFile buffers to disk: 0.000431 seconds
vtk files solution_1 written in 0.020366 seconds
residual norm = 4.695845075144187e-10
relative residual 1.2555353299713121e-11
printed to convergence.dat

Newton iteration: 2
step_fac = 1.0
calculating sparse complex step jacobian
  5.627192 seconds (31.29 M allocations: 1.372 GB, 3.13% gc time)
complex step jacobian calculate @time printed above
  7.655763 seconds (11.72 k allocations: 2.069 GB, 1.13% gc time)
matrix solve @time printed above
step_norm = 2.5516216124329256e-13
residual norm = 1.0906469965037183e-13
relative residual 2.9160796719756993e-15
printed to convergence.dat
Newton iteration converged with residual norm 1.0906469965037183e-13
Newton iteration converged with relative residual norm 2.9160796719756993e-15
 31.965434 seconds (69.13 M allocations: 7.457 GB, 2.15% gc time)
total solution time printed above
writing final solution

Storing the matrix takes 275 megabytes of memory (and that partially takes fill-in into account), so I'm not sure why the matrix solve is allocating roughly 2 gigabytes of memory.

I am using 64-bit indices for the matrix, so this might be unrelated to the 32-bit index issue and might be better as a separate issue?

JaredCrean2 commented 8 years ago

Here is a small piece of code that reproduces the problem:

function test(n)
    # create a diagonal SparseMatrixCSC of size n x n
    vals = zeros(n)
    rowval64 = zeros(Int64, n)
    colptr64 = zeros(Int64, n + 1)   # column pointers
    for i = 1:n
        vals[i] = i
        rowval64[i] = i
        colptr64[i] = i
    end
    colptr64[n+1] = n + 1

    A = SparseMatrixCSC(n, n, colptr64, rowval64, vals)

    # create right-hand-side vectors
    b = rand(n)
    b2 = rand(n)

    # warm up
    c = A\b

    # time
    @time c3 = A\b2

    return nothing
end

test(3000)

old julia: 1.265 milliseconds (56 allocations: 51952 bytes)
new julia: 0.000848 seconds (65 allocations: 776.563 KB)

andreasnoack commented 8 years ago

This is not a bug but a feature. See https://github.com/JuliaLang/julia/commit/ab7591c2a29977fcb2e092bacd35ef7d896c7206. Before that commit, we didn't track the memory allocated by SuiteSparse in Julia. Note that 51952 bytes is less than the size of the matrix A.

Small comment: it would be helpful if you could post the complete link to commits; it makes it easier to investigate the issue.

JaredCrean2 commented 8 years ago

Oh, that is both cool (that Julia is tracking memory allocation better) and concerning (that the matrix has that much fill-in).

I will link to commits in the future to make comparisons easier.

ViralBShah commented 6 years ago

The reported example works now.