JuliaInv / MUMPSjInv.jl

MUMPS interface for Julia
18 stars 15 forks source link

MUMPS not using more than one core #17

Open ianwilliamson opened 6 years ago

ianwilliamson commented 6 years ago

During my testing I haven't been able to get MUMPS to use more than a single core. Following the instructions in the MUMPS documentation I have set the env variable OMP_NUM_THREADS=12 (prior to starting Julia). On this same system, this approach of exporting OMP_NUM_THREADS works for telling Pardiso to use more cores. What else could I look into trying to get MUMPS to use parallelism in factorization and solving?

Pbellive commented 6 years ago

What platform are you on? This might be OS or BLAS library dependent. MUMPS runs multithreaded for me with OpenBLAS on Linux when I set OMP_NUM_THREADS before starting Julia. To be precise, I tested this just now on:

julia> versioninfo()
Julia Version 0.6.4-pre.0
Commit 423d4dfcec* (2018-05-30 17:34 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4800MQ CPU @ 2.70GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=16)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)

If you're on OS X and using Apple's BLAS you need to set VECLIB_MAXIMUM_THREADS, rather than OMP_NUM_THREADS.

ianwilliamson commented 6 years ago

Also on Linux:

Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: AMD Opteron(tm) Processor 6386 SE
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Piledriver)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, bdver1)

and I can successfully get Pardiso to use multiple cores on this system via the export OMP_NUM_THREADS approach.

Oddly (I'm not sure if this is related) but I have the following testing script:

using MUMPS
N = Int64(1e4)
d = 0.00001
A = sprandn(N, N, d) + 1im*sprandn(N, N, d)
b = rand(N) + 1im*rand(N)
A′ = @time factorMUMPS(A, 0)
x  = @time applyMUMPS(A′, b)

which I was able to run once, but after aborting mid-factorization, and then re-running now only outputs the error:

ERROR: LoadError: MUMPS: error --> -3 <--. Please refer to Ch. 7 of MUMPS User's guide!
Stacktrace:
 [1] checkMUMPSerror(::Array{Int64,1}) at /home/---/.julia/v0.6/MUMPS/src/MUMPSfuncs.jl:69
 [2] factorMUMPS(::SparseMatrixCSC{Complex{Float64},Int64}, ::Int64, ::Int64) at /home/---/.julia/v0.6/MUMPS/src/MUMPSfuncs.jl:33
 [3] factorMUMPS(::SparseMatrixCSC{Complex{Float64},Int64}, ::Int64) at /home/---/.julia/v0.6/MUMPS/src/MUMPSfuncs.jl:24
 [4] include_from_node1(::String) at ./loading.jl:576
 [5] include(::String) at ./sysimg.jl:14
 [6] process_options(::Base.JLOptions) at ./client.jl:305
 [7] _start() at ./client.jl:371
while loading /home/---/MUMPS_test.jl, in expression starting on line 237

At the same time, running Pkg.test("MUMPS") completes successfully... now I don't know what's going on...

Pbellive commented 6 years ago

Strange. I'm sorry, I have no good idea why openBLAS isn't picking up on your OMP_NUM_THREADS setting. In any event, neither the wrapper code nor the MUMPS library itself do anything to control multithreading. All that is going on is that MUMPS does much of its work through BLAS calls and these can be done multithreaded. So you set the number of threads for BLAS to use and MUMPS will just use what's there. I would double check that you built this package against the correct BLAS library and that you can run that library multithreaded in general. Unfortunately that's all I can think of, sorry.

As for your other error, that seems to be a case of MUMPS throwing the wrong error message. My guess is that your random matrix was rank deficient. In this case MUMPS should throw a -6 or -10 error code but for some reason it's throwing -3. I checked that trying to factor a rank deficient complex matrix can give the -3 error code. e.g.

julia> D = sprandn(20,20,0.1);

julia> Dinv = @time factorMUMPS(D,0);
ERROR: MUMPS: error --> -3 <--. Please refer to Ch. 7 of MUMPS User's guide!
Stacktrace:
 [1] checkMUMPSerror(::Array{Int64,1}) at /home/patrick/.julia/v0.6/MUMPS/src/MUMPSfuncs.jl:69
 [2] factorMUMPS(::SparseMatrixCSC{Float64,Int64}, ::Int64, ::Int64) at /home/patrick/.julia/v0.6/MUMPS/src/MUMPSfuncs.jl:50
 [3] factorMUMPS(::SparseMatrixCSC{Float64,Int64}, ::Int64) at /home/patrick/.julia/v0.6/MUMPS/src/MUMPSfuncs.jl:41

julia> rank(full(D))
16

julia> D = D + speye(20);

julia> Dinv = @time factorMUMPS(D,0);
  0.005187 seconds (149 allocations: 8.881 KiB)

MUMPS.jl just returns the error code thrown by MUMPS itself so this is an upstream bug.

ianwilliamson commented 6 years ago

Thanks for the pointer on that misleading error message. I guess I'll take a look at the build.

ianwilliamson commented 6 years ago

The only thing I had to modify out of the box was in src/compiler_options.in:

LIBBLAS = -lblas #/usr/lib/libblas/libblas.a

to get it to compile on Ubuntu. This seems to be the standard way to hook into blas and I'm not sure what other options I would need to give it.