Open dpo opened 10 years ago
Parameter number 4 to ZGEMM
should be N
, so this is probably the same issue as https://github.com/JuliaOpt/Ipopt.jl/issues/1#issuecomment-37556837 - Mumps typically uses an LP64 blas, but Julia uses ILP64 OpenBlas by default. It looks like Mumps is being set up in this package to link to a shared-library blas, so zgemm_
is shadowed.
Thanks @dpo for pointing into this very interesting direction. I agree that installing MUMPS using homebrew would be a great improvement! Not only more elegant but also giving access to MUMPS' parallel version.
The warning you mentioned seems to be independent from the MUMPS installation. At least I got the same warning, when running the tests on my machine using the current version of Julia. Unfortunately, I do not see either where it comes from.
Did the comment @tkelman made help you? I cannot try it on my machine yet as I have trouble installing MUMPS using homebrew.
I haven't had a chance to play with this. You might want to post the error message in the Julia user's forum to find out what it means. Please open an issue on homebrew/science if you have trouble installing Mumps.
Hi Dominique,
sorry that it took me that long, but I now managed to install mumps using brew (seems like missing Xcode command line tools have caused my problems). I tried to use your makefile, but it failed with many errors. Here is the start:
gfortran -I/usr/local/opt/mumps/include -fPIC -fcray-pointer -c mumps_cmplx_p.f90
mumps_cmplx_p.f90:24.23:
TYPE (ZMUMPSbrew_STRUC),intent(inout):: mumps_par
1
Error: Derived type 'zmumpsbrew_struc' at (1) is being used before it is defined
mumps_cmplx_p.f90:28.13:
mumps_par%SYM = sym ! =0 unsymmetric, =1 symm. pos def, =2 general symm.
1
Error: Symbol 'mumps_par' at (1) has no IMPLICIT type
mumps_cmplx_p.f90:29.13:
mumps_par%PAR = 1
1
Error: Symbol 'mumps_par' at (1) has no IMPLICIT type
mumps_cmplx_p.f90:31.13:
mumps_par%JOB = -1 ! initialize
I'm leaving out some more errors of that type... here is how it ends:
Error: Symbol 'mumps_par' at (1) has no IMPLICIT type
mumps_cmplx_p.f90:144.13:
mumps_par%JOB = 2 ! factorization
1
Error: Symbol 'mumps_par' at (1) has no IMPLICIT type
Fatal Error: Error count reached limit of 25.
make: *** [mumps_cmplx_p.o] Error 1
@dpo : Do you have a clue what's going on? Is there anything I need to modify in the Fortran code?
Cheers, Lars
Hi Lars. Sorry for the slow reply. I just pulled the git repo and it builds just fine against my homebrew Mumps with this Makefile (which is similar to the one I posted above). Does it not build for you?
The tests don't run for me though because it seems MPI isn't initialized. I quickly hacked mumps_p.f90
(here) but there's an error related to the LAPACK function DSYR
:
** On entry to DSYR parameter number 7 had an illegal value
You can probably tell more quickly than I what the problem is.
Hey Dominique.
I did some extensive testing and asked around, but could not come to a solution. First of all, your makefile now works for me and I could test mumps with and without MPI as well as with openblas and with my Mac's blas.
Unfortunately, I could not reproduce your error message. Instead I am getting Segmentation faults!
This one when I link against brew's MUMPS without MPI support (for both BLAS versions):
include("testMUMPS.jl")
test MUMPS
Test for real SPD matrix: one rhs
signal (11): Segmentation fault: 11
??? at ???:110110743
Segmentation fault: 11
And this one when I link against brew's MUMPS with MPI (for both BLAS versions):
include("testDivGrad.jl")
Test for real SPD matrix: one rhs
signal (11): Segmentation fault: 11
??? at ???:110110743
[dhcp-128-189-89-68:81342] *** Process received signal ***
[dhcp-128-189-89-68:81342] Signal: Segmentation fault: 11 (11)
[dhcp-128-189-89-68:81342] Signal code: (0)
[dhcp-128-189-89-68:81342] Failing at address: 0x0
[dhcp-128-189-89-68:81342] [ 0] 0 libsystem_c.dylib 0x00007fff8723b90a _sigtramp + 26
[dhcp-128-189-89-68:81342] [ 1] 0 libdmumps.dylib 0x000000011ad5a9b7 __dmumps_load_MOD_dmumps_467 + 82
[dhcp-128-189-89-68:81342] [ 2] 0 libopenblas.dylib 0x000000010580e9ce dsyr_ + 302
[dhcp-128-189-89-68:81342] [ 3] 0 libdmumps.dylib 0x000000011ad13cb0 dmumps_230_ + 142
[dhcp-128-189-89-68:81342] *** End of error message ***
signal (11): Segmentation fault: 11
??? at ???:110110743
Segmentation fault: 11
It also points to the DSYR function. People here have the feeling it might be some issue with linking BLAS dynamically (Julia may link to a different BLAS as well). But this is just a guess.
At this point we have no idea how to fix it. Sorry!
Hi Lars. I forked your repository yesterday and worked on it for a while. I found a way to refer to the library in a generic way, provided it's in the user's LD_LIBRARY_PATH
. But I'm getting the same segmentation fault as you. Mumps and Julia use different BLAS libraries, but I'm not sure why that would be a problem. I'll contact some Julia folks to see if they have any idea.
Is Homebrew OpenBLAS built with 32 bit or 64 bit integers?
Mine is build with 64bit integers. But how does that matter? Mumps uses its own BLAS.
Right, and what does it expect as the integer size there? Unless it's linking to a static-library BLAS, the integer sizes will all have to be the same. Shared libraries will shadow each other. We have this problem all the time with Ipopt.
Aha. The BLAS used by MUMPS is the default brew OpenBLAS, so I presume it uses 32bit integers. In my fork, I carefully changed all integer data to 32bit because MUMPS uses default FORTRAN INTEGERs, which are 32bit. Are you saying MUMPS should link the Julia's OpenBLAS?
Either Mumps will need to link a static-library Blas that uses 32-bit integers, or you'll need to configure Mumps to use 64-bit integers in all Blas calls (this might be doable with -fdefault-integer=8
or some such, but I've never tried it)
Or, you can tell Julia and Blas and Mumps to use 32-bit integers everywhere (USE_BLAS64=0
I think it is), but the default binaries are not built this way AFAIK.
I see. In the second scenario, what if a user built Julia with a 32bit OpenBLAS?
Sorry yeah edited in a 3rd option - you'll need Julia and Mumps to be consistent in the integer size they expect from Blas, unless you link statically.
Ok, will give that a try. Unless there's a plan to include MUMPS in Julia at some point, I think we'll have to go with the static linking. I'm still not sure I totally understand the issue because MUMPS's BLAS lib is libopenblas
while Julia's lib is libopenblas64
. How can there be confusion?
Oh do you have openblas building under a different library name when it uses 64 bit integers? That's probably a good idea. I think the confusion and cross-talk is from having multiple libraries dynamically loaded that have the same function names, though I'm not an expert in how dynamic linking works I just have seen the consequences and ways that I've been able to fix it.
We did static linking (to netlib blas, but a better static blas should also work) for the purposes of Mumps within the Ipopt.jl package, see my original link from April.
I think the eventual best thing to do here is prefix all the function names for Julia's 64-bit-integer openblas to avoid this problem. That's the plan in the cross-referenced Julia issue 4923, it just hasn't been done yet.
That's tricky. Thanks a lot for the tip! That sounds like the likely culprit.
Well I just rebuilt MUMPS against a static 32bit OpenBLAS but I'm still getting the same segfault. Now MUMPS depends on Scalapack, which also links to its own BLAS (dylib for now). It's not realistic to require that everybody be linked statically.
Perhaps not. Until someone figures out the proper way to use objcopy
as part of Julia's build system and writes the requisite macros around all blas ccalls to only add the prefixes when necessary, the package will only be usable when Julia is configured for 32-bit-integer blas.
Confirm: The MUMPS interface works with Julia32. Thanks @tkelman!
There's still a segfault when I try to solve two systems in a row. Somehow memory isn't cleared up properly and/or MUMPS isn't reinitialized properly.
@dpo if it works the first time but not the second, I suspect that's a bug in memory handling as you said, and probably has more to do with the package binding code here in MUMPS.jl than anything about the Julia build configuration.
Well I was not able to make this work, so I rewrote my own MUMPS interface (in C this time!) See https://github.com/dpo/MUMPS.jl. Deallocating / reallocating works fine (at least for me). It's real-only at the moment, but adding complex arithmetic is simple. The C interface also seems far simpler and easier to maintain.
thanks! this looks really interesting. looking forward to testing it soon!
ps: the new interface also runs in parallel!
hi dominique. I like your interface a lot better than our fortran interface (especially the access to all control parameters). I also agree it seems simpler to maintain. However, did I understand it correctly, that is still is limited to Julia's 32 bit version? For our applications that would not be feasible. Are there new ideas how to combine Julia 64 bit and MUMPS?
Hi Lars. The 32 bit thing is only because my BLAS is 32 bit. If you can manage to build Julia against a 64 bit BLAS and not have a 32 bit BLAS shadow the functions of the first, you should be good to go. However, MUMPS uses default Fortran INTEGERs, which are 32 bit unless you compile with -fdefault-integer-8
. In the latter case, the MUMPS/Julia interface will require minor tweaks. I'll put that on the list.
If you can manage to build Julia against a 64 bit BLAS and not have a 32 bit BLAS shadow the functions of the first
@lruthotto, @dpo, this is being tracked at https://github.com/JuliaLang/julia/issues/4923. If you could speak up there that this is causing problems in packages like this one, and more importantly if you have any suggestions that could work on Linux and Mac (I already have a solution that should work for Windows), please let us know.
I tried building
MUMPS.jl
against the shared MUMPS library built via Homebrew. An advantage is that we can callbrew
to discover the location of the library, so there's no need to package the MUMPS source code. Here's a sampleMakefile
:This
Makefile
assumes that the parallel MUMPS library was built. If the sequential library was built (withbrew install mumps --without-mpi
),incdir
becomes$(prefix)/libexec/include
and we must add-lmpiseq
toLIB
.This compiles but it seems something needs to be updated in the interface:
(sorry I'm too new to Julia to know what that message means). I'm also getting a bunch of messages of the form: