Closed stevengj closed 9 years ago
Or is the problem worse than that? If I ccall a library that in turn calls cblas_dgemm, will it end up calling our OpenBLAS version even if it was originally linked to a completely different BLAS library (e.g. libblas.so)?
In that case, we might have to hack OpenBLAS to rename its exported functions (e.g. cblas_dgemm64, etc.) since we changed the ABI.
@xianyi, is there a way to tell OpenBLAS to add a prefix or suffix (e.g. 64) to all its exported symbols, to make it possible to link both the 32-bit and 64-bit ABI in the same executable?
See also numpy/numpy#3916
Wouldn't it make more sense to put the 64 after the cblas part, as in cblas64_dgemm?
The ideal solution would be to have a separate 64-bit ABI and build both 32 and 64 bit versions in the same library.
@ViralBShah that is actually the best solution here. That would be wonderful!
@StefanKarpinski, note that there is a Fortran dgemm ABI too, and to avoid conflicts you need to rename both C and Fortran (unless we are not linking the Fortran ABI?). But I don't think it really matters what the name looks like, as long as there is a simple deterministic rule and it can be implemented as automatically as possible in the openblas source code. I was just thinking that a suffix might be easier to automate for both C and Fortran ABIs.
Currently we use the fortran abi only.
I wonder if we can somehow make matplotlib use its own blas. While we may be able to do all sorts of gymnastics with openblas, it will be difficult to do the same with vendor provided BLAS.
The other alternative would be to recompile our own numpy, but that makes installing PyCall much more of a pain.
@ViralBShah, does MKL provide the 64-bit ABI?
The other alternative would be to recompile our own numpy, but that makes installing PyCall much more of a pain.
Not to mention that the amount of stuff we compile ourselves is getting slightly ridiculous. But it's hard to avoid.
I believe MKL does have a 64-bit ABI - but not 100% sure. @andreasnoackjensen ?
I thought about recompiling numpy, but that is even more inconvenient.
I am not sure what exactly ABI means, but MKL has 32-bit integers in the lp64 libraries and 64-bit integers in the ilp64 libraries. The symbols have the same names.
It's easy to add a prefix or suffix for 64-bit (ilp64) ABI. However, I am not sure OpenBLAS can support lp64 and ilp64 in one binary.
For MKL, you need to link the application against a different interface layer library, e.g. libmkl_intel_lp64.so or libmkl_intel_ilp64.so.
I think adding a prefix or suffix to the ilp64 OpenBLAS interface would already be a big help. @xianyi, assuming that such a suffix were added, what would go wrong if both the 32- and 64-bit OpenBLAS libraries were linked simultaneously?
@xianyi, is there any hope of progress on this?
Would naming the 64-bit version something like libopenblas_ilp64.so solve this?
@ViralBShah, I'm not sure, but I doubt it. If you load two shared libraries which export the same symbol (e.g. dgemm_) but with a different ABI, aren't there still going to be conflicts even if the libraries have different names? (At least if the libraries are loaded with RTLD_GLOBAL?)
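One rough way to see how large the overlap in question is would be to compare the exported symbol names of the two libraries. The following is only a sketch: the helper names `exported_symbols` and `common_symbols` are made up for illustration, the library paths in the usage comment are placeholders, and it assumes GNU `nm` is available.

```shell
# Print the defined dynamic symbol names of a shared library, one per line.
# Assumes GNU nm; any "@VERSION" suffix on a name is stripped.
exported_symbols() {
    nm -D --defined-only "$1" | awk '{ sub(/@.*/, "", $NF); print $NF }'
}

# Print the names that occur in both symbol lists (one name per line in
# each file), sorted and de-duplicated.
common_symbols() {
    grep -Fxf "$1" "$2" | sort -u
}

# Hypothetical usage (paths are placeholders):
#   exported_symbols ./libopenblas.so      > julia_blas.txt
#   exported_symbols /usr/lib/libblas.so   > system_blas.txt
#   common_symbols julia_blas.txt system_blas.txt
```

Every name this prints is a candidate for being bound to the wrong library at run time; the list says nothing about which definition the dynamic linker actually picks.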
The easier thing then for now would be to just use the 32-bit version of openblas with IJulia, if that works.
Nassty 32-bit limits, we hates them forever!
Anyway, it's not just IJulia, since PyCall and Numpy can be used anywhere. And 32-bit vector size limits cause their own problems.
+1, we ran into a very similar issue here too: https://github.com/JuliaOpt/Ipopt.jl/issues/1#issuecomment-37556837
This was an instance of the following (here dcopy_ instead of cblas_dgemm, but same idea):
If I ccall a library that in turn calls cblas_dgemm, will it end up calling our OpenBLAS version even if it was originally linked to a completely different BLAS library (e.g. libblas.so)?
Any library linking to any LP64 shared library Blas/Lapack/etc can run into name shadowing and segfaults or other incorrect behavior when ccalled by Julia due to ILP64 openblas. Statically linking LP64 reference blas/lapack into the dependency library solves the issue in the case of Ipopt, but is not an ideal solution.
Since #5291 was merged there are now a handful of calls to cblas functions, otherwise I was going to suggest we could try co-opting OpenBlas' mechanism for handling trailing underscores as a potential way of attempting this.
We could always just patch the openblas source with a global s/cblas/jl_cblas/ substitution.
Isn't this mostly a visibility issue? Can we restrict openblas's symbols to not be visible to dlopen'ed shared libraries?
@mlubin, you're right that this would be the simplest option, if we can do it on all the relevant platforms. Is there a magic linker flag for this (analogous to RTLD_LOCAL in dlopen)?
Looks like if you want to avoid patching you need to use a linker script.
@pao, it looks like the link you found is for preventing some symbols from being exported at all. That's not what we want here. We want to export symbols to Julia, but not re-export them to other shared libraries.
Ah, sorry, I didn't catch that subtlety from @mlubin's comment; I see it now. I'm not deep enough on visibility to know whether that's even possible, though a cursory search didn't turn anything up.
This looks relevant. Some combination of -Bsymbolic or -Bsymbolic-functions, and/or creating wrappers ourselves with a prefix/suffix on the function names may work, if OpenBlas' build system can't easily be made to do what we want.
We could always just patch the openblas source with a global s/cblas/jl_cblas/ substitution.
If only. OpenBlas is full of preprocessor defines (and some perl? https://github.com/xianyi/OpenBLAS/blob/develop/exports/gensymbol looks promising) that obfuscate function naming (in particular NAME and CNAME), and I'm having a hard time figuring out how it works.
Aha, looks like https://github.com/xianyi/OpenBLAS/blob/develop/Makefile.system#L776 is where NAME and CNAME are getting set.
I was just discussing this with @jiahao, and the easiest solution seems to be to use the GNU objcopy utility to just add a prefix jl_ to all exported symbols from libopenblas after it is compiled.
That way, we don't need to hack the OpenBLAS source.
The only downside is that using Julia with MKL might be a pain, but there are probably ways around this with a @blas macro to generate the ccalls with or without the prefix.
:+1: that sounds easier. Would renaming dgemm_ to jl_dgemm_ then cause a problem for any Lapack routines that try to call dgemm_, or would objcopy fix the reference too?
there are probably ways around this with a @blas macro to generate the ccalls with or without the prefix
See also #2167 (will be needed if anyone ever wants to use MKL on Windows or Intel Fortran anywhere) and #4290. It's not very well-documented, but Matlab lets you switch Blas and Lapack via environment variables. Putting that runtime-switching (or startup, or sysimg-build-time) abstraction layer into Julia will be useful as long as it doesn't introduce a noticeable performance penalty.
I don't think runtime switching will be possible since MKL's libraries would not have the jl_ prefixes that the compiled Julia wrapper functions would be conditioned to expect.
@tkelman, objcopy will rename both the exported symbols and all references to them within the object code, so BLAS calls within LAPACK should not be a problem since libopenblas includes both LAPACK and BLAS. (I just double-checked this. It pretty much has to work this way, of course, for symbol renaming to be usable.)
Another likely instance of this: https://github.com/lruthotto/MUMPS.jl/issues/2
Having to rebuild the system image to change Julia's Blas backend wouldn't be too bad.
The number of library wrapper packages that depend on Blas and Lapack is already pretty high and will continue to grow. Most of these libraries should have decent facilities for configuring them with different Blas libraries at compile time. It'll be good to standardize an approach for providing a Blas library from Julia to library packages, for performance, reducing duplication, and cross-platform uniformity (no such thing as "system Blas" on Windows, and we want our library packages to work on Windows don't we?). The LP64 vs ILP64 issue is part of this, and it may require providing an LP64 Blas library with the default function names for packages, while Julia itself uses an ILP64 Blas with prefixed function names.
So is "using the GNU objcopy utility to just add a prefix jl_ to all exported symbols from libopenblas after it is compiled" a good solution? If so, what needs to be done to make it work?
@ufechner7, two things: (a) the Makefile needs to be updated to make the requisite call to objcopy, and (b) base/linalg/blas.jl etc. need to be updated so that all ccalls to BLAS and LAPACK routines go through e.g. a @blascall(...) macro that prepends the jl_ prefix to the symbol (we want a macro here so that it can be easily changed, e.g. to call MKL).
Did anyone start experimenting with this to see how feasible it is?
Not yet, as far as I know. I only tried out objcopy to verify that it could rename the symbols.
I tried cp libopenblas.so libjlopenblas.so; objcopy --prefix-symbols=jl_ libjlopenblas.so
then
julia> n = 5; a = rand(n); b = rand(n); inca = 1; incb = 1;
julia> y = ccall((:jl_ddot_, "libjlopenblas"), Float64, (Ptr{Int}, Ptr{Float64}, Ptr{Int}, Ptr{Float64}, Ptr{Int}), &n, a, &inca, b, &incb)
ERROR: ccall: could not find function jl_ddot_ in library libjlopenblas
in anonymous at no file
So something's missing. nm libjlopenblas.so | grep ddot does return the expected
00000000000f47b0 T jl_cblas_ddot
00000000000f3aa0 T jl_ddot_
0000000000f29200 T jl_ddot_k_ATOM
0000000000c1ce00 T jl_ddot_k_BARCELONA
0000000000dbf200 T jl_ddot_k_BOBCAT
0000000001299e00 T jl_ddot_k_BULLDOZER
00000000004b4e00 T jl_ddot_k_CORE2
0000000000703a00 T jl_ddot_k_DUNNINGTON
0000000001013400 T jl_ddot_k_NANO
0000000000808c00 T jl_ddot_k_NEHALEM
0000000000932800 T jl_ddot_k_OPTERON
0000000000aa7e00 T jl_ddot_k_OPTERON_SSE3
00000000005de000 T jl_ddot_k_PENRYN
00000000013d8000 T jl_ddot_k_PILEDRIVER
0000000000320600 T jl_ddot_k_PRESCOTT
000000000113c200 T jl_ddot_k_SANDYBRIDGE
so maybe some additional steps are required?
On Windows there is a not-that-hard option that works, by making the following change to this file in OpenBLAS
--- exports/gensymbol 2014-08-11 20:56:12.014049400 -0700
+++ exports/jl_gensymbol 2014-08-11 20:55:22.566221200 -0700
@@ -2833,22 +2833,22 @@
foreach $objs (@underscore_objs) {
$uppercase = $objs;
$uppercase =~ tr/[a-z]/[A-Z]/;
- print "\t$objs=$objs","_ \@", $count, "\n";
+ print "\tjl_$objs=$objs","_ \@", $count, "\n";
$count ++;
- print "\t",$objs, "_=$objs","_ \@", $count, "\n";
+ print "\tjl_",$objs, "_=$objs","_ \@", $count, "\n";
$count ++;
- print "\t$uppercase=$objs", "_ \@", $count, "\n";
+ print "\tjl_$uppercase=$objs", "_ \@", $count, "\n";
$count ++;
}
foreach $objs (@need_2underscore_objs) {
$uppercase = $objs;
$uppercase =~ tr/[a-z]/[A-Z]/;
- print "\t$objs=$objs","__ \@", $count, "\n";
+ print "\tjl_$objs=$objs","__ \@", $count, "\n";
$count ++;
- print "\t",$objs, "__=$objs","__ \@", $count, "\n";
+ print "\tjl_",$objs, "__=$objs","__ \@", $count, "\n";
$count ++;
- print "\t$uppercase=$objs", "__ \@", $count, "\n";
+ print "\tjl_$uppercase=$objs", "__ \@", $count, "\n";
$count ++;
}
@@ -2857,15 +2857,15 @@
$uppercase = $objs;
$uppercase =~ tr/[a-z]/[A-Z]/;
- print "\t",$objs, "_=$objs","_ \@", $count, "\n";
+ print "\tjl_",$objs, "_=$objs","_ \@", $count, "\n";
$count ++;
- print "\t$uppercase=$objs", "_ \@", $count, "\n";
+ print "\tjl_$uppercase=$objs", "_ \@", $count, "\n";
$count ++;
}
foreach $objs (@no_underscore_objs) {
- print "\t",$objs,"=$objs"," \@", $count, "\n";
+ print "\tjl_",$objs,"=$objs"," \@", $count, "\n";
$count ++;
}
My ccall test with a prefixed jl_ddot_ works with a libopenblas.dll generated based on this modification.
@tkelman, does that rename all of the functions or just the generated ones? e.g. we also want to rename functions like openblas_set_num_threads.
@stevengj it renames everything that's exported from the dll, including openblas_set_num_threads.
I figured out why objcopy isn't working. It evidently can't rename dynamic symbols, unless it has learned some new tricks since http://sourceware-org.1504.n7.nabble.com/objcopy-redefine-sym-on-dynsym-section-td119610.html
[tkelman@static-host lib]$ objdump -T libjlopenblas.so | grep ddot
0000000000dbf200 g DF .text 0000000000000591 Base ddot_k_BOBCAT
0000000000aa7e00 g DF .text 0000000000000569 Base ddot_k_OPTERON_SSE3
00000000005de000 g DF .text 0000000000000559 Base ddot_k_PENRYN
0000000001299e00 g DF .text 0000000000000341 Base ddot_k_BULLDOZER
00000000004b4e00 g DF .text 0000000000000551 Base ddot_k_CORE2
0000000000f29200 g DF .text 0000000000000325 Base ddot_k_ATOM
0000000000320600 g DF .text 0000000000000581 Base ddot_k_PRESCOTT
0000000000808c00 g DF .text 0000000000000591 Base ddot_k_NEHALEM
0000000000703a00 g DF .text 0000000000000529 Base ddot_k_DUNNINGTON
00000000000f3aa0 g DF .text 000000000000005d Base ddot_
0000000000932800 g DF .text 000000000000056e Base ddot_k_OPTERON
0000000001013400 g DF .text 0000000000000591 Base ddot_k_NANO
00000000013d8000 g DF .text 0000000000000341 Base ddot_k_PILEDRIVER
0000000000c1ce00 g DF .text 0000000000000591 Base ddot_k_BARCELONA
000000000113c200 g DF .text 0000000000000591 Base ddot_k_SANDYBRIDGE
00000000000f47b0 g DF .text 0000000000000055 Base cblas_ddot
Anyone have any suggestions? I tried messing with some of the CNAME definitions in OpenBLAS' Makefile.system but that led to several undefined symbols and a bad mix of renamed and not-renamed functions. @xianyi any suggestions for applying a global prefix (or suffix, if that's easier) to all functions exported from the openblas shared library, on Linux and OSX?
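The discrepancy described above (plain nm shows the jl_ names, while objdump -T still shows the old ones) can be checked mechanically by filtering the dynamic symbol table for names that lack the prefix. This is a minimal sketch: `unprefixed_symbols` is a hypothetical helper name, and the usage line assumes GNU `nm` and the libjlopenblas.so file from the experiment above.

```shell
# Read symbol names from stdin (one per line) and print every name that
# does NOT start with the given prefix.
unprefixed_symbols() {
    prefix=$1
    while IFS= read -r name; do
        case $name in
            "$prefix"*) ;;                    # already renamed, skip
            *) printf '%s\n' "$name" ;;       # still has its old name
        esac
    done
}

# Hypothetical usage against the dynamic symbol table:
#   nm -D --defined-only libjlopenblas.so | awk '{print $NF}' | unprefixed_symbols jl_
```

Given the objdump -T listing above, this would be expected to print ddot_, cblas_ddot and the kernel variants for that library, since ccall resolves names through the dynamic table rather than the one plain nm reads.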
Would loading with RTLD_LOCAL help?
@nbecker, this was discussed above. One obstacle to RTLD_LOCAL seems to be that we are not loading OpenBLAS with dlopen, but are rather linking libopenblas.so directly to the julia executable, so we have to figure out if there is a corresponding linker flag. I did a quick search through the man page of GNU ld and didn't see anything, but it has a zillion options and it's possible I missed something.
(This problem mainly seems to show up on GNU/Linux, so I think we need something that works with GNU ld.)
@stevengj I believe we are dlopen'ing OpenBLAS, albeit implicitly just by ccall'ing some BLAS function and passing Base.libblas_name in as the library handle. We could probably explicitly dlopen libblas in an initialization function somewhere and pass in RTLD_LOCAL if we want to.
It's definitely been a problem in packages on Macs too. There's an osx.def file in OpenBLAS which gets created by the same Perl script gensymbol and is then passed to the linker using -Wl,-exported_symbols_list,osx.def. I can't really test that though, as I don't have a Mac.
I think I found a solution. We can't use objcopy on the shared library because it can't rename dynamic symbols, but I just tried it on the static library right before linking the .so and that works. It passes my jl_ddot_ test, anyway:
--- exports/Makefile-old 2014-08-20 20:47:51.000000000 -0700
+++ exports/Makefile 2014-08-20 20:45:16.000000000 -0700
@@ -103,7 +103,10 @@
so : ../$(LIBSONAME)
-../$(LIBSONAME) : ../$(LIBNAME) linktest.c
+../$(LIBSONAME) : ../$(LIBNAME) linktest.c aix.def
+ rm -f prefix.def
+ for i in `cat aix.def`; do echo "$$i jl_$$i" >> prefix.def; done
+ objcopy --redefine-syms prefix.def ../$(LIBNAME)
ifneq ($(C_COMPILER), LSB)
$(CC) $(CFLAGS) $(LDFLAGS) -shared -o ../$(LIBSONAME) \
-Wl,--whole-archive ../$(LIBNAME) -Wl,--no-whole-archive \
I'm using aix.def as a simple list of exported symbols. objcopy --prefix-symbols=jl_ ../$(LIBNAME) went a little overboard, renaming everything in the static library (including things from libm, pthreads, libgfortran, etc), so the .so couldn't be linked from it afterwards.
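Outside of the Makefile, the map-generation step from the patch above could be sketched as a small shell function. `make_redefine_map` is a made-up name, and libopenblas.a in the usage comment stands in for ../$(LIBNAME); the only assumption about aix.def is the one stated above, namely that it is a whitespace-separated list of exported symbol names.

```shell
# Read whitespace-separated symbol names from stdin and print one
# "old new" pair per line, the format objcopy --redefine-syms expects.
make_redefine_map() {
    awk -v p="$1" '{ for (i = 1; i <= NF; i++) print $i, p $i }'
}

# Hypothetical usage (file names are placeholders):
#   make_redefine_map jl_ < aix.def > prefix.def
#   objcopy --redefine-syms=prefix.def libopenblas.a
```

Unlike --prefix-symbols, this only touches the names listed in the map, so references to libm, pthreads and libgfortran symbols keep their original names.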
Great!
Julia compiles OpenBLAS to libopenblas.so. This may be a problem for calling libraries that link to a system libopenblas.so, because the runtime linker may substitute Julia's version instead. The problem is that Julia's version is compiled with a 64-bit interface, which is not the default, and so if an external library calls it expecting a 32-bit interface, a crash may result.
We encountered what appears to have been this problem on @alanedelman's machine (julia.mit.edu). He recently started experiencing crashes in PyPlot.plot that, with the help of valgrind, I tracked down to apparently:
Apparently, Matplotlib is calling OpenBLAS (via NumPy: _dotblas.c is a NumPy file) with the 32-bit interface, but is getting linked at runtime into Julia's openblas library, which is compiled with a 64-bit interface. Recompiling Julia and openblas with USE_BLAS64=0 worked around the problem, but it would be better to avoid the conflict.
Can we just rename our libopenblas.so file to avoid any possible conflict in the runtime linker?