JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
ABI conflicts due to 64-bit libopenblas.so #4923

Closed stevengj closed 9 years ago

stevengj commented 10 years ago

Julia compiles OpenBLAS to libopenblas.so. This may be a problem for calling libraries that link to a system libopenblas.so, because the runtime linker may substitute Julia's version instead. The problem is that Julia's version is compiled with a 64-bit interface, which is not the default, and so if an external library calls it expecting a 32-bit interface, a crash may result.

We encountered what appears to have been this problem on @alanedelman's machine (julia.mit.edu). He recently started experiencing crashes in PyPlot.plot that, with the help of valgrind, I tracked down to apparently:

==17855== Use of uninitialised value of size 8
==17855==    at 0xA8B6890: dgemm_beta_NEHALEM (in /home/edelman/julia/usr/lib/libopenblas.so)
==17855==    by 0xA082D72: dgemm_nn (in /home/edelman/julia/usr/lib/libopenblas.so)
==17855==    by 0x9F558C8: cblas_dgemm (in /home/edelman/julia/usr/lib/libopenblas.so)
==17855==    by 0x16430CA5: dotblas_matrixproduct (_dotblas.c:809)
==17855==    by 0x14BAB5D4: PyEval_EvalFrameEx (in /usr/lib/libpython2.7.so.1.0)

Apparently, Matplotlib is calling OpenBLAS (via NumPy: _dotblas.c is a NumPy file) with the 32-bit interface, but is getting linked at runtime into Julia's openblas library, which is compiled with a 64-bit interface. Recompiling Julia and openblas with USE_BLAS64=0 worked around the problem, but it would be better to avoid the conflict.
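
As a simplified model of why the mismatch is dangerous (an illustration only, not a reproduction of the crash above): when a caller supplies a 32-bit integer but the callee reads 64 bits from the same location, the upper half comes from whatever bytes happen to sit next to it. The helper name `read_as_int64` and the sample values are made up for this sketch, and little-endian layout is assumed.

```python
import struct

def read_as_int64(int32_arg: int, adjacent_int32: int) -> int:
    """Reinterpret a 32-bit argument plus its 4 neighbouring bytes as one int64.

    Models a callee built with 64-bit integers reading a dimension that the
    caller stored as a 32-bit integer (little-endian layout assumed).
    """
    raw = struct.pack("<ii", int32_arg, adjacent_int32)  # 8 bytes in memory
    (value,) = struct.unpack("<q", raw)                  # read back as int64
    return value

m = 3  # the matrix dimension the caller intended to pass
print(read_as_int64(m, 0))  # 3: works only if the neighbouring bytes are zero
print(read_as_int64(m, 7))  # 30064771075: garbage dimension, likely a crash
```

Whether the neighbouring bytes are zero is a matter of luck, which fits the intermittent "uninitialised value of size 8" report from valgrind.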

Can we just rename our libopenblas.so file to avoid any possible conflict in the runtime linker?

nbecker commented 10 years ago

If it is true that libopenblas is linked via dlopen (and I believe that is a correct statement), then in my opinion using RTLD_LOCAL is a much cleaner solution.
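
For readers unfamiliar with RTLD_LOCAL, here is a minimal sketch of the idea using Python's ctypes rather than Julia's loader. The helper name `load_private` is hypothetical, and libm is used purely as a stand-in for libopenblas. A library opened this way keeps its symbols out of the global namespace, so libraries loaded later do not resolve their own references against it.

```python
import ctypes
import ctypes.util

def load_private(path: str) -> ctypes.CDLL:
    """dlopen `path` with RTLD_LOCAL so its symbols are not made globally visible."""
    return ctypes.CDLL(path, mode=ctypes.RTLD_LOCAL)

# Stand-in library for the demo; find_library may return None on some systems.
libm_path = ctypes.util.find_library("m")
if libm_path is not None:
    libm = load_private(libm_path)
    libm.cos.restype = ctypes.c_double
    libm.cos.argtypes = [ctypes.c_double]
    print(libm.cos(0.0))  # symbols are still reachable through the handle
```

The handle remains fully usable by the code that opened it; only implicit, name-based resolution from other libraries is affected.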

stevengj commented 10 years ago

We don't have an explicit call to dlopen in Julia. My recollection was that @JeffBezanson wanted to avoid replacing ccalls with explicit dlopen calls in order to ease eventual static compilation. Jeff, do you have an opinion here?

mlubin commented 10 years ago

Well here's something to think about: if openblas were statically compiled into julia, is it possible to hide the symbols like with RTLD_LOCAL? If not, then there's really no choice but to rename the symbols.

nbecker commented 10 years ago

Maybe this ld option?

   --exclude-libs lib,lib,...
       Specifies a list of archive libraries from which symbols should not be automatically exported. The library names may be delimited by commas or colons. Specifying "--exclude-libs ALL" excludes symbols in all archive libraries from automatic export. This option is available only for the i386 PE targeted port of the linker and for ELF targeted ports. For i386 PE, symbols explicitly listed in a .def file are still exported, regardless of this option. For ELF targeted ports, symbols affected by this option will be treated as hidden.

nbecker commented 10 years ago

From a brief look at the source code, it looks like openblas is loaded via ccall. I was thinking perhaps an optional flag to ccall to pass RTLD_LOCAL?

staticfloat commented 10 years ago

If Jeff wants to cut down on dlopen and just use ccall to implicitly open them, I believe that's just so that when we do have the infrastructure, we can forgo dlopening at all, as everything will be statically linked together, and the dlopen call itself could fail.

In that case, one could imagine special flags to be passed to dlopen() to hint to the static compiler that this dlopen() call can be ignored during static compilation, or even better, to hint to the static compiler that the symbols being imported from this library should not be exported! Without that knowledge, we could reintroduce the problem during static compilation. In either case, until we get static compilation and know the requirement, I don't think we should take using dlopen off the table. (Unless, of course, Jeff shows up in this thread and proves me wrong!)

stevengj commented 10 years ago

@nbecker, I think the --exclude-libs might prevent us from calling the functions dynamically (hence from Julia) at all, unless each and every ccall to BLAS is compiled statically.

nbecker commented 10 years ago

I mentioned --exclude-libs to address static linking, which I thought Miles Lubin had suggested.

That is, I thought the suggestion was to build julia statically linking to libopenblas.a. In that case, it sounded like --exclude-libs might be useful.

tkelman commented 10 years ago

Statically linking openblas will make the julia binaries huge, I don't think that's a serious option. I'd prefer doing something that works the same way across all platforms. Using 64-bit integers in openblas by default was a bit of a cavalier choice in terms of compatibility with other libraries, and unless we want to reverse that choice we need to do something to mitigate the compatibility problems. (Yes Matlab made the same choice, but ask anyone who builds mex files that depend on blas, this same issue is a big problem there too.)

So can someone test if tweaking gensymbol and/or osx.def to add prefixes on the exported symbols works on Mac too?

staticfloat commented 10 years ago

@tkelman On OSX, the generated .def file isn't a mapping, it's just a list of symbols. So changing the list of symbols doesn't change much, unfortunately. Unless I'm misunderstanding something.

tkelman commented 10 years ago

Oh. carp. Can you try a similar patch to the above https://github.com/JuliaLang/julia/issues/4923#issuecomment-52875904, but on these OSX lines instead? https://github.com/xianyi/OpenBLAS/blob/a69dd3fbc5c38f7098d1539a69963c0d2bd3163a/exports/Makefile#L96-L97

I'm not sure whether you should use osx.def as a base (with the leading underscore on everything), or aix.def (without leading underscores).

staticfloat commented 10 years ago

We don't have objcopy on OSX. :P Le sigh.

On the plus side, I found a really neat tiny utility called objconv that I think will make our lives easier. It can manipulate PE, ELF and Mach-O, and it even has a method to replace the prefix of all symbols with a different prefix. After compiling it (which was a refreshing exercise in simplicity), I applied this patch to OpenBLAS:

diff --git a/exports/Makefile b/exports/Makefile
index c798bc7..08f413a 100644
--- a/exports/Makefile
+++ b/exports/Makefile
@@ -93,8 +93,18 @@ libopenblas.def : gensymbol
 libgoto_hpl.def : gensymbol
        perl ./gensymbol win2khpl $(ARCH) dummy $(EXPRECISION) $(NO_CBLAS) $(NO_LAPACK) $(NO_LAPACKE) $(NEED2UNDERSCORES) $(ONLY_CBLAS)

-$(LIBDYNNAME) : ../$(LIBNAME) osx.def
-       $(FC) $(FFLAGS) -all_load -headerpad_max_install_names -install_name $(CURDIR)/../$(LIBDYNNAME) -dynamiclib -o ../$(LIBDYNNAME)
+../$(LIBNAME).patched: ../$(LIBNAME) osx.def
+       # Build parameter file for objconv
+       rm -f objconf.params
+       for i in `cat osx.def`; do \
+               echo "-nr:$$i:_jl$$i" >> objconf.params; \
+       done
+       objconv @objconf.params ../$(LIBNAME) ../$(LIBNAME).patched
+
+$(LIBDYNNAME) : ../$(LIBNAME).patched osx.def
+       # We want to avoid the LAPACK symbols stuff
+       sed -e 's/.*/_jl&/' osx.def | grep -v LAPACK > osx.def.patched
+       $(FC) $(FFLAGS) -all_load -headerpad_max_install_names -install_name $(CURDIR)/../$(LIBDYNNAME) -dynamiclib -o ../$(LIBDYNNAME)

 dllinit.$(SUFFIX) : dllinit.c
        $(CC) $(CFLAGS) -c -o $(@F) -s $<
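
For readers who find the shell loop in the patch hard to follow, the parameter-file step can be sketched in Python. The function name and the two sample symbols are hypothetical. The `-nr:old:new` form and the `_jl` prefix are taken from the patch itself.

```python
def objconv_rename_args(symbols, prefix="_jl"):
    """Return one '-nr:old:new' objconv argument per exported symbol.

    Mirrors the loop in the patch: every name listed in osx.def gets the
    prefix prepended, so '_dgemm_' becomes '_jl_dgemm_'.
    """
    return [f"-nr:{sym}:{prefix}{sym}" for sym in symbols if sym]

# Hypothetical excerpt of osx.def (Mach-O names carry a leading underscore).
example_symbols = ["_dgemm_", "_cblas_dgemm"]
for line in objconv_rename_args(example_symbols):
    print(line)
# -nr:_dgemm_:_jl_dgemm_
# -nr:_cblas_dgemm:_jl_cblas_dgemm
```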

stevengj commented 10 years ago

That seems workable; I guess objconv could be added to deps as a build dependency. (It is GPL, but that is irrelevant here since we aren't actually linking objconv into Julia, just using its output.)

staticfloat commented 10 years ago

Yeah, it's just like patchelf.

tkelman commented 10 years ago

Okay, so have we determined the way to move forward here?

  1. Incorporate objconv as an osx-only dependency
  2. Patch openblas using some combination of the above snippets to add prefixes on all symbols in the BLAS64 case (I think we want to prefix even the lapacke stuff too - we may not be using those but someone will eventually want to ccall some library that does, expecting 32-bit-ints)
  3. Write a macro to prefix all blas and lapack symbols used in ccalls in Base, but only when we're using a 64-bit-int openblas that we know we built from source

ViralBShah commented 10 years ago

That seems like the way to move forward here. We can use this trick for openlibm too.

stevengj commented 10 years ago

Why osx-only? We need to rename the symbols on Linux too.

ViralBShah commented 10 years ago

On Linux, we already have patchelf. So step 1 is done. What about windows?

tkelman commented 10 years ago

Not patchelf, we use objcopy here for Linux. On Windows it was sufficient to patch the gensymbol perl script.

nalimilan commented 9 years ago

As if this thread was not complex enough yet, I'd like to add the use case of distribution packages to the list. :-)

On Fedora for example, ILP64 OpenBLAS is in a separate library called libopenblas64.so. But there's little chance the symbols in this file will be given a prefix to distinguish them from their LP64 counterparts, as Julia is not the only user of that package. A standard prefix (like 64) could be applied upstream, but then it would mean programs could not easily switch between OpenBLAS and MKL (not all languages have macros as flexible as Julia). A solution would be to build two versions of ILP64 OpenBLAS, one with standard names, and one with the prefix, so that all programs are happy, but this would entail a large amount of duplication (and there are already 2 x 3 copies of OpenBLAS, for 32/64-bit, and for serial, OpenMP and pthreads).

Admittedly, this is also upstream's and distributions' task to make sure ILP64 and LP64 libraries can happily cohabit. Since this issue does not only affect Julia, shouldn't something be done in coordination with upstream and distributors?

(That doesn't mean the fix suggested above isn't useful for other contexts.)

tkelman commented 9 years ago

Are any distribution packages for Julia using ILP64 openblas? I wasn't aware that any distributions had ILP64 blas packages. Within the distributions that have ILP64 blas implementations packaged and available, is there a way to do a reverse-dep search to get a rough survey of which other client packages are making use of them?

More coordination absolutely makes sense. This is a major problem that cuts across multiple distributions, operating systems other than Linux, programming languages, and use cases. I don't think there's any sane way for ILP64 and LP64 to coexist that satisfies every possible combination here - it's almost impossible to know ahead of time that there will never be someone who wants to combine functionality from a library that decided to use ILP64 with a library that didn't. Unless you want to introduce the burden of requiring that every ILP64 library also provide a separate LP64 implementation (my guess is most of them are familiar enough with this issue that they already are, even if not required to...).

I've personally been starting to think that ILP64 BLAS is more trouble than it's worth. Once you get up to multiple gigabytes of data and you want to do dense linear algebra, even just BLAS1 (which can mostly be done equivalently in Julia anyway) on a huge vector, you're probably better off working in distributed memory and figuring out how to partition your data more sanely so you don't have to think about all of it at once. Is there such a crazy person doing BLAS2, BLAS3, or LAPACK in a single shared memory space with arrays whose dimensions are larger than 32 bits?

eschnett commented 9 years ago

It is common for libraries to export their functions under multiple names. This is usually done via "weak symbols" or so, and does not require any code duplication. For example, name mangling for Fortran is not standardized, and many Fortran libraries export 2 or 3 different names for each function (e.g. "DGEMM", "dgemm_" and "dgemm__").

nalimilan commented 9 years ago

@tkelman ILP64 OpenBLAS has been added recently to Fedora, in part on my request. Apparently no package uses it yet, so it may still be time to fix things. I think other distributions do not provide it.

We can probably find a solution with @xianyi, but the problem is that MKL already seems to use identical symbols for LP64 and ILP64 (is that right?), and I guess it will be hard to get them to release an additional version with modified symbols. Though they may accept @eschnett's "weak symbols" solution if it's considered standard enough.

Regarding the need for ILP64, using it by default might not be the utmost priority, but I've seen on the Web several people requiring ILP64 BLAS, for example SuiteSparse's author Tim Davis here:

You might wonder if I would be insane enough to contemplate a matrix larger than 2^32 by 2^32. I'm not. When using the BLAS in an unsymmetric sparse factorization code, you can get very tall and thin (or short and squat) dense submatrices, where just one of the dimensions m or n is larger than 2^31 (k, in my case, is limited to a small constant in dgemm). The total problem size could still be just a handful of GBytes (but more than 4GB), even if one of m or n (but not both) in a call to dgemm is larger than 2^31.

http://www.netlib.org/atlas/atlas-comm/msg00233.html
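
To put rough numbers on the tall-and-thin case, here is a small sketch. The values of m and k are chosen for illustration and are not taken from the quoted message. A single dimension can overflow a signed 32-bit integer while the whole block still fits in the memory of one large machine.

```python
INT32_MAX = 2**31 - 1  # largest dimension an LP64 BLAS interface can accept

def dense_block_gib(m: int, k: int, bytes_per_entry: int = 8) -> float:
    """Memory footprint in GiB of an m-by-k dense block of doubles."""
    return m * k * bytes_per_entry / 2**30

m = 2**31 + 1  # hypothetical row count, just past the 32-bit limit
k = 1          # hypothetical small column count
print(m > INT32_MAX)          # True: m cannot be passed as a 32-bit integer
print(dense_block_gib(m, k))  # roughly 16 GiB for a single column of doubles
```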

tkelman commented 9 years ago

The more important part of that quote (from a surprisingly long time ago, 2001) is this:

in the Sun Performance Library, there is a 64-bit routine:

void dgemm_64

Someone at Sun was thinking ahead. Shame they probably weren't the same ones running the business side of things, but I digress.

MKL already seems to use identical symbols for LP64 and ILP64 (is that right?)

Looking at nm /opt/intel/composer_xe_2011_sp1.11.339/mkl/lib/intel64/libmkl_intel_ilp64.a | grep 64, the Pardiso and FFTW symbol names have either _64 or _ilp64 suffixes on them, the typical blas/lapack symbols do not. (Edit: misread the output of nm, Pardiso symbols have the same suffix in the _lp64.a library - the object names also have suffixes, but only a few of the symbols do) However MKL is not included in any Linux distributions, the general assumption when you use MKL is that you have to recompile everything against MKL. Leading to ridiculous situations like this, in Python land: http://www.lfd.uci.edu/~gohlke/pythonlibs/

I don't know whether weak symbols will work cross-platform. What happens to the original names when you use weak symbols? If it's still possible to get name conflicts if the original names are still exported, I'm not sure if that approach helps.

nalimilan commented 9 years ago

Good to hear! So if Intel has already found a naming convention (or even two) for ILP64 MKL, then upstream OpenBLAS could simply use the same names, or advise packagers to do so -- either with weak symbols (if that works), or by making it easy to completely rename functions with a compile-time flag. @xianyi How do you feel about that?

tkelman commented 9 years ago

No, I was misreading the output of nm and I think you misinterpreted what I said. MKL does not have separate naming conventions, by and large. Sun did, but Sun has gone the way of the dodo - apparently you can still buy SunPerf from Oracle, but does anyone? http://docs.oracle.com/cd/E24457_01/html/E21987/gkezy.html. In a few small places MKL has naming conventions, but only in spblas or other features that OpenBLAS does not provide. For dense blas and lapack, MKL allows you to switch between ILP64 and LP64 without having to change function names, only integer types (you'll need to recompile and re-link). So the symbols do conflict and you should absolutely never have an application (or more likely two separate unrelated parts that some other application wants to use in combination) that tries to use ILP64 and LP64 at the same time.

A compile-time flag for adding a prefix or suffix would be ideal, that's what we asked for, but it's looking like we have to figure out how to do it ourselves across all platforms we care about. We mostly have figured out a hacky way that should work, but it requires adding extra dependencies that only apply on a system that I don't have (why you gotta suck, osx binutils?) so it's tough for me to make much progress there.

nalimilan commented 9 years ago

Ah, sorry, I thought that by "typical" you meant LP64. So the problem (only considering the case of distribution packages, and even ignoring technical issues on OS X) is that a compile-time prefix would make ILP64 OpenBLAS completely incompatible with all other BLASes, meaning that very few programs will switch to it. Or it will force shipping two copies with different symbol names.

Let's ask @susilehtola, the maintainer of OpenBLAS in Fedora.

tkelman commented 9 years ago

a compile-time prefix would make ILP64 OpenBLAS completely incompatible with all other BLASes

Yes, a compile-time prefix would introduce an API incompatibility in the function names, not just their types. So for pieces of software that are set up to easily switch their integer types (which is far from all pieces of software...) but not set up to easily change the function names with which they call blas, this would make it harder to use the ILP64 blas. However my argument is this is a feature not a bug, since changing the integer type without changing the function name introduces far more subtle ABI incompatibilities, which may only be exhibited at runtime by some entirely separate application.

Julia works just fine with ILP64 openblas when all integers are internally 64 bit, but what happens when you try to call Ipopt from that same process, if Ipopt is linked to a conventional shared-library LP64 blas? Segfaults. The exact same thing happens in Matlab, which uses ILP64 MKL without changing the symbol names.

Or it will force shipping two copies with different symbol names.

I don't think this is a good idea. People who use an ILP64 blas should really know exactly what they're doing and be well aware that leaving the symbol names alone makes their library impossible to use in combination with a huge amount of pre-existing numerical code (unless things are carefully statically linked, which is not how Julia works).

Let's ask @susilehtola, the maintainer of OpenBLAS in Fedora.

Yes, getting more input would be useful, sorry for the walls of text.

susilehtola commented 9 years ago

Uhm.. What do you want to ask?

nalimilan commented 9 years ago

@susilehtola Sorry, the thread is quite long. The relevant part starts at https://github.com/JuliaLang/julia/issues/4923#issuecomment-57147655. Basically, we're wondering whether it would make sense to rename all symbols in the ILP64 OpenBLAS library (e.g. adding a 64 suffix), so that programs which link to it do not crash if they also use a library which links to the LP64 version.

susilehtola commented 9 years ago

Duplicating system libraries is nasty stuff and forbidden in linux distributions.

susilehtola commented 9 years ago

So if you were to do that, it would be reverted at least in Fedora.

stevengj commented 9 years ago

@susilehtola, if we linked with a system OpenBLAS, or some other BLAS implementation (e.g. MKL), we would not add the 64 suffix (via a macro in the Julia code). So, this wouldn't affect Fedora packages.

(For a Julia packaged in the Fedora distro and linked to the Fedora BLAS, this is not really an issue because presumably on that system all libraries are linked to the same BLAS. The problem arises when people have multiple ABI-incompatible BLAS implementations on the same machine, e.g. one from the julialang.org binary download and one from their distro, and then libraries get confused.)

I think this discussion has gotten a bit off-track: we really only need the suffix for the case when we are distributing/compiling an ILP64 OpenBLAS ourselves.

susilehtola commented 9 years ago

Well, the issue is still in distributions as well. For instance in Fedora you have reference BLAS/LAPACK, ATLAS and OpenBLAS, all of which ship the same symbols for API compatibility. And, for reference BLAS/LAPACK and OpenBLAS, also 64-bit interface versions exist.

All of these can cause unpredictable behavior and crashes if mixed together.

This is just an unfortunate issue with the numerical libraries. The calls to BLAS/LAPACK functions really should be translated at compile time to calls to implementation specific functions, as has been suggested above. But, this is really wishing for too much.

StefanKarpinski commented 9 years ago

Is there some approach that you can suggest that would help to resolve this problem?

stevengj commented 9 years ago

My feeling is that we should rename the symbols when making our own binaries, and not worry too much about Fedora etcetera (where we will use whatever ABI they want).

(We have to support both suffixed and non-suffixed ABIs anyway because people want to use MKL, so there will be a Makefile switch and a corresponding macro in Base.)

Making sure that programs are linked to consistent BLAS libraries really seems like a distro issue to me. Fedora should make a decision about which BLAS ABI they want their scientific libraries (including NumPy) linked to, and be consistent. Then whatever choice they adopt can be used for their Julia package too.

nalimilan commented 9 years ago

@stevengj See https://fedorahosted.org/fpc/ticket/352 for a debate about how to handle the BLAS/LAPACK mess in Fedora. tl;dr: it's already hard enough to get everybody to agree on a scheme for fully-compatible LP64 BLAS that I don't think it would be easy to move all packages to ILP64 (which would break ABI...). So I agree Julia should find a solution for when BLAS is bundled, while we try to find another solution for distributions.

@susilehtola Do you see any path forward if we intend to move as many libraries as possible to ILP64 in Fedora? And to make them cohabit without crashing?

tkelman commented 9 years ago

for reference BLAS/LAPACK and OpenBLAS, also 64-bit interface versions exist

If Fedora is starting to distribute ILP64 openblas and/or reference blas and lapack, this ABI issue is a major problem and should be worked out earlier rather than later. If you aren't going to change the symbol names, in my opinion the responsible thing for distributions to do is to mark ILP64 blas/lapack (any implementation) as conflicting with LP64 blas/lapack, so they cannot be simultaneously installed.

if we intend to move as many libraries as possible to ILP64

I think that's overly optimistic. Julia's blas and lapack interface code is nicely modularized and easily configurable to use different integer sizes, but that's not the case everywhere (maybe for code that is 100% Fortran, but is there such a thing any more?). Ipopt for example interfaces to external code written by a variety of authors in C, C++, Fortran 77, and Fortran 90. I'm not aware of any demand (except perhaps from Julia) to add a configuration option for using ILP64 blas with Ipopt, certainly not to the extent of anyone stepping up to write the very long invasive patch that would require.

nalimilan commented 9 years ago

Has anybody considered using symbol versioning? It's precisely made to allow loading incompatible ABIs in the same process, without changing the symbol names at all. The default ABI version could be called lp64 or 32, and another version would be ilp64 or 64; thus, applications designed for the LP64 BLAS would work fine. Julia would use dlvsym(handle, symbol, "ilp64") to get the interface it wants.

That would require building LP64 and ILP64 versions of OpenBLAS in the same library, not sure how hard that would be. One drawback is that only the GNU, BSD and Solaris linkers support symbol versioning.

tkelman commented 9 years ago

@nalimilan symbol versioning might be an okay solution for linux distributions with this issue. How does Julia's ccall interact with symbol versioning though? And as I've mentioned this is an issue on Mac as well, and I'd prefer whatever choice we make in Julia to be as uniform across platforms as we can (despite differences in patches / build process to get there).

nalimilan commented 9 years ago

@nalimilan symbol versioning might be an okay solution for linux distributions with this issue. How does Julia's ccall interact with symbol versioning though?

I'm not very clear on how ccall works, but as I said above it should be possible to request a non-default interface (here, ILP64) using dlvsym.

And as I've mentioned this is an issue on Mac as well, and I'd prefer whatever choice we make in Julia to be as uniform across platforms as we can (despite differences in patches / build process to get there).

Yes, that would be much better, but so far I don't see a completely portable solution. The solution of adding 64 to all symbols could be used on all platforms as a fallback. And Linux distribution packages could use the symbol versioning approach: since it does not require adding the prefix to all function calls (just mentioning the ABI version you want once, for compiled calls), it will be much more suitable if other programs than Julia want to link to the ILP64 BLAS.

@susilehtola Any thoughts on this scenario?

tkelman commented 9 years ago

I'm having a look at implementing this. objconv (http://www.agner.org/optimize/#objconv) apparently doesn't upload versioned source files, and it's in the wonderful format of a zip file within another zip file. The last change was just a couple weeks ago, so using an unversioned url would constantly flag checksum mismatches. Should we just rehost the source? (edit: nevermind, checksumming is nice to do when possible but it's not really mandatory) There might also be a way to use ld to achieve something similar via aliases? See http://stackoverflow.com/a/11951756 - can someone with a mac try that out?

I also checked where SuiteSparse links to BLAS/LAPACK functions, and there's already a bit of code there for the 64-bit SunPerf BLAS with functions suffixed by _64. See deps/SuiteSparse-4.3.1/CHOLMOD/Include/cholmod_blas.h. Looks like if we adopt that suffix, or patch that section of code (also in umfpack and spqr) to use our own prefix/suffix, we can just set -DSUN64 (or patch it to use our own define) and SuiteSparse should work. Arpack will be much messier, unfortunately. How's the replacing arpack effort coming along?

ViralBShah commented 9 years ago

@jiahao can say when we can replace ARPACK. We have added lots of functionality around ARPACK, and fixed all the issues. We will soon have svds too, and a few other things that will make it feature complete.

It is not too tough to patch ARPACK to use 64-bit BLAS/LAPACK with a _64 suffix.

tkelman commented 9 years ago

While I work on writing the patch for this, which approach would folks prefer?

  1. Suffix all symbols by 64_, so we can use -DSUN64 without patching SuiteSparse, and make the Julia-side macro more uniform.
  2. Suffix all symbols by _64, but before any trailing underscores in the function name. Can use SuiteSparse unpatched here, but the Julia-side macro would have to be a little more complicated.
  3. Use a prefix like ilp64_ or jl_, would have to patch SuiteSparse for this.

nalimilan commented 9 years ago

In the absence of arguments in favor of 2 and 3, why not use the -DSUN64 convention? If it can help establish a standard, it would be a good thing, and other projects are more likely to accept supporting this if Julia is not the only project using this convention.

tkelman commented 9 years ago

The only argument in favor of 2 or 3 would be that cblas_ddot64_ or openblas_set_num_threads64_ look a little funny.
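
To make the three candidates concrete, here is a sketch in Python rather than as the Julia macro. The function names are made up for illustration, and the prefix `ilp64_` is just one of the examples from option 3.

```python
def suffix_64_(name: str) -> str:
    """Option 1: append '64_' to the whole symbol name."""
    return name + "64_"

def suffix_64_before_underscores(name: str) -> str:
    """Option 2: insert '_64' before any trailing underscores."""
    stem = name.rstrip("_")
    trailing = name[len(stem):]
    return stem + "_64" + trailing

def with_prefix(name: str, prefix: str = "ilp64_") -> str:
    """Option 3: prepend a fixed prefix."""
    return prefix + name

for sym in ["dgemm_", "cblas_ddot", "openblas_set_num_threads"]:
    print(sym, suffix_64_(sym), suffix_64_before_underscores(sym), with_prefix(sym))
```

Options 1 and 2 agree on Fortran-style names such as dgemm_ (both give dgemm_64_) and differ only for names without a trailing underscore, which is where option 1 produces the odd-looking cblas_ddot64_.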

nalimilan commented 9 years ago

Yeah, weird idea... It's also true that searching for e.g. dgemv_64_ doesn't give results other than the SuiteSparse file, so it doesn't look so popular.

tkelman commented 9 years ago

And people using ILP64 BLAS libraries from Fortran have to worry about compiler-dependent name mangling.

Anyway, I've got step 2 from https://github.com/JuliaLang/julia/issues/4923#issuecomment-54724314 mostly done, I think I'll post a WIP PR soon so people can look at it.

stevengj commented 9 years ago

Since there is no technical reason to prefer one suffix over another as far as I can see, any little thing tips the scales, so I would go with the SUN64 convention.

nbecker commented 9 years ago

A good read on usage of versioned elf shared libs:

http://www.akkadia.org/drepper/dsohowto.pdf
