Closed njsmith closed 4 years ago
Has also been an issue in Matlab where they use the ILP64 MKL, if you try to link a mex file against system LP64 BLAS, for many years before Julia ever existed. ILP64 BLAS only ever worked in Julia if you didn't try to load any other binary applications compiled against system LP64 BLAS. It was fragile until we implemented the renaming. Lots of context in https://github.com/JuliaLang/julia/issues/4923 https://github.com/xianyi/OpenBLAS/issues/646 https://github.com/xianyi/OpenBLAS/pull/459 https://github.com/JuliaLang/julia/pull/8734.
We just append 64_
to the C-style symbol names, so it's dgemm_64_
, or if you try to call it from Fortran (edit: assuming gfortran mangling) it would be dgemm_64
. SunPerf actually had separate API names as long
as 15 years ago, ref http://www.netlib.org/atlas/atlas-comm/msg00233.html - so we went with the suffix that would let us directly use some existing code in Tim Davis' SuiteSparse that gets activated by -DSUN64
.
Is it possible to support dynamic dispatch, wherein both dgemm32
and dgemm64
are in the library, but at runtime dgemm
is mapped to dgemm32
, unless the user does e.g. export BLIS_64B_MAGIC
? That would be cool.
@tkelman Why doesn't Julia just call CBLAS? I thought support for that was pretty broad, at least in contexts where Julia is used.
We do in a handful of places where the Fortran calling convention differs between gfortran and MKL/Accelerate. I don't think cblas typically supports 64 bit indices - I could be wrong though.
@jeffhammond: That would indeed be cool, but unfortunately I don't think there's any way to make that happen given the vagaries of different platforms' linking models...
@tkelman Does CBLAS need to support 64b integers to meet Julia's needs? For BLAS1 operations, it is trivial to chunk if there are more than 2B elements in a vector. Given the memory constrains, if one has a matrix dimension larger than 2B, the BLAS2/3 operations are going to perform like BLAS1 operations and thus chunking shouldn't be a big issue.
@njsmith How many different conventions need to be supported for the feature to have high value to the user community?
It was easier for us to rename the symbols than add chunking around every single blas and lapack call. When openblas is built with ILP64 indices, its cblas also gets built with 64 bit integers afaict, but that's a non default setting and will cause segfaults if you're not careful about symbol names and loading other blas libraries in the same process.
@tkelman Eww, that's terrible of OpenBLAS. They should always use C int
. Someone should file that bug.
They should also always use 32 bit ints for the fortran interface by that reasoning. The problems in practice are identical.
How many different conventions need to be supported for the feature to have high value to the user community?
I think that the use cases that matter in practice are:
dgemm_
and cblas_dgemm
exist and use 32-bit integers (what Debian and others want)dgemm_64_
and cblas_dgemm_64
exist and use 64-bit integers (what Julia wants, also useful for Jeff's modern Fortran codes)dgemm_
exists and uses 64-bit integers (for bespoke setups that want to work with legacy fortran code that calls plain dgemm
and is compiled with 64-bit INTEGER
).For the last case I'm not sure if anyone ever needs cblas_dgemm
to be 64-bit, but I suppose it might be useful sometimes, and MKL seems to provide this as an interface ("All Intel MKL function domains support ILP64 programming but FFTW interfaces", so I guess that includes CBLAS?), so it could be useful for people porting code from MKL, and it's not hard to support.
If we combine the two compatible 64-bit options, this gives us a total of 3 configurations that are important to test/support:
dgemm_
/cblas_dgemm
and 64-bit dgemm_64_
/cblas_dgemm_64
dgemm_
/cblas_dgemm
In principle BLIS certainly could retain the flexibility to support other configurations -- like 32-bit integers internally on 64-bit systems, or 64-bit BLIS + 32-bit dgemm_
and no 64-bit dgemm_
, etc. etc., for all the combinatoric possibilities -- but trying to test and support all these seems like a waste of effort to me, since almost all of them are irrelevant in practice.
@jeffhammond: The above list does assume that for your "modern fortran codes" use case, you don't actually care about dgemm_32
and really just want dgemm_64
. Is that correct?
I completely agree with @njsmith's summary there.
There's the question of what to do on CBLAS as well. In the OpenBLAS scenario as used by Julia, we have a handful of CBLAS symbols that we use (just cblas_[cz]dot[uc]_sub
), but we use them with 64 bit integers and renamed with a 64_
suffix accordingly. cblas_cdotu_sub64_
looks a little funny but the suffix is handled by a macro anyway. I believe people are successfully building Julia against ILP64 MKL and also using those same cblas symbols but without a suffix, with a BlasInt
type of 64 bits and it's been working fine. I'd have to check whether that's just luck, or if MKL also changes CBLAS integer sizes when you link (or set the environment variable for the dynamic runtime) in ILP64 mode.
Edited my comment to speak more explicitly about CBLAS -- specifically, I think cblas_dgemm_64
and friends should obviously be supported for all the same reasons we want dgemm_64_
, and a look at the MKL docs suggests that their IPL64 builds do provide 64-bit cblas_dgemm
and friends.
Oh, ugh, except I missed that Julia's 64-bit cblas functions are named like cblas_dgemm64_
instead of cblas_dgemm_64
, which is obviously wrong. @tkelman, does fixing this seem at all likely? :-)
It's not "wrong" per se, just a consequence of how we implemented it. We apply a suffix uniformly to all symbols in the library, after the gfortran mangling rather than before. Looks funny, but just easier to deal with.
Jeez, sorry about the spam, my phone is stupid
@tkelman: I guess by "wrong" i mean "if we were writing a standard from scratch this is obviously not what we would do". Since now we are talking about making a standard and BLIS implementing it, there's a question about whether we should follow the dorky existing thing, or fix the existing thing and then implement that :-). I guess you're voting for following the existing thing?
"add a suffix, but before the trailing underscore for symbols that are pretending to be from fortran" (or "add 64_
for f-blas symbols and _64
for c-blas symbols") is a more fiddly rule to implement than "add the same suffix to all symbols." ILP64 is generally a situation where you should know what you are doing, or leave the linking and renaming to someone who does, so I don't care too much what the result looks like. It's almost always going to be handled by preprocessor macros anyway.
I'm not seeing much consensus here. If someone wants to "create a standard" then we can try to implement it.
The de facto standard is that the standard BLAS/CBLAS functions take 32-bit integers in their API. Julia experimented with changing this so that they could use 64-bit integers in their main BLAS wrappers, and this worked great for a little while until they discovered that when people started trying to link in other existing BLAS-using code, this code was assuming that BLAS uses 32-bit integers and was causing segfaults. Their solution was to continue to use a 64-bit integer version of BLAS, but with symbols renamed to avoid collisions (so e.g.
dgemm_
uses 32-bit integers, anddgemm_64_
uses 64-bit integers... [edited to get the 64-bit symbol names correct])As mentioned in https://github.com/flame/blis/pull/37#issuecomment-190572923 , it would be great if a single BLIS library could export both 32- and 64-bit versions of these symbols simultaneously. It doesn't look like this would be too hard, since both the BLAS2BLIS interface is already generated using C preprocessor magic, and the CBLAS wrapper is already getting programmatically patched...