flame / blis

BLAS-like Library Instantiation Software Framework
Other
2.3k stars 367 forks source link

BLIS should allow simultaneously exporting both 32- and 64-bit variants of BLAS/CBLAS #43

Closed njsmith closed 4 years ago

njsmith commented 8 years ago

The de facto standard is that the standard BLAS/CBLAS functions take 32-bit integers in their API. Julia experimented with changing this so that they could use 64-bit integers in their main BLAS wrappers, and this worked great for a little while until they discovered that when people started trying to link in other existing BLAS-using code, this code was assuming that BLAS uses 32-bit integers and was causing segfaults. Their solution was to continue to use a 64-bit integer version of BLAS, but with symbols renamed to avoid collisions (so e.g. dgemm_ uses 32-bit integers, and dgemm_64_ uses 64-bit integers... [edited to get the 64-bit symbol names correct])

As mentioned in https://github.com/flame/blis/pull/37#issuecomment-190572923 , it would be great if a single BLIS library could export both 32- and 64-bit versions of these symbols simultaneously. It doesn't look like this would be too hard, since both the BLAS2BLIS interface is already generated using C preprocessor magic, and the CBLAS wrapper is already getting programmatically patched...

tkelman commented 8 years ago

Has also been an issue in Matlab where they use the ILP64 MKL, if you try to link a mex file against system LP64 BLAS, for many years before Julia ever existed. ILP64 BLAS only ever worked in Julia if you didn't try to load any other binary applications compiled against system LP64 BLAS. It was fragile until we implemented the renaming. Lots of context in https://github.com/JuliaLang/julia/issues/4923 https://github.com/xianyi/OpenBLAS/issues/646 https://github.com/xianyi/OpenBLAS/pull/459 https://github.com/JuliaLang/julia/pull/8734.

We just append 64_ to the C-style symbol names, so it's dgemm_64_, or if you try to call it from Fortran (edit: assuming gfortran mangling) it would be dgemm_64. SunPerf actually had separate API names as long as 15 years ago, ref http://www.netlib.org/atlas/atlas-comm/msg00233.html - so we went with the suffix that would let us directly use some existing code in Tim Davis' SuiteSparse that gets activated by -DSUN64.

jeffhammond commented 8 years ago

Is it possible to support dynamic dispatch, wherein both dgemm32 and dgemm64 are in the library, but at runtime dgemm is mapped to dgemm32, unless the user does e.g. export BLIS_64B_MAGIC? That would be cool.

jeffhammond commented 8 years ago

@tkelman Why doesn't Julia just call CBLAS? I thought support for that was pretty broad, at least in contexts where Julia is used.

tkelman commented 8 years ago

We do in a handful of places where the Fortran calling convention differs between gfortran and MKL/Accelerate. I don't think cblas typically supports 64 bit indices - I could be wrong though.

njsmith commented 8 years ago

@jeffhammond: That would indeed be cool, but unfortunately I don't think there's any way to make that happen given the vagaries of different platforms' linking models...

jeffhammond commented 8 years ago

@tkelman Does CBLAS need to support 64b integers to meet Julia's needs? For BLAS1 operations, it is trivial to chunk if there are more than 2B elements in a vector. Given the memory constrains, if one has a matrix dimension larger than 2B, the BLAS2/3 operations are going to perform like BLAS1 operations and thus chunking shouldn't be a big issue.

@njsmith How many different conventions need to be supported for the feature to have high value to the user community?

tkelman commented 8 years ago

It was easier for us to rename the symbols than add chunking around every single blas and lapack call. When openblas is built with ILP64 indices, its cblas also gets built with 64 bit integers afaict, but that's a non default setting and will cause segfaults if you're not careful about symbol names and loading other blas libraries in the same process.

jeffhammond commented 8 years ago

@tkelman Eww, that's terrible of OpenBLAS. They should always use C int. Someone should file that bug.

tkelman commented 8 years ago

They should also always use 32 bit ints for the fortran interface by that reasoning. The problems in practice are identical.

njsmith commented 8 years ago

How many different conventions need to be supported for the feature to have high value to the user community?

I think that the use cases that matter in practice are:

For the last case I'm not sure if anyone ever needs cblas_dgemm to be 64-bit, but I suppose it might be useful sometimes, and MKL seems to provide this as an interface ("All Intel MKL function domains support ILP64 programming but FFTW interfaces", so I guess that includes CBLAS?), so it could be useful for people porting code from MKL, and it's not hard to support.

If we combine the two compatible 64-bit options, this gives us a total of 3 configurations that are important to test/support:

In principle BLIS certainly could retain the flexibility to support other configurations -- like 32-bit integers internally on 64-bit systems, or 64-bit BLIS + 32-bit dgemm_ and no 64-bit dgemm_, etc. etc., for all the combinatoric possibilities -- but trying to test and support all these seems like a waste of effort to me, since almost all of them are irrelevant in practice.

njsmith commented 8 years ago

@jeffhammond: The above list does assume that for your "modern fortran codes" use case, you don't actually care about dgemm_32 and really just want dgemm_64. Is that correct?

tkelman commented 8 years ago

I completely agree with @njsmith's summary there.

There's the question of what to do on CBLAS as well. In the OpenBLAS scenario as used by Julia, we have a handful of CBLAS symbols that we use (just cblas_[cz]dot[uc]_sub), but we use them with 64 bit integers and renamed with a 64_ suffix accordingly. cblas_cdotu_sub64_ looks a little funny but the suffix is handled by a macro anyway. I believe people are successfully building Julia against ILP64 MKL and also using those same cblas symbols but without a suffix, with a BlasInt type of 64 bits and it's been working fine. I'd have to check whether that's just luck, or if MKL also changes CBLAS integer sizes when you link (or set the environment variable for the dynamic runtime) in ILP64 mode.

njsmith commented 8 years ago

Edited my comment to speak more explicitly about CBLAS -- specifically, I think cblas_dgemm_64 and friends should obviously be supported for all the same reasons we want dgemm_64_, and a look at the MKL docs suggests that their IPL64 builds do provide 64-bit cblas_dgemm and friends.

njsmith commented 8 years ago

Oh, ugh, except I missed that Julia's 64-bit cblas functions are named like cblas_dgemm64_ instead of cblas_dgemm_64, which is obviously wrong. @tkelman, does fixing this seem at all likely? :-)

tkelman commented 8 years ago

It's not "wrong" per se, just a consequence of how we implemented it. We apply a suffix uniformly to all symbols in the library, after the gfortran mangling rather than before. Looks funny, but just easier to deal with.

tkelman commented 8 years ago

Jeez, sorry about the spam, my phone is stupid

njsmith commented 8 years ago

@tkelman: I guess by "wrong" i mean "if we were writing a standard from scratch this is obviously not what we would do". Since now we are talking about making a standard and BLIS implementing it, there's a question about whether we should follow the dorky existing thing, or fix the existing thing and then implement that :-). I guess you're voting for following the existing thing?

tkelman commented 8 years ago

"add a suffix, but before the trailing underscore for symbols that are pretending to be from fortran" (or "add 64_ for f-blas symbols and _64 for c-blas symbols") is a more fiddly rule to implement than "add the same suffix to all symbols." ILP64 is generally a situation where you should know what you are doing, or leave the linking and renaming to someone who does, so I don't care too much what the result looks like. It's almost always going to be handled by preprocessor macros anyway.

devinamatthews commented 4 years ago

I'm not seeing much consensus here. If someone wants to "create a standard" then we can try to implement it.