Reference-LAPACK / lapack

LAPACK development repository

CBLAS/LAPACKE extension for 64bit integer #666

Closed mkrainiuk closed 4 months ago

mkrainiuk commented 2 years ago

Intro

CBLAS/LAPACKE wrappers support 32-bit and 64-bit integers with the same function names located in different libraries, but this approach does not allow mixing both libraries in one environment because of symbol conflicts.

This FR is for discussing potential solutions that avoid this problem and work reliably in any environment for applications/libraries that depend on CBLAS/LAPACKE built with 64-bit integers, or on any other project with a similar API.

Problem

Similar to the Fortran API, where an integer with no size specified is used, the CBLAS/LAPACKE API integer type can be selected by a special build option at compile time. Example for CBLAS: https://github.com/Reference-LAPACK/lapack/blob/655e588cfac72048ad837416bb9d8340fdf3e79c/CBLAS/include/cblas.h#L20-L24

Since C/Fortran symbols in general do not reflect the data types in the name (unlike C++, which encodes argument types in mangled names), the linker cannot detect whether the resolved symbols have the correct integer type. As a result, a symbol mismatch could cause unexpected behavior, such as incorrect results, data corruption, and segmentation faults.

Example:

A plugin "X" uses the CBLAS API with the 64-bit integer type because it works with a number of elements that does not fit into 32 bits. Another plugin "Y" loads "libcblas.so", a CBLAS library built with 32-bit integers, as a dependency in a complex application environment. Because the first loaded "libcblas.so" library will also be used for plugin "X", this will cause a segmentation fault during application execution.
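To make the size limit behind this example concrete, here is a minimal sketch of a guard that a caller could apply before handing a 64-bit count to a 32-bit-integer interface. Both `fits_lp64` and `narrow_lp64` are hypothetical helpers introduced for illustration; they are not part of CBLAS or LAPACKE.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of CBLAS): returns 1 when a 64-bit
 * element count or stride can be represented by a 32-bit integer,
 * 0 otherwise. */
static int fits_lp64(int64_t n)
{
    return n >= INT32_MIN && n <= INT32_MAX;
}

/* Hypothetical helper: narrows a value that is known to fit.
 * The assert documents the precondition instead of silently truncating. */
static int32_t narrow_lp64(int64_t n)
{
    assert(fits_lp64(n));
    return (int32_t)n;
}
```

A mismatched library gives the caller no such check: the callee simply interprets the argument with its own integer width, which is the failure mode described above.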

Related Work

Proposal

Example:

// Current API
float      cblas_snrm2(const CBLAS_INT N, const float *X, const CBLAS_INT incX);
lapack_int LAPACKE_dgeev( int matrix_layout, char jobvl, char jobvr, lapack_int n, double* a, lapack_int lda, ... );

// 64bit fixed width integer API
float   cblas_snrm2_64(const int64_t N, const float *X, const int64_t incX); 
int64_t LAPACKE_dgeev_64( int matrix_layout, char jobvl, char jobvr, int64_t n, double* a, int64_t lda, ... );

int main() {
    int64_t N, incX;
    int32_t n, lda, ...;
    ...
    cblas_snrm2_64(N, X, incX);
    LAPACKE_dgeev(matrix_layout, jobvl, jobvr, n, a, lda, ...);
    ...
}
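As a rough illustration of what a fixed-width `_64` entry point looks like on the implementation side, the sketch below uses `int64_t` for the length and the stride regardless of any build option. `demo_sasum_64` is a hypothetical stand-in written for this example; it is not the reference implementation of any CBLAS routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a "_64" routine: sum of absolute values with
 * a fixed-width 64-bit length and stride. In this sketch, a non-positive
 * N or incX yields 0. */
static float demo_sasum_64(const int64_t N, const float *X, const int64_t incX)
{
    assert(X != NULL || N <= 0);
    if (N <= 0 || incX <= 0)
        return 0.0f;

    float sum = 0.0f;
    for (int64_t i = 0; i < N; ++i) {
        const float v = X[i * incX];   /* 64-bit index arithmetic */
        sum += (v < 0.0f) ? -v : v;
    }
    return sum;
}
```

Because the integer width is part of the prototype, a caller compiled against this declaration cannot accidentally bind to a 32-bit variant of the same name.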

Other Considerations

mkrainiuk commented 2 years ago

Summoning people who participated in the discussions mentioned in the Related Work section: Julia @staticfloat @ViralBShah, FlexiBLAS @grisuthedragon, BLIS @devinamatthews, OpenBLAS @xianyi

ViralBShah commented 1 year ago

For Julia: @amontoison For OpenBLAS: @martin-frbg

Also related: #824

martin-frbg commented 1 year ago

Yes, I'm lurking... OpenBLAS currently does it by running a big objcopy after the initial build, which is far from ideal. And standardization of the expected ILP64 suffix across (at least) Julia/Go/(Num|Sci)Py is obviously desirable.

grisuthedragon commented 1 year ago

And standardization of the expected ILP64 suffix across (at least) Julia/Go/(Num|Sci)Py is obviously desirable.

OK, I compiled a summary of what @mkrainiuk proposed, what some of us already discussed in https://github.com/mpimd-csc/flexiblas/issues/12, and what I discussed with a colleague.

Let us distinguish between suffixed and not suffixed. First, we have the ones without suffixes:

I would like to introduce the 32-bit and 64-bit builds with suffixes. This could be for example

Regarding the rules for suffixes for each symbol, my suggestion is this. We look at all symbols from the point of view of their native interface, that is, before the compiler does any name mangling. This means that the normal BLAS and LAPACK routines are treated in the Fortran context, and CBLAS and LAPACKE from a C perspective. This keeps the approach invariant even under strange compilers and ABI definitions, like using IBM/Sun compilers or macOS (as @ViralBShah reported there: https://github.com/mpimd-csc/flexiblas/issues/12#issuecomment-745270514 )

As for the suffixes themselves, I thought of the following:

martin-frbg commented 1 year ago

I do not like the additional complexity of having a libblas64.so with non-suffixed symbols, but I guess it is inevitable for backwards compatibility ? Fully agreed on the leading underscore, and on not treating the "work" interfaces as something special - after all, they are (or may be) using ILP64 internally themselves, not just doing some work for another function that happens to be ILP64. However, do we really need the _32 suffixes too, or could we just assume that any non-suffixed symbol is an implied _32 ? (I have to admit that enforcing suffixes would solve a potential problem with purely internal functions in OpenBLAS without having to mark them as hidden)

Cc @rgommers for NumPy and SciPy as the discussion seems to be gravitating towards here (we had something similar planned on a smaller scope) Maybe @kortschak or @vladimir-ch for Gonum if there is interest ?

kortschak commented 1 year ago

Gonum has a focus on floats, so I'm not sure how much interest there is in this from us.

martin-frbg commented 1 year ago

Thanks - discussion is entirely about large array addressing not fixed-point math.

grisuthedragon commented 1 year ago

I do not like the additional complexity of having a libblas64.so with non-suffixed symbols, but I guess it is inevitable for backwards compatibility ?

That allows recompiling projects with -fdefault-integer-8 or -i8 without changing any code (as long as it is purely Fortran).

Fully agreed on the leading underscore, and on not treating the "work" interfaces as something special - after all, they are (or may be) using ILP64 internally themselves, not just doing some work for another function that happens to be ILP64. However, do we really need the _32 suffixes too, or could we just assume that any non-suffixed symbol is an implied _32 ? (I have to admit that enforcing suffixes would solve a potential problem with purely internal functions in OpenBLAS without having to mark them as hidden)

The _32 suffix is only for completeness, so I have no problem dropping this variant.

grisuthedragon commented 1 year ago

@mkrainiuk I saw you added the first steps towards the _64 API in 4c2236a. Some tests turned up the following problems:

In general, it looks as if it fits the suggestions we made here.

mkrainiuk commented 1 year ago

Thanks for the great feedback!

Hi @grisuthedragon

  • the _64 is built although BUILD_INDEX64=OFF

BUILD_INDEX64 and BUILD_INDEX64_EXT_API are completely independent options:

  • the _64 is built together with the symbols without the suffix. Thus, linking 32 bit and 64 bit interfaces is impossible again.

It's possible when the standard API is built for LP64 and the extended _64 API is built for ILP64, but it requires some code modifications for using _64 in an application:

int32_t m1, n1, lda1, incx1, incy1;
int64_t m2, n2, lda2, incx2, incy2;
...
cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans, m1, n1, ... );     //LP64 API
cblas_dgemm_64(CblasColMajor, CblasNoTrans, CblasNoTrans, m2, n2, ...);   //ILP64 API
  • I do not know if the way through the preprocessor and the command line is the best. Let's consider the following situation: We have a maximum command line length between 32kB and 2MB. In the case of 32kB and around 2000 BLAS/LAPACK routines, we have 16 characters for each definition on average. With -DDGEQP3=DGEQP3_64 we get 19 characters (including a whitespace to separate them). This leads to strange situations on some platforms. I think a better way would be to provide a header file containing the translation.

This is a good point. I considered a new header file, but it requires a manual update for any new function, so for CBLAS, since the number of functions is relatively small, I decided to generate the macros on the fly instead. I agree that this approach won't work for LAPACKE, so I'm looking for another solution that can also perform this Fortran symbol renaming automatically during the build.
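The command-line budget raised in the quoted concern can be checked with a few lines of arithmetic. This is only a sketch: the helper names are hypothetical, and the 32kB limit and the count of 2000 routines are the figures quoted above, taken here as 32 * 1024 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of one "-DNAME=NAME_64" flag plus one separating space. */
static size_t define_flag_length(const char *name)
{
    assert(name != NULL);
    /* "-D" + NAME + "=" + NAME + "_64" + " " */
    return 2 + strlen(name) + 1 + strlen(name) + 3 + 1;
}

/* Average number of characters available per routine for a given
 * command-line limit. */
static size_t budget_per_routine(size_t limit_bytes, size_t routines)
{
    assert(routines > 0);
    return limit_bytes / routines;
}
```

For `DGEQP3` the flag needs 19 characters, while a 32kB limit shared by 2000 routines leaves 16 characters each, so the flags alone would exceed the smallest limit mentioned.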

grisuthedragon commented 1 year ago

@mkrainiuk

BUILD_INDEX64 option changes the integer type from 32-bit to 64-bit for the standard API and the library name from cblas to cblas64, but in some cases having different library names can't guarantee that there won't be symbol conflicts when both libraries are loaded to the same env. So I guess the long term solution could be migrating to "_64" API for ILP64 and dropping BUILD_INDEX64 option completely so that the standard API will be always for LP64.

In my opinion, as mentioned before, we need three build variants. The standard build, without any special options, suffixes, or the like, leads to the LP64 variant, as you mentioned. Then an ILP64 build without suffixes (with 64 or _ilp64 added to the library name). This is required to rebuild applications with -i8 or -fdefault-integer-8 without code changes. And finally, the ILP64 build with suffixes, which, as you mentioned, should be the standard for ILP64 usage. As far as I know from Julia (@ViralBShah), they use the suffixed ILP64 library internally, and thus it is no problem to load other code that is linked against an LP64 BLAS. From this point of view, having the non-suffixed and the suffixed symbols in one library would be bad.

but in some cases having different library names can't guarantee that there won't be symbol conflicts when both libraries are loaded to the same env.

That's something nobody can guarantee and it is up to the programmer.

This is a good point, I considered new header file, but it requires manual update for any new function, so for CBLAS since the number of functions is relatively small I decided to generate the macro on the fly instead. I agree that this approach won't work for LAPACKE, so I'm looking for another solution that also can make this Fortran symbol renaming automatically during the build.

Since the set of functions changes only slowly, one can provide such a header file and possibly a script that generates a new header from all the sources. One advantage of such a header file would be that it can be provided to the user as well, to allow an easy migration between LP64 and ILP64 mode.
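A minimal sketch of how such a translation header could work from the user's side is shown below. Everything here is hypothetical: `demo_scal_64` stands in for a suffixed library routine, and the `#define` plus `typedef` pair stands in for the contents of the proposed header, one line per routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a suffixed ILP64 routine exported by a library. */
static void demo_scal_64(int64_t n, double alpha, double *x, int64_t incx)
{
    assert(x != NULL || n <= 0);
    for (int64_t i = 0; i < n; ++i)
        x[i * incx] *= alpha;
}

/* Contents of a hypothetical translation header: the unsuffixed name
 * maps to the suffixed symbol, and the integer type follows along. */
#define demo_scal demo_scal_64
typedef int64_t demo_int;

/* Application code keeps using the unsuffixed name and demo_int. */
static void double_all(double *x, demo_int n)
{
    demo_scal(n, 2.0, x, 1);
}
```

Switching between LP64 and ILP64 then comes down to which header variant the application includes, without touching the call sites.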

langou commented 1 year ago

I agree with @martin-frbg that the set of functions in LAPACK and LAPACKE changes slowly. For LAPACKE, at this point, there is no script that I am aware of, and we are writing the LAPACKE layer functions by hand. This is possible because we only add a few routines at each release. Adding a layer related to LP64 / ILP64 / etc. variants by hand would not be too much of an overkill. There is an exponential growth here though, but that would work. I am not asking for more work by hand, but we are already generating S, C, D, Z by hand, and while not ideal, that works-ish. The point is that LAPACK is slowly growing. @martin-frbg is correct.

martin-frbg commented 1 year ago

Different Martin but of course I agree with him :)

mkrainiuk commented 1 year ago

In my opinion, as mentioned before, we need three build variants. The standard build, without any special options, suffixes, or the like, leads to the LP64 variant, as you mentioned. Then an ILP64 build without suffixes (with 64 or _ilp64 added to the library name). This is required to rebuild applications with -i8 or -fdefault-integer-8 without code changes. And finally, the ILP64 build with suffixes, which, as you mentioned, should be the standard for ILP64 usage. As far as I know from Julia (@ViralBShah), they use the suffixed ILP64 library internally, and thus it is no problem to load other code that is linked against an LP64 BLAS. From this point of view, having the non-suffixed and the suffixed symbols in one library would be bad.

The first two build variants are supported. Could you please share more details on why two sets of symbols in one library would be bad? If one loads one set of symbols from the library (without suffixes) and does not load the other set (with suffixes), or vice versa, what kind of problems could it cause?

mkrainiuk commented 1 year ago

That's something nobody can guarantee and it is up to the programmer.

Right, but with the suffixes we can give programmers a chance to ensure that the correct ILP64 symbols are always used.

Since the set of functions changes only slowly, one can provide such a header file and eventually a script, which generates a new header from all the sources. One advantage of such a header file would be that one can provide it to the user as well to allow an easy migration between LP64 and ILP64 mode.

Agreed, so a manually updated header file could work too if I don't find a nice automatic solution.

grisuthedragon commented 1 year ago

A case where both sets of symbols in one library cause problems is easily constructed... On the one hand, Julia and Python load it via dlopen, where flags like RTLD_GLOBAL and RTLD_NOW can be specified. On the other hand, there are strange cross dependencies via third-level projects like qrupdate, arpack, ... In combination with different linker options and orders, this leads to hard-to-debug problems.

ViralBShah commented 1 year ago

We implement the _64 suffix symbols in our LAPACK in Julia in a grotesque way: https://github.com/JuliaPackaging/Yggdrasil/blob/feaab2720976d2db53b80d408a0fd19a1f5042d1/L/LAPACK/common.jl#L291

Also, Apple is using a different convention in Accelerate for LAPACK ILP64. We use libblastrampoline to dispatch to those routines on macOS: https://github.com/JuliaLinearAlgebra/libblastrampoline/pull/113

mkrainiuk commented 1 year ago

The case where both symbols in one Library cause problems is easily constructed... On the one hand Julia, Python load it via dlopen and flags like RTLD_GLOBAL and RTLD_NOW can be specified. And on the other hand strange cross dependencies over third level projects Like qrupdate, arpack,... In combination with different linker options and orders this leads to hard-to-debug problems.

I'd expect that having an explicit _64 in the symbol name could help in the described case, because regardless of the loaded library it always points to the ILP64 implementation, whereas the standard name could be either LP64 or ILP64, depending on which library is picked up for symbol resolution.

mkrainiuk commented 1 year ago

Also, Apple is using a different convention in Accelerate for LAPACK ILP64. We use libblastrampoline to dispatch to those routines on macOS: JuliaLinearAlgebra/libblastrampoline#113

Thank you for bringing it up. It's an interesting approach to add to the Related Work.

rgommers commented 1 year ago

Hi all, thanks for the very useful discussion and progress on this topic.

The issue description is pretty clear about the two ways this is currently done (the Julia/OpenBLAS way and the MKL/cuBLAS way), and proposes to go with the MKL/cuBLAS way - which is implemented in the master branch of this repo since a few weeks. However I think it did leave out some relevant context on other projects, as well as on the work needed to adapt to the choice. So I'd like to delve into that a bit to make sure that we're indeed all on the same page and will actually be able to converge to what's decided here.

C/Fortran API naming vs binary symbol naming

The _64 MKL style proposal starts from the API name: it appends _64 for both the C and Fortran APIs, and then the binary symbol names become that plus whatever compiler mangling makes of that. E.g. for gfortran on Linux: dgemm + _64 + _ -> dgemm_64_.

The 64_ Julia/OpenBLAS style applies compiler mangling first, and then appends the suffix. E.g., for gfortran on Linux: dgemm + _ + 64_ -> dgemm_64_.

For the most important/common cases we get a single trailing underscore and hence end up with the same binary symbol names for BLAS. And different ones for CBLAS:

| suffix choice | base API name | binary symbol name | call from Fortran code | call from C code |
|---|---|---|---|---|
| MKL _64 | dgemm | dgemm_64_ | dgemm_64(...) | dgemm_64_(...) |
| OpenBLAS 64_ | dgemm | dgemm_64_ | dgemm_64(...) | dgemm_64_(...) |
| MKL _64 | cblas_dgemm | cblas_dgemm_64 | n/a | cblas_dgemm_64(...) |
| OpenBLAS 64_ | cblas_dgemm | cblas_dgemm64_ | n/a | cblas_dgemm64_(...) |

The story for LAPACK/LAPACKE will be the same; LAPACK will match, LAPACKE won't.
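The two composition orders described above can be sketched as plain string construction. The function names are hypothetical; the `mangling` argument is the decoration the compiler adds to the name, assumed here to be `"_"` for Fortran symbols built with gfortran on Linux and `""` for C symbols.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* MKL style: suffix the API name with "_64" first, then apply the
 * compiler decoration. */
static void mkl_style_symbol(char *out, size_t cap,
                             const char *api_name, const char *mangling)
{
    int n = snprintf(out, cap, "%s_64%s", api_name, mangling);
    assert(n >= 0 && (size_t)n < cap);   /* the name must fit the buffer */
}

/* OpenBLAS/Julia style: apply the compiler decoration first, then
 * append "64_" to the resulting symbol. */
static void openblas_style_symbol(char *out, size_t cap,
                                  const char *api_name, const char *mangling)
{
    int n = snprintf(out, cap, "%s%s64_", api_name, mangling);
    assert(n >= 0 && (size_t)n < cap);   /* the name must fit the buffer */
}
```

With `"dgemm"` and `"_"` both functions produce `dgemm_64_`; with `"cblas_dgemm"` and `""` they produce `cblas_dgemm_64` and `cblas_dgemm64_` respectively, matching the rows of the table.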

Current status

When building current master of this repo on Linux with gcc/gfortran, we get:

$ # build with: cmake -DBUILD_INDEX64=ON -DBUILD_SHARED_LIBS=ON -DCBLAS=ON
$ nm -gD libblas64.so | rg dgemm 
0000000000027df0 T dgemm_
0000000000098040 T dgemm_64_
$ nm -gD libcblas64.so | rg dgemm
00000000000109e0 T cblas_dgemm
0000000000022920 T cblas_dgemm_64

For NumPy/SciPy we build OpenBLAS with make INTERFACE64=1 SYMBOLSUFFIX=64_ (and distribute that shared library), which gives:

$ nm -gD libopenblas64_.so | grep dgemm    # partial output with relevant BLAS symbols:
dgemm_64_
cblas_dgemm64_

Julia does the same as NumPy/SciPy. I downloaded Julia 1.9.2 (the latest release) and it has a libopenblas64_.so bundled, it contains:

$ nm -gD lib/julia/libopenblas64_.so | rg dgemm
0000000000145970 T cblas_dgemm64_
0000000000143e70 T dgemm_64_

So as in the table higher up, the BLAS symbols match with reference BLAS with _64, while the CBLAS symbols don't.

If we'd instead use _64 as the symbol suffix and build OpenBLAS with $ make INTERFACE64=1 SYMBOLSUFFIX=_64, we'd get:

$ nm -gD libopenblas_64.so | rg dgemm
00000000000a3430 T cblas_dgemm_64
00000000000a0a40 T dgemm__64

Now the CBLAS symbol name matches, but the BLAS one doesn't (which is worse).

For completeness I also checked what R is doing; they don't have ILP64 support in the source code of their main code base as far as I can tell. They also don't distribute Linux binaries themselves, and Windows/macOS are standalone installers - so not much to worry about there.

Finally also note that for the OpenBLAS scheme:

History

Given that the issue description here only mentions Julia for the 64_ option, I think it's useful to extend that a bit:

Regarding other open source projects that considered the symbol suffix topic:

Impact & changes needed to adapt to _64

First let me emphasize that any decision here is much better than no decision. And that while both schemes work, the _64 MKL style one does seem a bit cleaner. That said, it seems like going that way will cause a significant amount of work, more so than staying with the more widely used OpenBLAS-style 64_. OpenBLAS, NumPy/SciPy, and Julia are used more widely and built from source in a larger number of places (SuiteSparse I don't really know), and also the distribution model of such binaries is more complicated. Updating MKL in comparison would be straightforward since being proprietary it's effectively a single set of binaries that are redistributed in some packaging systems, and it already has support for multiple symbols (e.g. it contains both dgemm_64 and dgemm_64_ already, with one being an alias of the other).

Changes that will be needed include:

OpenBLAS:

NumPy/SciPy:

Since it's only the CBLAS symbol names that will change for the situations that actually matter in practice, I'm hopeful that rolling out this change isn't going to be too disruptive. Any hiccups are likely going to be due to the limited support for shared libraries in Python wheels - we're going to get situations where we have both libopenblas64_.so and libopenblas_64.so loaded in the same process (e.g., new NumPy version switches to _64, user imports older SciPy version, both vendor OpenBLAS as a shared library).

That assumes of course that we only need to deal with Fortran compilers that append a single underscore. If users get issues with older or more esoteric Fortran compilers, that may need more work.

Julia:

I won't hazard a detailed guess, but given that Julia binaries also vendor libopenblas64_.so and libblastrampoline is similar to SciPy's layer, it's probably similar to what I wrote above for NumPy/SciPy.

One extra impact may be due to SuiteSparse, since Julia uses that while NumPy/SciPy doesn't.

It's also not uncommon for Julia users to mix Julia and Python I believe (perhaps less common than some years ago though?). I'm not sure if that may result in extra symbol clashes; right now Julia and NumPy use the exact same scheme.


I hope the above sounds correct to everyone. Given that the work needed to adapt to this wasn't detailed out before as far as I can tell, it'd be good to hear that this is okay with everyone and that they're fine with making those changes. @martin-frbg for OpenBLAS and @ViralBShah for Julia in particular I think, WDYT?

staticfloat commented 1 year ago

I will note the one big reason why Julia went with the alternate mangling: it's so that FORTRAN code that wants to link to these symbols can do so easily. gfortran automatically appends an underscore to all symbol names (which is why most BLAS APIs use dgemm_ as the symbol name in the first place, and why the CBLAS names do not have the trailing underscore). In order to take an older FORTRAN code and link it against a new ILP64 library, it's relatively straightforward to tell the compiler to redefine dgemm to dgemm_64 (and then the compiler adds its trailing underscore, resulting in the name dgemm_64_). If instead you have a name such as dgemm_64 exported from your BLAS library, it's more difficult to force FORTRAN libraries to link against it, and it requires source code changes rather than just passing -fdefault-integer-8 -Ddgemm=dgemm_64 on the compiler command line.

For the Julia world in particular, because we have libblastrampoline that already has advanced name-remapping capabilities, whatever decision is chosen here will be fine; we are shifting more and more of our numerical ecosystem to using LBT as the BLAS/LAPACK translation layer anyway. But for other ecosystems, I do encourage that they adopt this naming convention, as it reduces the friction necessary for other users, despite its ugly appearance.

grisuthedragon commented 1 year ago

  • FlexiBLAS seems to be the only one that went with _64, in 2020 (flexiblas#12, https://github.com/mpimd-csc/flexiblas/issues/12)

I did not implement anything in FlexiBLAS yet, since I want to see the proper solution in the reference implementation first. But I prefer the MKL style, since it works independently of the compiler's name mangling scheme and thus gives a cleaner view of the whole thing. Even though many projects rely on the Fortran API, having consistent names in the C part is necessary as well.

As soon as we have a proper standard, FlexiBLAS will implement this, but with a small difference to the stuff implemented at the moment in the master branch: Each API variant will result in a separate library, as described above.

Although the SunPerf library is mostly mentioned as the first occurrence of the suffixed symbols, and to my knowledge SuiteSparse is the only project that supports it, we can safely ignore this. This library and its hardware can be seen as legacy stuff. In particular, the SunPerf BLAS approach leads to strange function names like DGGES364 or DGEQP364, which are not desirable from my point of view.

Adjusting the symbol names in Julia and NumPy/SciPy should not be a problem, since they resolve the symbols at runtime and thus the symbol names could be mangled on the fly to fit the library.

IMHO the Julia/OpenBLAS way relies too much on the name mangling done by gfortran and was implemented, as @ViralBShah said, in a somewhat grotesque way.

staticfloat commented 1 year ago

As soon as we have a proper standard, FlexiBLAS will implement this, but also provide a ILP64 library without suffixes, to support rebuilding applications with -fdefault-integer-8.

While that can be useful, I highly encourage library developers to not make this the default, as it tends to cause problems on operating systems that load libraries with RTLD_GLOBAL-like semantics by default (e.g. Linux). It means that if you're in a position where you may load two separate BLAS libraries at once (e.g. you load FlexiBLAS and MKL via import numpy or similar) you run the risk of symbol confusion that can result in segfaults. This is not a problem if you know a-priori what libraries your entire program will load, however if there is a chance that somewhere someone will dlopen() something, please ensure that all ILP64 symbols are namespaced in some way.

Adjusting the symbol names in Julia and NumPy/SciPy should not be a problem since they resolve the symbols at runtime and thus the symbols name could be mangled on the fly to fit the library.

We have spent a lot of time and energy coming up with a naming scheme that is consistent, easily transformable from existing source code, works with a variety of compilers/languages (C, FORTRAN, etc...) and protects against symbol confusion. As said before, we use LBT to translate from other naming conventions to this one, so at some level we can adapt to anything that is decided here, but I think it highly likely that all of the software that is being built in the Julia ecosystem that uses BLAS/LAPACK will continue to be built to the current naming interface, so as to be as useful as possible to other projects, whether they be written in Julia, C, or FORTRAN.

grisuthedragon commented 1 year ago

While that can be useful, I highly encourage library developers to not make this the default, as it tends to cause problems on operating systems that load libraries with RTLD_GLOBAL-like semantics by default (e.g. Linux). It means that if you're in a position where you may load two separate BLAS libraries at once (e.g. you load FlexiBLAS and MKL via import numpy or similar) you run the risk of symbol confusion that can result in segfaults. This is not a problem if you know a-priori what libraries your entire program will load, however if there is a chance that somewhere someone will dlopen() something, please ensure that all ILP64 symbols are namespaced in some way.

Sure, from a software development point of view that is a horrible thing. But I still have to deal with researchers and their code and there somebody says "Can we try this for larger examples" and thus the whole code gets compiled with the increased integer flag. For this reason the "dangerous" variant of the library is required. For all other cases, and proper software development, the suffixed API should be the way to go.

staticfloat commented 1 year ago

there somebody says "Can we try this for larger examples" and thus the whole code gets compiled with the increased integer flag.

I totally understand these kinds of constraints. This is another reason why I suggest naming conventions that can be easily used within the constraints of compiler name mangling rules. In the FORTRAN example, we recompile ancient code all the time by simply adding a series of -D flags to redefine dgemm to dgemm_64, as in the example I gave above. In fact, many of our third party dependencies such as LAPACK are built with these compiler flags, defined for all BLAS and LAPACK symbols. This kind of simple renaming is not possible if we don't follow the compiler name mangling rules, and will require source code changes in order to rebuild.

rgommers commented 1 year ago

Thanks for the replies and context @staticfloat and @grisuthedragon.

I will note the one big reason for why Julia went with the alternate mangling; it's so that FORTRAN code that wants to link to these symbols can do so easily.

I'll note that the symbol names that end up in the binary are the same in all common cases (compiler mangling appending a single _), so I don't think that matters for the two alternatives under consideration here - they are equivalent when calling from Fortran code, and should not require source code changes instead of the -fdefault-integer-8 -Ddgemm=dgemm_64 approach.

I did not implement anything in FlexiBLAS yet [....]

Thanks for the correction, good to know.

Adjusting the symbol names in Julia and NumPy/SciPy should not be a problem since they resolve the symbols at runtime and thus the symbols name could be mangled on the fly to fit the library.

I wish that that were true for NumPy/SciPy, but it isn't - it's all determined at build time. SciPy does have a layer which re-exports a C API with stable names, so for other Python packages there's no issue, they can use that. But for NumPy/SciPy it'll be quite a bit of work to adapt to this. Which I'm willing to do, but it's probably going to take a while before it's all done.

Sure, from a software development point of view that is a horrible thing. But I still have to deal with researchers and their code and there somebody says "Can we try this for larger examples" and thus the whole code gets compiled with the increased integer flag. For this reason the "dangerous" variant of the library is required

I think the key thing here is to distinguish between the "researcher wants to try this with limited effort" and the "how do we package BLAS and LAPACK for redistribution" use cases. For the former you may want the dangerous variant, and as a HPC cluster admin or some such role you may make it available to the users you support. But for the latter, you never want to deal with it. We should only ever see libblas_64.so in distros, not libblas64.so. So some docs which recommend what to do for packagers (Linux distros, Homebrew, etc.) would be useful. I was already planning to write those for OpenBLAS; I can contribute them in this repo too if that would be welcome.

grisuthedragon commented 1 year ago

I'll note that the symbol names that end up in the binary are the same in all common cases (compiler mangling appending a single _)

This is the wrong assumption. Regarding the IBM XLF (compilers used on POWER-based HPC systems), there is nothing added to the binaries' symbol names. Thus adding 64_ on the binary level will end in DGEMM64_ and not DGEMM_64 from the Fortran API's point of view. We should not focus on the behavior of gfortran while creating an approach for the symbol names.
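The difference can also be seen at preprocessing time. The sketch below is built on assumptions: the two `DEMO_MANGLE_*` macros are hypothetical models of "append one underscore" (gfortran-like) and "add nothing" (as described above for XLF), symbol case is ignored, and `dgemm` is used only as a token.

```c
#include <assert.h>
#include <string.h>

#define DEMO_STR_(x) #x
#define DEMO_STR(x) DEMO_STR_(x)
#define DEMO_CAT_(a, b) a##b
#define DEMO_CAT(a, b) DEMO_CAT_(a, b)

/* Hypothetical models of the compiler decoration of Fortran names. */
#define DEMO_MANGLE_UNDERSCORE(name) DEMO_CAT(name, _)  /* gfortran-like */
#define DEMO_MANGLE_NONE(name) name                     /* nothing added */

/* API-first: append _64 to the API name, then decorate it. */
#define DEMO_API_FIRST(mangle, name) mangle(DEMO_CAT(name, _64))
/* Binary-first: decorate the name, then append 64_ to the symbol. */
#define DEMO_BINARY_FIRST(mangle, name) DEMO_CAT(mangle(name), 64_)

enum demo_scheme { DEMO_SCHEME_API_FIRST, DEMO_SCHEME_BINARY_FIRST };

/* Symbol name for dgemm under each scheme; no_decoration selects the
 * model in which the compiler adds nothing to the name. */
static const char *demo_dgemm_symbol(enum demo_scheme scheme, int no_decoration)
{
    assert(scheme == DEMO_SCHEME_API_FIRST || scheme == DEMO_SCHEME_BINARY_FIRST);
    if (scheme == DEMO_SCHEME_API_FIRST)
        return no_decoration
            ? DEMO_STR(DEMO_API_FIRST(DEMO_MANGLE_NONE, dgemm))
            : DEMO_STR(DEMO_API_FIRST(DEMO_MANGLE_UNDERSCORE, dgemm));
    return no_decoration
        ? DEMO_STR(DEMO_BINARY_FIRST(DEMO_MANGLE_NONE, dgemm))
        : DEMO_STR(DEMO_BINARY_FIRST(DEMO_MANGLE_UNDERSCORE, dgemm));
}

/* 1 when both schemes yield the same symbol for the given decoration. */
static int demo_schemes_agree(int no_decoration)
{
    return strcmp(demo_dgemm_symbol(DEMO_SCHEME_API_FIRST, no_decoration),
                  demo_dgemm_symbol(DEMO_SCHEME_BINARY_FIRST, no_decoration)) == 0;
}
```

Under the underscore model both schemes yield `dgemm_64_`; without decoration they diverge into `dgemm_64` and `dgemm64_`, which is the mismatch described in the comment above.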

ViralBShah commented 1 year ago

So long as the LAPACK build provides a way to mangle the names with whatever suffix one wants as part of the build process, different projects can take whatever approach works best. In the absence of this support in the build, all of us have to resort to crude hacks.

martin-frbg commented 1 year ago

We haven't seen the LAPACK version of this PR yet, and looking at how the BLAS64 one handles the symbol (re)naming in the sources by resorting to CMake copy-and-regexreplace trickery in the build directory instead of preprocessing does not give me the highest hopes. OpenBLAS already finds out how the compiler likes to mangle symbol names, so I guess I will have to retain at least part of its current objcopy trickery to please everybody, even if I rewrite everything to support simultaneous provision of 32 and 64bit integer interfaces. (That simultaneous presence of blas/blas_64 symbols could be a good thing, except I expect some distributors will then go ahead and hack it apart again to supply libblas and libblas64 for their alternatives system...)

Curious coincidence that this issue got numbered after the eigenvalue of the beast :)

mkrainiuk commented 9 months ago

Hi All,

The PR for LAPACKE is merged now, please share your feedback for the changes, if you have any. If there are no concerns for the current approach I will close this issue.

mkrainiuk commented 4 months ago

Closing as completed since the changes were merged and there is no ongoing discussion.