Zha0q1 opened this issue 4 years ago
The problem here is that we can never know what BLAS libraries are linked into other Python packages that our users happen to use. NumPy may be a popular one, but other packages may also link different BLAS implementations. The same issue applies to the OpenMP dependency.
Did you statically link BLAS implementation? You can refer to https://github.com/apache/incubator-mxnet/pull/17751. For dynamic linking, the loader will load the implementation referenced by the first package that is loaded and not load other BLAS implementations if the BLAS symbols are already available. Maybe importing mxnet first triggers importing numpy, causing the numpy BLAS implementation to be loaded?
Is this from the pip package or building from source? For pip, we statically link blas/lapacke and strip all symbols that are not exposed. This means that the blas/lapacke functions are not in the symbol table.
This is for building from source. Are we planning to support large tensors by default in the future? Currently it's controlled by a flag, and I think the distributed whls don't have it turned on. (I just joined the effort to support large tensors for v1.x and 2.0.)
Right, it was a dynamic link. Is there a way to link it statically when building from source?
@Zha0q1 When building the pypi wheel, openBLAS is linked statically and is inside libmxnet.so. So for dynamic linking we can lay out guidelines for users saying that if they want to build from source, they need to reinstall numpy with openblas built from source.
We can change the build-from-source instructions for mxnet 2.0 to always build openBLAS from source and statically link it. @leezu @szha what do you think?
Alright, I tried the static build scripts under /tools. This is what I got:
ubuntu@ip-172-31-12-248:~/incubator-mxnet$ ldd lib/libmxnet.so
linux-vdso.so.1 (0x00007fff525c4000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fb1116b7000)
libopenblas.so.0 => /home/ubuntu/incubator-mxnet/lib/libopenblas.so.0 (0x00007fb10efc4000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fb10edbc000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fb10eb9d000)
libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007fb10e96e000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fb10e5e5000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fb10e247000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fb10e02f000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb10dc3e000)
/lib64/ld-linux-x86-64.so.2 (0x00007fb1192fc000)
libgfortran.so.4 => /home/ubuntu/incubator-mxnet/lib/libgfortran.so.4 (0x00007fb10d85f000)
libquadmath.so.0 => /home/ubuntu/incubator-mxnet/lib/libquadmath.so.0 (0x00007fb10d61f000)
It looks like libmxnet is still dependent on libopenblas. Am I doing this wrong, or should I configure the script in some way? @leezu
As discussed offline, we should avoid requiring all users to build openblas from source, especially as distributions start to provide libopenblas64_.so versions with an OpenBLAS ILP64 build and a 64_ suffix in symbol names. Fedora does this per https://bugzilla.redhat.com/show_bug.cgi?id=1295965. Debian testing / Ubuntu 20.04 provide libopenblas64.so without the symbol suffix per https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=878121 and https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=910924. I initiated a discussion on the debian-science mailing list about providing the same package in Debian; it's also tracked in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=967951
To avoid the symbol version clashes reported in these issues, we must either require the suffixed symbol names or link statically. We can work with cmake upstream on a standard approach to detecting 64-bit suffixed openblas as part of https://cmake.org/cmake/help/latest/module/FindBLAS.html and backport the upstream recommended approach to our https://github.com/apache/incubator-mxnet/tree/master/cmake/upstream
To support older distributions that do not ship libopenblas64_.so, we should include instructions for building libopenblas64_.so from source in our guide.
Static linking is a great solution for the pip distribution build, but I don't think we should settle on a solution that requires static linking: a better solution is already established as best practice (i.e. symbol suffixes), and requiring static linking would restrict the versatility of the MXNet source build and make it harder to package MXNet as part of Debian or other distributions.
@leezu Thanks for initiating the discussion on debian-science. My solution was to have build instructions that tell everyone to build openblas from source (static linking would also avoid the openBLAS installation issues that cloud users hit with DLAMI on EC2). That way the build instructions are identical regardless of the distro. Most customers use pip install, so for them there is no difference in experience.
Since distros will start to provide _64 versions of the libraries, we won't need to worry about those then.
My solution was to have build instructions for everyone to build openblas from source
It's great to have those instructions where required. But we should also look at the trend and build a solution that aligns with the work of others.
Since distros will start to provide _64 versions of libraries we don't need to worry about those then.
We may need to add support for those; it's not automatic. For example, https://cmake.org/cmake/help/latest/module/FindBLAS.html does not currently distinguish the 64-bit and 32-bit versions, as the maintainers may not be aware of the use-case.
This doesn't all have to happen at once. We just need to ensure that our approach remains compatible with what we eventually would like to achieve.
Makes sense
is this from pip package or building from source? for pip, we statically link blas/lapacke and stripped all symbols that are not exposed. this means that the blas/lapacke functions are not in the symbol table.
@Zha0q1 you may want to check if these steps are still there in new build scripts and add as necessary
If you want to make this work via static linking, you may need to use -Bsymbolic:
When creating a shared library, bind references to global symbols to the
definition within the shared library, if any. Normally, it is possible
for a program linked against a shared library to override the definition
within the shared library.
This option is only meaningful on ELF platforms which support shared libraries.
See also https://stackoverflow.com/questions/7216973/is-there-a-downside-to-using-bsymbolic-functions
But instead we can just work around the issue by adopting the 64_ suffix convention.
Discussed offline and we need to re-enable the symbol whitelisting in cmake builds. They previously existed in make-based builds: https://github.com/apache/incubator-mxnet/blob/v1.6.x/make/config/libmxnet.ver https://github.com/apache/incubator-mxnet/blob/v1.6.x/make/config/libmxnet.sym
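A minimal sketch of how such a version script hides non-whitelisted symbols (toy names; the real whitelist lives in the libmxnet.ver/libmxnet.sym files linked above):

```shell
# Only symbols listed under "global:" stay in the dynamic symbol table;
# everything matched by "local: *" is hidden from dynamic linking.
cat > demo.c <<'EOF'
int public_api(void)      { return 1; }
int internal_helper(void) { return 2; }
EOF
cat > demo.ver <<'EOF'
{ global: public_api; local: *; };
EOF
gcc -shared -fPIC -o libdemo.so demo.c -Wl,--version-script=demo.ver
nm -D libdemo.so | grep public_api                       # exported
nm -D libdemo.so | grep internal_helper || echo "internal_helper is hidden"
```

Applied to libmxnet.so, this is what keeps statically linked blas/lapacke symbols out of the dynamic symbol table, so they cannot clash with NumPy's BLAS.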
Thanks! @szha @leezu I just looked into adding a suffix to the cblas/lapack calls within our code base. There are ~50 places in a handful of different files that would all need to change, which makes me wonder if this change is too heavy. Also: 1. openblas is used in TVM, so they would need to make the same change to be consistent with us. 2. As @leezu mentioned, finding 64-bit openblas is not well supported, and distros don't have a unified solution for it. For our purpose of supporting large tensors in 2.0, if we could link 64-bit openblas statically for our soon-to-come release, I would still think that's the best solution.
BTW int32 BLAS will still work for tensors whose total size is > INT_MAX (2**31 - 1); it's only when a single dim is > INT_MAX that we must use int64 BLAS, because the function declarations use int32 for strides.
There are ~50 places in a handful of different files that all need to be changed. This makes me think if this change is too heavy
Why not follow the approach in numpy and define a macro at a central place? https://github.com/numpy/numpy/pull/15069/files#diff-4538717a0246e7d9363e76a2e3fc835e
finding 64 openblas is not well supported
You can edit https://github.com/apache/incubator-mxnet/blob/master/cmake/Modules/FindOpenBLAS.cmake. You will need to edit this file in any case, even if you choose not to rely on symbol suffixes, because ILP64 openblas would typically be named libopenblas64 (or libopenblas64_ with suffixes). The mid-term approach is to work with upstream so that we can eventually delete FindOpenBLAS.cmake and just rely on the upstream cmake feature.
For our purpose of supporting large tensors in 2.0, if we could link 64 openblas statically for our soon-to-come release, I would still think that’s the best solution.
There are two different cases: One for the staticbuild for pip, where static linkage will be preferred. For the general CMakeLists.txt, why should we restrict our users to static linkage?
Also 1. openblas is used in TVM, so they will need to make the same change to be consistent with us
How do you ensure this consistency without symbol suffix? Does TVM support ILP64? If TVM expects standard 32bit blas but you link ILP64 blas with the same symbol names, wouldn't there be issues?
BTW int 32 blas will work for tensors with size > INT_MAX (2**31 - 1 ), it's when a dim is > INT_MAX we must use int 64 blas, because in the function declarations they use int 32 for stride
If dim > INT_MAX is supported by MXNet, our BLAS operators need to either return the correct result or raise an error. @access2rohit told me that his PR making large tensor support the default would just silently return wrong results in this case.
I created a simple PR; I am trying to learn about the scope of this change as I go.
@sandeep-krishnamurthy would you take a look?
For reference, here are the things I tried:
One more thing: currently the static build script defaults to dynamically linking openblas. We would need to make it a static link; I can share my changes (build openblas 32 => 64; dynamic link => static link) once we have a final consensus on how to support openblas 64 going forward.
Also, opencv would fail to link with this error:
../staticdeps/lib/libopenblas.a(sormbr.o): In function `sormbr_':
sormbr.f:(.text+0x3a6): undefined reference to `_gfortran_concat_string'
sormbr.f:(.text+0x562): undefined reference to `_gfortran_concat_string'
sormbr.f:(.text+0x5d1): undefined reference to `_gfortran_concat_string'
sormbr.f:(.text+0x621): undefined reference to `_gfortran_concat_string'
../staticdeps/lib/libopenblas.a(sormlq.o): In function `sormlq_':
sormlq.f:(.text+0x30b): undefined reference to `_gfortran_concat_string'
../staticdeps/lib/libopenblas.a(sormlq.o):sormlq.f:(.text+0x842): more undefined references to `_gfortran_concat_string' follow
../staticdeps/lib/libopenblas.a(slartg.o): In function `slartg_':
slartg.f:(.text+0xb1): undefined reference to `_gfortran_pow_r4_i8'
../staticdeps/lib/libopenblas.a(dlartg.o): In function `dlartg_':
dlartg.f:(.text+0xb3): undefined reference to `_gfortran_pow_r8_i8'
../staticdeps/lib/libopenblas.a(slaed6.o): In function `slaed6_':
slaed6.f:(.text+0x109): undefined reference to `_gfortran_pow_r4_i8'
../staticdeps/lib/libopenblas.a(dlaed6.o): In function `dlaed6_':
dlaed6.f:(.text+0x10e): undefined reference to `_gfortran_pow_r8_i8'
collect2: error: ld returned 1 exit status
For my PoC build I had to turn off opencv for the mxnet build.
One more thing is that currently the static build script defaults to dynamically linking openblas.
There's a step that disable/remove shared object for dynamic linking.
Yes, I did that on my machine and was able to build the wheel. I think we should add that step back too.
Both NumPy and MXNet are dependent on BLAS. When they are linked to different BLAS libraries there will be a name clashing issue. Effectively, only functions from NumPy's BLAS will be used by both NumPy and MXNet.
According to https://stackoverflow.com/questions/47891872/how-to-use-non-mkl-numpy-under-anaconda, anaconda will by default ship MKL-dependent NumPy. This is also the case on DLAMI 30:
I first ran into this issue while working on adding large tensor support to linalg operators, where I used a manually built int 64 version of Open BLAS. I used this simple test script:
On my machine (DLAMI 30, Ubuntu 18) Open BLAS is built with DYNAMIC_ARCH=1 DYNAMIC_OLDER=1 USE_OPENMP=1 INTERFACE64=1 BINARY=64 NO_SHARED=0 NO_LAPACK=0 and MXNet is built with USE_BLAS="open" USE_INT64_TENSOR_SIZE=1. Numpy is pre-installed with MKL optimization.
Ideally, linalg.syrk would invoke Open BLAS cblas_ssyrk (my build, 64-bit int), but in reality, because of the name clashing, MKL cblas_ssyrk (32-bit int) is called instead. This will lead to:
Using GDB we can see we are indeed calling into MKL cblas_ssyrk:
Reinstalling NumPy and linking it to my Open BLAS build resolved the issue for me.
So the problem with this name clashing is that regardless of which BLAS we build MXNet with, we are stuck with the BLAS that NumPy is configured to use. In most cases, such as supporting large tensors (i.e. 64-bit indexing), it's fine to configure both to use the same BLAS lib (int64 Open BLAS in my case), but I wonder if there is a special use case where we actually want different BLAS libraries for NumPy and MXNet?
My guess would be "no", but we should still be aware of this issue, as well as the extra step of reconfiguring NumPy and MXNet to use the correct BLAS, and we probably need to note this in our build tutorial.
The same issue is also noted on NumPy's build-from-source page: https://numpy.org/devdocs/user/building.html. Open BLAS supports building with function prefixes and suffixes, and NumPy can recognize suffixes like "64_" when built with 64-bit int support. We could potentially do something similar, adding a suffix/prefix to BLAS functions and using those names in MXNet, but again, it's much easier to link NumPy and MXNet to the same BLAS.