Closed: sh1ng closed this issue 4 years ago.
-static-libgcc -static-libstdc++
I don't think XGBoost has been tested with this setting. What's your use case for this?
To support systems with GCC < 5.
@sh1ng Could you please provide a simpler script to reproduce the issue?
Updated original description.
A more general question: how do you plan to ship the wheel (.whl) to systems with GCC < 5 and an old GLIBCXX version?
@sh1ng Previously, we used GCC 4.8 with a CentOS 6 Docker image to build XGBoost wheels. We upgraded to GCC 5+ because GCC 4.8 doesn't fully support the C++11 standard. Today, I tried compiling XGBoost on my old laptop and here's what I got:
chohyu01@chohyu01-Lenovo-IdeaPad-Y500:~/Desktop/xgboost$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.6 LTS
Release: 14.04
Codename: trusty
chohyu01@chohyu01-Lenovo-IdeaPad-Y500:~/Desktop/xgboost$ make -j10
[lots of outputs later]
g++ -c -DDMLC_LOG_CUSTOMIZE=1 -std=c++11 -Wall -Wno-unknown-pragmas -Iinclude -Idmlc-core/include -Irabit/include -I/include -O3 -funroll-loops -msse2 -fPIC -fopenmp src/tree/updater_colmaker.cc -o build/tree/updater_colmaker.o
src/tree/tree_model.cc: In constructor ‘xgboost::GraphvizGenerator::GraphvizGenerator(const xgboost::FeatureMap&, const string&, bool)’:
src/tree/tree_model.cc:465:55: error: invalid initialization of non-const reference of type ‘std::stringstream& {aka std::basic_stringstream<char>&}’ from an rvalue of type ‘<brace-enclosed initializer list>’
TreeGenerator(fmap, with_stats), ss_{SuperT::ss_} {
^
make: *** [build/tree/tree_model.o] Error 1
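The failing line list-initializes a reference member (`ss_{SuperT::ss_}`). Under the original C++11 wording, which older GCC releases follow, a braced initializer for a reference creates a temporary, and a non-const reference cannot bind to it. Below is a minimal sketch of the pattern, with hypothetical classes `Base` and `Derived` standing in for `TreeGenerator` and `GraphvizGenerator`; it uses the parenthesized form, which older compilers accept:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-ins for XGBoost's TreeGenerator / GraphvizGenerator.
struct Base {
  std::stringstream ss_;
};

struct Derived : public Base {
  using SuperT = Base;
  // Reference member with the same name as the base-class stream,
  // mirroring the failing constructor in the log above.
  std::stringstream& ss_;

  // Parentheses bind the reference directly to the base-class stream.
  // Writing `ss_{SuperT::ss_}` instead is the form older GCC rejects.
  Derived() : Base(), ss_(SuperT::ss_) {}
};

// Writes through the derived reference and reads back via the base member.
std::string WriteThroughReference() {
  Derived d;
  d.ss_ << "digraph {}";
  return d.Base::ss_.str();
}
```

Only the delimiters change; both forms bind the same stream on compilers that accept them.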
I understand the need to support old systems with old GLIBC versions. @thesuperzapper raised a similar request in https://github.com/dmlc/xgboost/pull/4538#issuecomment-500704601. There's a similar issue with the LIBGOMP version: https://github.com/dmlc/xgboost/pull/4306#issuecomment-495304573.
However, using GCC 4.8 has its own cost: it doesn't fully support C++11, which would force developers to implement (potentially time-consuming) workarounds. This is why I'm not in favor of reverting to GCC 4.8. Do you have any good suggestions?
Some ideas:
1) Use MUSL instead of GLIBC. MUSL is designed to allow full static linking: https://www.musl-libc.org
2) It appears that we can statically link LIBGOMP with the right set of compilation flags: https://stackoverflow.com/questions/23869981/linking-openmp-statically-with-gcc?rq=1
Maybe we can start an experimental GitHub repository and get XGBoost compiled with 1) and 2). If we can remove dynamic linking against GLIBC and LIBGOMP, then we could keep GCC 5+ and still support old systems.
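Whichever route is taken, a small OpenMP program makes a convenient smoke test: build it with the link flags under evaluation, run `ldd` on the result to see whether `libgomp.so` is still listed, and check that it still computes correctly. A minimal sketch, assuming GCC with `-fopenmp` (without that flag the pragma is ignored and the loop runs serially); the function names are illustrative:

```cpp
#include <cassert>
#include <cstdint>
#ifdef _OPENMP
#include <omp.h>
#endif

// Sums 0..n-1 with an OpenMP parallel reduction. Without -fopenmp the pragma
// is ignored and the loop runs serially, so the result is the same either way.
std::int64_t ParallelSum(std::int64_t n) {
  std::int64_t total = 0;
#pragma omp parallel for reduction(+ : total)
  for (std::int64_t i = 0; i < n; ++i) {
    total += i;
  }
  return total;
}

// Reports how many threads the OpenMP runtime would use (1 if OpenMP is off).
int MaxThreads() {
#ifdef _OPENMP
  return omp_get_max_threads();
#else
  return 1;
#endif
}
```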
@CodingCat @chenqin @yinlou In your experience, are there still many Spark clusters running old Linux?
It'd be great if one of you could help with 2).
CentOS/RHEL/Oracle Linux 7 are still valid platforms and worth supporting. I managed to compile the 0.90 release on OL6 without too much trouble using GCC 4.8. The binary runs on OL6, OL7, and a newer Ubuntu.
It's possible to get updated compilers on RHEL7 and OL7 using the devtoolset (https://developers.redhat.com/products/developertoolset/hello-world#fndtn-windows, https://docs.oracle.com/cd/E37670_01/E59096/html/section_zlg_m3g_dq.html). Binaries compiled with these tools seem to work fine on stock releases of OL7 (as that's our deployment environment and I don't mess with the libraries on it).
@Craigacp GCC 4.8 won't actually work: you can compile XGBoost 0.90 with it, but not run it. The reason is that <regex> is broken in GCC 4.8: http://www.michaelbrich.com/no-working-around-broken-c11-regex-in-gcc-4-8/. In addition, I think XGBoost developers would like to use the full range of C++11 features; using GCC 4.8 would force them to adopt inconvenient workarounds and hamper developer productivity. For example, some code added after 0.90 relies on full C++11 support and does not compile with GCC 4.8: https://github.com/dmlc/xgboost/issues/4724#issuecomment-518061534
A better way forward would be to use GCC 5+ but generate static binaries that can run on older platforms.
Binaries compiled using the dev tools version of gcc from RHEL7 & OL7 run fine on versions of those operating systems without the dev tools installed. So you can compile using GCC 5 or later on that platform, but it still binds to the base glibc.
I've been running the builds of XGBoost 0.90 that I compiled on OL6 and everything seems to work fine. Which code path uses regex? I might not be hitting it with my single-node Java use cases.
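Since devtoolset builds still bind to the system glibc, it can help to compare the glibc headers a binary was compiled against with the glibc actually loaded on the target machine. A minimal sketch using the glibc-specific `gnu_get_libc_version()`; note that the symbol versions a binary really requires (as listed by `objdump -T`) can be older than its header version, so this is a quick sanity check rather than a compatibility test:

```cpp
#include <cassert>
#include <cstdlib>  // on glibc, C library headers define __GLIBC__
#include <string>
#if defined(__GLIBC__)
#include <gnu/libc-version.h>
#endif

// glibc version of the headers this translation unit was compiled against.
std::string CompileTimeGlibc() {
#if defined(__GLIBC__)
  return std::to_string(__GLIBC__) + "." + std::to_string(__GLIBC_MINOR__);
#else
  return "not glibc";
#endif
}

// glibc version actually loaded at run time on this machine.
std::string RunTimeGlibc() {
#if defined(__GLIBC__)
  return gnu_get_libc_version();
#else
  return "not glibc";
#endif
}
```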
@Craigacp Currently, the CLI config parser uses <regex>.
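For reference, a sketch of the kind of call involved. The `key = value` pattern and the `ParseConfigLine` helper are hypothetical, not XGBoost's actual parser; per the post linked earlier in the thread, a GCC 4.8 build can compile code like this yet fail to match at run time, while libstdc++ from GCC 4.9 onward implements `<regex>`:

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>

// Splits a "key = value" line into its two parts. Returns false for lines
// that do not match (comments, blank lines, malformed entries).
bool ParseConfigLine(const std::string& line,
                     std::pair<std::string, std::string>* out) {
  static const std::regex kPattern(
      R"(^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(\S+)\s*$)");
  std::smatch match;
  if (!std::regex_match(line, match, kPattern)) {
    return false;
  }
  out->first = match[1].str();
  out->second = match[2].str();
  return true;
}
```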
So you can compile using GCC 5 or later on that platform, but it still binds to the base glibc.
Indeed, you are right. I just checked all dependencies of libxgboost.so by running
hcho3@ubuntu# objdump -T libxgboost.so
and got the following symbol versions:
CXXABI_1.3, CXXABI_1.3.2, CXXABI_1.3.3
GCC_3.0
GLIBCXX_3.4, GLIBCXX_3.4.9, GLIBCXX_3.4.10, GLIBCXX_3.4.11
GLIBC_2.2.5, GLIBC_2.3, GLIBC_2.3.2, GLIBC_2.3.3, GLIBC_2.3.4, GLIBC_2.4
GOMP_1.0, GOMP_4.0
OMP_1.0
(libxgboost.so was created by compiling XGBoost inside a CentOS 6 Docker container with devtoolset-4. See the container at https://github.com/dmlc/xgboost/blob/master/tests/ci_build/Dockerfile.gpu_build)
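For portability, what typically matters is the highest version per prefix, since a library providing GLIBCXX_3.4.11 also provides the older GLIBCXX tags. A minimal sketch for summarizing such a list, assuming the version tags have already been extracted from the `objdump -T` output; the helper names are illustrative, and components are compared as integers because a plain string comparison would rank 3.4.9 above 3.4.11:

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Splits a version tag such as "GLIBCXX_3.4.11" into its prefix ("GLIBCXX")
// and numeric components ({3, 4, 11}). Returns false for tags whose suffix is
// not purely numeric, e.g. "GLIBC_PRIVATE".
bool SplitTag(const std::string& tag, std::string* prefix,
              std::vector<int>* version) {
  const std::string::size_type pos = tag.rfind('_');
  if (pos == std::string::npos || pos + 1 >= tag.size()) return false;
  *prefix = tag.substr(0, pos);
  version->clear();
  std::stringstream parts(tag.substr(pos + 1));
  std::string item;
  while (std::getline(parts, item, '.')) {
    if (item.empty() ||
        item.find_first_not_of("0123456789") != std::string::npos) {
      return false;
    }
    version->push_back(std::stoi(item));
  }
  return !version->empty();
}

// For each prefix, keeps the tag with the highest version. std::vector<int>
// compares lexicographically, so {3, 4, 9} < {3, 4, 11} and {2, 3, 4} < {2, 4}.
std::map<std::string, std::string> HighestPerPrefix(
    const std::vector<std::string>& tags) {
  std::map<std::string, std::vector<int>> best_version;
  std::map<std::string, std::string> best_tag;
  for (const std::string& tag : tags) {
    std::string prefix;
    std::vector<int> version;
    if (!SplitTag(tag, &prefix, &version)) continue;
    auto it = best_version.find(prefix);
    if (it == best_version.end() || it->second < version) {
      best_version[prefix] = version;
      best_tag[prefix] = tag;
    }
  }
  return best_tag;
}
```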
GLIBC_2.4 should be compatible with CentOS 6.x, according to https://pkgs.org/download/libc.so.6(GLIBC_2.4)
So I suppose we only need to remove the hard dependency on LIBGOMP. See #4489.
After some googling, here's information I found:
Symbol | Package | Availability |
---|---|---|
CXXABI_1.3.3 | libstdc++-4.4.7-23.el6 | CentOS 6.10 |
GCC_3.0 | libgcc-4.4.7-23.el6 | CentOS 6.10 |
GLIBCXX_3.4.11 | libstdc++-4.4.7-23.el6 | CentOS 6.10 |
GLIBC_2.4 | glibc-2.12-1.212.el6 | CentOS 6.10 |
GOMP_4.0 | libgomp-4.4.7-23.el6 | CentOS 6.10 |
OMP_1.0 | libgomp-4.4.7-23.el6 | CentOS 6.10 |
Hmm, I'm still trying to figure out why #4489 happened. Maybe it has to do with Travis CI not having the latest packages for its Trusty target? According to this table, at least, I should be able to compile XGBoost with GCC 5.x and then run the compiled binary on CentOS 6.10.
I probably should add Ubuntu Trusty and CentOS 6 targets to CI testing harness.
@sh1ng Are you targeting platforms that are older than CentOS 6?
We use CentOS 7, including the ppc64le platform.
devtoolset indeed works on x86_64 for CUDA-10. If I'm not mistaken, it ships the newer libstdc++ parts as a separate statically linked library. It doesn't fully meet our needs: we need to support the ppc64le platform and CUDA-9, which may not support GCC-6 (and there's no devtoolset-5).
That's why I've installed GCC 5.3 from source.
@sh1ng So it appears that the current binary distribution (xgboost-0.90-py2.py3-none-manylinux1_x86_64.whl) already works on CentOS 6 or newer, as long as the processor architecture is x86_64. I was just able to install and run XGBoost 0.90 inside Ubuntu 14.04 (Trusty) and CentOS 6 with
pip3 install xgboost==0.90
So I'd say that we are pretty solid when it comes to supporting x86-64 systems with old OSes.
We need to support ppc64le platform and CUDA-9 that may not support GCC-6(there's no devtoolset-5).
I'm afraid I won't be of much help when it comes to ppc64le platform. Our CI (https://xgboost-ci.net) covers x86-64 only and to my knowledge, none of the developers here use ppc64le.
@sh1ng Actually, it looks like NVIDIA provides PPC64LE Docker image: https://hub.docker.com/r/nvidia/cuda-ppc64le/. This is good news because I can now use QEMU to emulate ppc64le on my machine and build XGBoost.
(Instructions adapted from https://tthtlc.wordpress.com/2018/11/27/how-to-run-ppc-binaries-in-docker/)
cd ${HOME}
wget https://github.com/multiarch/qemu-user-static/releases/download/v2.7.0/qemu-ppc64le-static.tar.gz
tar xvf qemu-ppc64le-static.tar.gz
docker pull nvidia/cuda-ppc64le:8.0-devel-centos7
docker run --rm --privileged multiarch/qemu-user-static:register --reset
docker run --rm -v ${HOME}/qemu-ppc64le-static:/usr/bin/qemu-ppc64le-static \
-it nvidia/cuda-ppc64le:8.0-devel-centos7 /bin/bash
# Get an interactive Bash session
# Now git clone XGBoost and build it
Compiler versions:
[root@a6c125a16afb build]# gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
[root@a6c125a16afb build]# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:28:28_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
Drat, it's GCC 4.8. I'll need to figure something out here.
There's no devtoolset package for ppc64le.
CUDA-9 isn't compatible with GCC-6 (and there's no devtoolset-5 package). I'd prefer to support the current major version and the previous one.
It'd be awesome to compile XGBoost statically with -static-libstdc++ and make it work.
Unless I'm given priority and resources (a POWER machine) for this, I don't think I can help. I can also think of a few places where big-endian byte order could cause problems. ;-(
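On the endianness concern: ppc64le runs little-endian like x86-64, so byte-order bugs would mainly surface on big-endian targets. The usual defensive pattern is to detect host byte order and decode serialized integers explicitly instead of reinterpreting raw memory. A minimal sketch; the helper names are illustrative, not XGBoost code:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns true when the host stores the least significant byte first.
bool IsLittleEndian() {
  const std::uint32_t probe = 0x01020304u;
  unsigned char bytes[sizeof(probe)];
  std::memcpy(bytes, &probe, sizeof(probe));
  return bytes[0] == 0x04;
}

// Decodes a 32-bit little-endian value from a byte buffer. The shifts give
// the same result on little- and big-endian hosts.
std::uint32_t ReadLE32(const unsigned char* p) {
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}
```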
I'm trying to make libxgboost.so portable to systems with a GCC version older than 5.0. Everything is built in the Docker image nvidia/cuda:10.0-cudnn7-devel-centos7: GCC 5.3 is built inside the container and then used to build XGBoost. Dynamic linking works well, but a newer libstdc++.so has to be present on the machine where XGBoost runs (for testing, I also use the same Docker image, but without GCC 5). When XGBoost is built with static linking, using
-static-libgcc -static-libstdc++
or
-static-libgcc -static-libstdc++ libgcc.a -libstdc++.a
it returns a pretty weird error about parsing input parameters, or segfaults.
All test cases from https://github.com/h2oai/h2o4gpu/tree/master/tests/python/open_data/gbm are failing, so it looks like an issue in the parameter parser.
All source code is available in https://github.com/h2oai/h2o4gpu/pull/790
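One thing worth ruling out in a setup like this (a diagnostic idea, not a confirmed cause of the parser failures): GCC 5 introduced a new `std::string` ABI selected by `_GLIBCXX_USE_CXX11_ABI`, and `-static-libstdc++` places a private copy of libstdc++ inside `libxgboost.so` while the host process may load the system one. A minimal sketch that reports which libstdc++ release and string ABI a translation unit was compiled with, so the library and its caller can be compared; `LibstdcxxBuildInfo` is a hypothetical helper:

```cpp
#include <cassert>
#include <string>

// Reports the libstdc++ configuration this translation unit was compiled with.
// __GLIBCXX__ is libstdc++'s release date stamp; _GLIBCXX_USE_CXX11_ABI exists
// only in libstdc++ from GCC 5 onward. Other standard libraries get a fallback.
std::string LibstdcxxBuildInfo() {
#if defined(__GLIBCXX__)
  std::string info = "libstdc++ date " + std::to_string(__GLIBCXX__);
#if defined(_GLIBCXX_USE_CXX11_ABI)
  info += ", _GLIBCXX_USE_CXX11_ABI=" + std::to_string(_GLIBCXX_USE_CXX11_ABI);
#else
  info += ", pre-GCC-5 string ABI only";
#endif
  return info;
#else
  return "not libstdc++";
#endif
}
```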