PaddlePaddle / Paddle

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the PaddlePaddle ("飞桨") core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)
http://www.paddlepaddle.org/
Apache License 2.0

Paddle does not fit the PEP 513 `manylinux1` standard #4050

Closed reyoung closed 6 years ago

reyoung commented 7 years ago

Currently, we rename Paddle's wheel package to manylinux1. This is not right, because our binary does not actually conform to the manylinux1 standard.

In the manylinux1 standard, the external symbol version dependencies are limited to:

GLIBC <= 2.5
CXXABI <= 3.4.8
GLIBCXX <= 3.4.9
GCC <= 4.2.0

And we should build our wheel package on CentOS 5.

But we are using C++11, which depends on a higher GLIBCXX; the current dependency is GLIBCXX==3.4.21.

PyPA provides a Docker image for building manylinux1-compliant wheel packages here. We should use that Docker image to build our wheel package.
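To make the version limits above concrete, here is a minimal sketch of the policy check in Python. The helper names are hypothetical; the real enforcement is done by tools such as auditwheel, which inspect the versioned symbols an ELF binary requires:

```python
# Minimal sketch of the manylinux1 versioned-symbol policy check.
# Helper names are hypothetical; auditwheel performs the real check.

# Maximum symbol versions allowed by PEP 513 (manylinux1), per the list above.
MANYLINUX1_MAX = {
    "GLIBC": (2, 5),
    "CXXABI": (3, 4, 8),
    "GLIBCXX": (3, 4, 9),
    "GCC": (4, 2, 0),
}

def parse_versioned_symbol(sym):
    """Split e.g. 'GLIBCXX_3.4.21' into ('GLIBCXX', (3, 4, 21))."""
    prefix, _, version = sym.rpartition("_")
    return prefix, tuple(int(p) for p in version.split("."))

def violates_manylinux1(sym):
    """Return True if a versioned symbol exceeds the manylinux1 limits."""
    prefix, version = parse_versioned_symbol(sym)
    limit = MANYLINUX1_MAX.get(prefix)
    return limit is not None and version > limit

# The GLIBCXX version mentioned in this issue exceeds the limit:
print(violates_manylinux1("GLIBCXX_3.4.21"))  # True
print(violates_manylinux1("GLIBC_2.2.5"))     # False
```

This is why a wheel built with a modern GCC cannot simply be renamed to manylinux1: its `core.so` pulls in GLIBCXX_3.4.21, which is newer than the 3.4.9 ceiling.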

typhoonzero commented 6 years ago

Indeed, supporting manylinux1 is really needed, but I suspect we will have to make a lot of changes to our build system. I'll give it a try.

typhoonzero commented 6 years ago

Found some work that ports manylinux1 to CentOS 6.

Using CentOS 5 may introduce too many problems, including with CUDA installation and third-party libraries, so I will try this on CentOS 6.

By the way, auditwheel is a good tool for checking whether a whl package satisfies the dependency requirements.

Trying this Dockerfile:

FROM quay.io/numenta/manylinux1_x86_64_centos6:0.1.2

RUN NVIDIA_GPGKEY_SUM=d1be581509378368edeec8c1eb2958702feedf3bc3d17011adbf24efacce4ab5 && \
    curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/rhel6/x86_64/7fa2af80.pub | sed '/^Version/d' > /etc/pki/rpm-gpg/RPM-GPG-KEY-NVIDIA && \
    echo "$NVIDIA_GPGKEY_SUM  /etc/pki/rpm-gpg/RPM-GPG-KEY-NVIDIA" | sha256sum -c -

COPY cuda.repo /etc/yum.repos.d/cuda.repo

ENV CUDA_VERSION 7.5.18

ENV CUDA_PKG_VERSION 7-5-7.5-18
RUN yum install -y \
        cuda-nvrtc-$CUDA_PKG_VERSION \
        cuda-cusolver-$CUDA_PKG_VERSION \
        cuda-cublas-$CUDA_PKG_VERSION \
        cuda-cufft-$CUDA_PKG_VERSION \
        cuda-curand-$CUDA_PKG_VERSION \
        cuda-cusparse-$CUDA_PKG_VERSION \
        cuda-npp-$CUDA_PKG_VERSION \
        cuda-cudart-$CUDA_PKG_VERSION && \
    ln -s cuda-7.5 /usr/local/cuda && \
    rm -rf /var/cache/yum/*

RUN echo "/usr/local/cuda/lib64" >> /etc/ld.so.conf.d/cuda.conf && \
    ldconfig

# nvidia-docker 1.0
LABEL com.nvidia.volumes.needed="nvidia_driver"
LABEL com.nvidia.cuda.version="${CUDA_VERSION}"

RUN echo "/usr/local/nvidia/lib" >> /etc/ld.so.conf.d/nvidia.conf && \
    echo "/usr/local/nvidia/lib64" >> /etc/ld.so.conf.d/nvidia.conf

ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:${PATH}
ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64

# nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
ENV NVIDIA_REQUIRE_CUDA "cuda>=7.5"

# for devel
RUN yum install -y \
        cuda-core-$CUDA_PKG_VERSION \
        cuda-misc-headers-$CUDA_PKG_VERSION \
        cuda-command-line-tools-$CUDA_PKG_VERSION \
        cuda-license-$CUDA_PKG_VERSION \
        cuda-nvrtc-dev-$CUDA_PKG_VERSION \
        cuda-cusolver-dev-$CUDA_PKG_VERSION \
        cuda-cublas-dev-$CUDA_PKG_VERSION \
        cuda-cufft-dev-$CUDA_PKG_VERSION \
        cuda-curand-dev-$CUDA_PKG_VERSION \
        cuda-cusparse-dev-$CUDA_PKG_VERSION \
        cuda-npp-dev-$CUDA_PKG_VERSION \
        cuda-cudart-dev-$CUDA_PKG_VERSION \
        cuda-driver-dev-$CUDA_PKG_VERSION \
        gcc-c++ \
        yum-utils && \
    rm -rf /var/cache/yum/*

RUN mkdir /tmp/gpu-deployment-kit && cd /tmp/gpu-deployment-kit && \
    rpm2cpio $(repoquery --location  gpu-deployment-kit) | cpio -id && \
    mv usr/include/nvidia/gdk/* /usr/local/cuda/include && \
    mv usr/src/gdk/nvml/lib/* /usr/local/cuda/lib64/stubs && \
    rm -rf /tmp/gpu-deployment-kit* && \
    rm -rf /var/cache/yum/*

ENV LIBRARY_PATH /usr/local/cuda/lib64/stubs:${LIBRARY_PATH}

# for paddle
# not using /opt/devtools2
ENV PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

RUN wget -q https://cmake.org/files/v3.5/cmake-3.5.2.tar.gz && tar xzf cmake-3.5.2.tar.gz && \
    cd cmake-3.5.2 && ./bootstrap && \
    make -j4 && make install && cd .. && rm cmake-3.5.2.tar.gz

RUN wget --no-check-certificate -qO- https://storage.googleapis.com/golang/go1.8.1.linux-amd64.tar.gz | \
    tar -xz -C /usr/local && \
    mkdir /root/gopath && \
    mkdir /root/gopath/bin && \
    mkdir /root/gopath/src

ENV GOROOT=/usr/local/go GOPATH=/root/gopath
# not using /opt/devtools2
ENV PATH=${GOROOT}/bin:${GOPATH}/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
ENV LD_LIBRARY_PATH=/opt/_internal/cpython-2.7.11-ucs4/lib:${LD_LIBRARY_PATH}

# protobuf 3.1.0
RUN cd /opt && wget -q --no-check-certificate https://github.com/google/protobuf/releases/download/v3.1.0/protobuf-cpp-3.1.0.tar.gz && \
    tar xzf protobuf-cpp-3.1.0.tar.gz && \
    cd protobuf-3.1.0 && ./configure && make -j4 && make install && cd .. && rm -f protobuf-cpp-3.1.0.tar.gz

RUN /opt/python/cp27-cp27mu/bin/pip install protobuf==3.1.0

RUN yum install -y sqlite-devel zlib-devel openssl-devel boost boost-devel pcre-devel vim

RUN /opt/python/cp27-cp27mu/bin/pip install numpy && go get github.com/Masterminds/glide

RUN wget -O /opt/swig-2.0.12.tar.gz https://sourceforge.net/projects/swig/files/swig/swig-2.0.12/swig-2.0.12.tar.gz/download && \
    cd /opt && tar xzf swig-2.0.12.tar.gz && cd /opt/swig-2.0.12 && ./configure && make && make install && cd /opt && rm swig-2.0.12.tar.gz

typhoonzero commented 6 years ago

Update: using quay.io/numenta/manylinux1_x86_64_centos6:0.1.2. This image seems to include only the cp27-cp27mu Python build; adding cp27-cp27m support is also needed, but building the image from https://github.com/numenta/manylinux does not seem to pass.

typhoonzero commented 6 years ago

Update: using the whl generated by the above Docker image, auditwheel says:

LD_LIBRARY_PATH=/opt/_internal/cpython-3.5.1/lib:$LD_LIBRARY_PATH auditwheel show python/dist/paddlepaddle-0.10.0-cp27-cp27mu-linux_x86_64.whl
Traceback (most recent call last):
  File "/usr/local/bin/auditwheel", line 11, in <module>
    sys.exit(main())
  File "/opt/_internal/cpython-3.5.1/lib/python3.5/site-packages/auditwheel/main.py", line 49, in main
    rval = args.func(args, p)
  File "/opt/_internal/cpython-3.5.1/lib/python3.5/site-packages/auditwheel/main_show.py", line 28, in execute
    winfo = analyze_wheel_abi(args.WHEEL_FILE)
  File "/opt/_internal/cpython-3.5.1/lib/python3.5/site-packages/auditwheel/wheel_abi.py", line 73, in analyze_wheel_abi
    get_wheel_elfdata(wheel_fn)
  File "/opt/_internal/cpython-3.5.1/lib/python3.5/site-packages/auditwheel/wheel_abi.py", line 42, in get_wheel_elfdata
    so_path_split[-1])
RuntimeError: Invalid binary wheel, found shared library "core.so" in purelib folder.
The wheel has to be platlib compliant in order to be repaired by auditwheel.

typhoonzero commented 6 years ago

After https://github.com/PaddlePaddle/Paddle/pull/5396 was merged, building under the above image generates a whl package with the following result:

auditwheel show paddlepaddle-0.10.0-cp27-cp27mu-linux_x86_64.whl

paddlepaddle-0.10.0-cp27-cp27mu-linux_x86_64.whl is consistent with
the following platform tag: "linux_x86_64".

The wheel references external versioned symbols in these system-
provided shared libraries: libm.so.6 with versions {'GLIBC_2.2.5'},
libstdc++.so.6 with versions {'GLIBCXX_3.4.11', 'GLIBCXX_3.4.10',
'GLIBCXX_3.4.9', 'CXXABI_1.3.3', 'CXXABI_1.3', 'GLIBCXX_3.4.13',
'GLIBCXX_3.4'}, libdl.so.2 with versions {'GLIBC_2.2.5'},
libgcc_s.so.1 with versions {'GCC_3.3', 'GCC_3.0'}, libc.so.6 with
versions {'GLIBC_2.3.4', 'GLIBC_2.2.5', 'GLIBC_2.3.2', 'GLIBC_2.7',
'GLIBC_2.6'}, libpthread.so.0 with versions {'GLIBC_2.2.5',
'GLIBC_2.3.2'}

This constrains the platform tag to "linux_x86_64". In order to
achieve a more compatible tag, you would need to recompile a new wheel
from source on a system with earlier versions of these libraries, such
as CentOS 5.

Installing this whl on a plain CentOS 6 works fine. This toolchain can support most of our cases. I will refine the build images/scripts and put them on CI.

typhoonzero commented 6 years ago

Update: I added a new repo, https://github.com/PaddlePaddle/buildtools, containing scripts to build development Docker images of different versions.

wangkuiyi commented 6 years ago

@typhoonzero How are you going to maintain the relationship between versions of Paddle and buildtools? We face a similar challenge keeping the versions of models/book up to date with Paddle. The current solution is a git submodule in the models/book repo pointing to a certain (recently released) version of PaddlePaddle. We are not yet sure this solution is perfect.

typhoonzero commented 6 years ago

@wangkuiyi No. buildtools contains only Dockerfiles that build a manylinux1-compliant build environment, and it should stay static, since the dependencies defined by manylinux1 are static. The buildtools repo will not need updating until a new PEP definition comes out.

wangkuiyi commented 6 years ago

Sounds reasonable. In my experience adding the default Dockerfile, it is static, and it seems that Dockerfile.android is also static. Let's go with your proposal for a while; if it all runs well, we can switch to it completely.

Yancey1989 commented 6 years ago

I am reopening this issue because we also need to add some projects on TeamCity:

| feature | gate option | switch |
| --- | --- | --- |
| GPU ON | CUDA/cuDNN version | cuda7.5_cudnn5, cuda8.0_cudnn5, cuda8.0_cudnn7 |
| GPU OFF + AVX | AVX | ON/OFF (only use default ON) |
| GPU OFF + cblas | cblas backend | MKL/OpenBLAS (only use one as default) |
| android | build for Android | |
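These gate options multiply into a build matrix quickly. As a rough sketch (option values taken from this thread; the helper name and grouping are hypothetical, and the set TeamCity ends up with may differ — a later comment mentions 15 configurations in total):

```python
from itertools import product

# CUDA/cuDNN combinations discussed in this thread.
CUDA_VERSIONS = ["cuda7.5_cudnn5", "cuda8.0_cudnn5", "cuda8.0_cudnn7"]

def build_matrix():
    """Enumerate one illustrative build configuration per gate combination."""
    configs = []
    # GPU builds: one per CUDA/cuDNN combination.
    for cuda in CUDA_VERSIONS:
        configs.append({"WITH_GPU": "ON", "CUDA": cuda})
    # CPU builds: AVX on/off crossed with the cblas backend.
    for avx, blas in product(["ON", "OFF"], ["MKL", "OpenBLAS"]):
        configs.append({"WITH_GPU": "OFF", "WITH_AVX": avx, "CBLAS": blas})
    # Android cross-compile build.
    configs.append({"TARGET": "android"})
    return configs

for config in build_matrix():
    print(config)
```

Even this reduced sketch yields eight configurations, which is why agent capacity becomes the concern in the comments below.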
luotao1 commented 6 years ago

How much TeamCity time will it take if we add all of the above projects? @Yancey1989 And should we enhance the TeamCity agents first? Currently, we only have 3 agents. @helinwang https://www.jetbrains.com/teamcity/buy/#license-type=new-license

Yancey1989 commented 6 years ago

Hi @luotao1, if we don't have enough agents on TeamCity, maybe we can run all projects except the PR test at midnight. But I think enhancing the TeamCity agents is very important; there are often more than 10 tasks waiting in the queue.

luotao1 commented 6 years ago

However, our midnight is daytime for our American colleagues. If we run all project builds at midnight, will they have to wait a long time?

Yancey1989 commented 6 years ago

Yep, I missed that point. How about dispersing the build times, making sure every project runs once a day?

luotao1 commented 6 years ago

How about dispersing the build times, making sure every project runs once a day?

I agree with you!

Yancey1989 commented 6 years ago

Update:

  1. Use a schedule trigger instead of a VCS trigger in all projects except the PR CI.
  2. Build different versions of the whl Python package on TeamCity: https://paddleci.ngrok.io/project.html?projectId=Manylinux1&tab=projectOverview

helinwang commented 6 years ago

How many extra projects are we planning to build at midnight Beijing time? If there are no more than 6 projects and each one takes 30 minutes, it only takes an hour on three machines, which is probably fine.
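That back-of-the-envelope estimate can be sketched as follows (assuming each agent runs one build at a time and builds are spread evenly; the function name is mine):

```python
import math

def nightly_wall_clock_minutes(projects, agents, minutes_per_project):
    """Estimate nightly wall-clock time: builds run in batches, one
    build per agent, each batch taking minutes_per_project."""
    batches = math.ceil(projects / agents)
    return batches * minutes_per_project

# 6 projects on 3 agents at 30 minutes each -> 2 batches -> 60 minutes.
print(nightly_wall_clock_minutes(6, 3, 30))  # 60
```

By the same estimate, the 15 configurations mentioned in the next comment would take 5 batches, i.e. 150 minutes on 3 agents, which supports staggering them with schedule triggers.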

Yancey1989 commented 6 years ago

@helinwang There are 15 configurations in total, excluding the PR CI. I have already configured them with schedule triggers at different times.

Yancey1989 commented 6 years ago

Update: limiting the number of different whl package versions, because the Community edition of TeamCity limits the number of configurations.

| TeamCity Configuration | WITH_AVX | WITH_GPU | WITH_MKL | Docker Image | cp27-cp27mu | cp27-cp27m | C-API |
| --- | --- | --- | --- | --- | --- | --- | --- |
| cpu_avx_mkl | ON | OFF | ON | paddle:latest | paddlepaddle-0.10.0-cp27-cp27mu-linux_x86_64.whl | paddlepaddle-0.10.0-cp27-cp27m-linux_x86_64.whl | paddle.tgz |
| cpu_avx_openblas | ON | OFF | ON | paddle:latest-openblas | paddlepaddle-0.10.0-cp27-cp27mu-linux_x86_64.whl | paddlepaddle-0.10.0-cp27-cp27m-linux_x86_64.whl | None |
| cuda7.5_cudnn5_avx_mkl | ON | ON | ON | None | paddlepaddle-0.10.0-cp27-cp27mu-linux_x86_64.whl | paddlepaddle-0.10.0-cp27-cp27m-linux_x86_64.whl | paddle.tgz |
| cuda8.0_cudnn5_avx_mkl | ON | ON | ON | None | paddlepaddle-0.10.0-cp27-cp27mu-linux_x86_64.whl | paddlepaddle-0.10.0-cp27-cp27m-linux_x86_64.whl | paddle.tgz |
| cuda8.0_cudnn7_avx_mkl | ON | ON | ON | paddle:latest-gpu | paddlepaddle-0.10.0-cp27-cp27mu-linux_x86_64.whl | paddlepaddle-0.10.0-cp27-cp27m-linux_x86_64.whl | paddle.tgz |
typhoonzero commented 6 years ago

Also add the corresponding C-API links?

Yancey1989 commented 6 years ago

@typhoonzero will do that :)

Yancey1989 commented 6 years ago

@typhoonzero Added the C-API links.