cms-sw / cmsdist

CMS Offline Software build configuration
Other
27 stars 183 forks source link

Missing includes / libs in py2-numpy toolfile #2994

Closed riga closed 7 years ago

riga commented 7 years ago

Hi!

I'm currently trying to write a CMSSW module (slc6_amd64_gcc530, CMSSW_8_0_26_patch2) that uses the Python and NumPy C API's to load and evaluate Tensorflow graphs (https://gitlab.cern.ch/mrieger/CMSSW-DNN). The "Python.h" header is included properly, but it seems like "numpy/arrayobject.h" could not be found at compile time.

I added the tools "python" and "py2-numpy" to my BuildFile.xml and could debug with "scram b -d" that python is used (-I/cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/python/2.7.11-ikhhed2/include/python2.7) but numpy is missing. After digging deeper I saw that the py2-numpy-toolfile.spec does not define any <environment name="INCLUDE" default="..." /> tag, unlike (e.g.) the python tool.

The numpy header files are located within the python module directory, e.g. here:

/cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/py2-numpy/1.11.1-ikhhed2/lib/python2.7/site-packages/numpy-1.11.1-py2.7-linux-x86_64.egg/numpy/core/include

The libraries seem to be at the same location.

I can try to provide a PR with an update tool file, but I'm not sure which branch to chose to start with. Is it possible to update the configuration of a tool in the BuildFile.xml of my module for testing purposes?

Thanks for your help! I'm really looking forward to use our DNNs in CMSSW =)

PS: As it would be much simpler to directly use the tensorflow C++ API, would it also be possible to provide the tensorflow include files / libraries?

cmsbuild commented 7 years ago

A new Issue was created by @riga Marcel R..

@davidlange6, @Dr15Jones, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

davidlt commented 7 years ago

I don't think we ever expected people to build Python packages at low-level within CMSSW. This is way our toolfiles only contain information only needed to use Python packages.

I would probably suggest adding another tool into py2-numpy-toolfile.spec maybe py2-numpy-devel.xml or py2-numpy-c-api.xml, which would define libraries and include directories. That one could be used for building packages which depend on these low-level APIs.

@davidlange6 I think, you were working on C++ API for tensorflow.

davidlange6 commented 7 years ago

On May 2, 2017, at 3:19 PM, davidlt notifications@github.com wrote:

I don't think we ever expected people to build Python packages at low-level within CMSSW. This is way our toolfiles only contain information only needed to use Python packages.

I would probably suggest adding another tool into py2-numpy-toolfile.spec maybe py2-numpy-devel.xml or py2-numpy-c-api.xml, which would define libraries and include directories. That one could be used for building packages which depend on these low-level APIs.

@davidlange6 I think, you were working on C++ API for tensorflow.

not me, though its on my to do list - @mharrend was (and succeeded I believe). Have a look here: https://github.com/cms-sw/cmsdist/pull/2824

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

davidlt commented 7 years ago

My guess is py2-tensorflow.spec contains tensorflow + Python bindings. Do you know why we didn't build tensorflow.spec and then py2-tensorflow.spec (only Python bindings)?

davidlange6 commented 7 years ago

tensorflow is coming via a wheel, so indeed, maybe the c++ bindings are there too and I just didn't manage to access them..

On May 2, 2017, at 3:41 PM, davidlt notifications@github.com wrote:

My guess is py2-tensorflow.spec contains tensorflow + Python bindings. Do you know why we didn't build tensorflow.spec and then py2-tensorflow.spec (only Python bindings)?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

riga commented 7 years ago

Thanks for your comments!

I would probably suggest adding another tool into py2-numpy-toolfile.spec maybe py2-numpy-devel.xml or py2-numpy-c-api.xml, which would define libraries and include directories. That one could be used for building packages which depend on these low-level APIs.

@davidlt Sounds good. Just tell me If I can provide or test something, I'm glad to help.

not me, though its on my to do list - @mharrend was (and succeeded I believe). Have a look here:

2824

@davidlange6 Yep, I'm currently using his bundle for 80X ;) The wheel comes with C++ headers and libraries, however, I haven't yet managed to access them via editing my BuildFile. But this should be possible via a tool file, right?

davidlt commented 7 years ago

I looked at https://www.tensorflow.org/api_docs/cc/class/tensorflow/ops/stage and tried to check for headers -- found none to be part of our py2-tensorflow package. Could you double check?

Btw, could you provide some information why you need C or C++ API (e.g. some run-time integration for CMSSW) or just because you want to use C++?

riga commented 7 years ago

Yeah, apparently model building is not yet supported by the C++ API, which is why there are no headers containing the ops. As far as I see, the graph evaluation should already work in C++, which requires the headers in tensorflow/include/tensorflow/core/public, e.g. session.h.

On analysis level and for network training we use the Python API, and in general, there is no reason not to ;) But for model evaluation on an event-by-event basis right within a CMSSW module (e.g. for electron MVAs, b-tagging, etc), the C++ API would be very helpful.

davidlt commented 7 years ago

That's exactly what I wanted to hear. If we reached the point where we want to integrate model evaluation run-time into CMSSW that needs C or/and C++ API.

For that we would need to build a tensorflow as a normal package inside CMSSW.

riga commented 7 years ago

For that we would need to build a tensorflow as a normal package inside CMSSW.

That would be awesome. I know many groups within CMS that would really appreciate a working TF interface (including us, of course).

In the meantime, in combination with the approach mentioned above, this additional py2-numpy-devel tool file would already do the trick.

davidlt commented 7 years ago

The following should work for you: https://github.com/cms-sw/cmsdist/pull/2998

IIRC, there are no explicit libraries we need to link to.

riga commented 7 years ago

Thanks @davidlt for providing this so fast!

I'm not an expert...how can I use that file? Or will it be available at some point in the scram tool list?

davidlt commented 7 years ago

Once merged it will be available in IBs, which are built twice a day. If you want to use it now:

  1. Create a file py2-numpy-c-api.xml
  2. Add the following content:
    <tool name="py2-numpy-c-api" version="1.12.1-cms3">
    <client>
    <environment name="PY2_NUMPY_C_API_BASE" default="/mnt/build/davidlt/numpy/a/slc6_amd64_gcc530/external/py2-numpy/1.12.1-cms3"/>
    <environment name="INCLUDE" default="$PY2_NUMPY_C_API_BASE/lib/python2.7/site-packages/numpy-1.12.1-py2.7-linux-x86_64.egg/numpy/core/include"/>
    </client>
    <use name="python"/>
    </tool>

    Note, you probably will need to adjust the paths.

  3. scram setup /path/to/py2-numpy-c-api.xml + cmsenv

You can now use py2-numpy-c-api in BuildFile.xml.

riga commented 7 years ago

Works like a charm, thanks!

davidlt commented 7 years ago

I was thinking about tensorflow integration into CMSSW. I think, we could do a dirty and quick integration similar to what we do for Oracle Instant Client and CUDA.

We can grab official binary builds and put that into CMSSW.

$ tar xvzf libtensorflow-cpu-linux-x86_64-1.1.0.tar.gz
./
./include/
./include/tensorflow/
./include/tensorflow/c/
./include/tensorflow/c/c_api.h
./lib/
./lib/libtensorflow.so
./include/tensorflow/c/LICENSE
$ ldd ./lib/libtensorflow.so
    linux-vdso.so.1 (0x00007fff0f9ca000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f489ea2e000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007f489e82a000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f489e521000)
    libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f489e199000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f489df82000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f489dbba000)
    /lib64/ld-linux-x86-64.so.2 (0x000055dd55c99000)

This would provide you C API, but not C++.

The only issue I see is that it's a fat binary (incl. all needed externals at specific version), thus 90MB. Symbols are not hidden/internal, e.g. from jemalloc: je_arena_cleanup. Only C (and later C++ API) should be globally visible, the rest should be hidden.

davidlt commented 7 years ago

I looked into tensorflow repository and found 2 issues already created in the last 10 days about symbol visibility. This is actually causing issues in real life. They think it's easy to resolve, but someone needs to do it.

I also asked about binary distributions for C++ API, which they don't provide at this point.

The only issue could be that C++11 ABI.

davidlange6 commented 7 years ago

Hi @davidlt - This seems ok until we have a tensorflow tool inside a bigger application (which will be one week after having a tensorflow tool in cmssw). or is it easy for us to hide these symbols?

as I mentioned to you yesterday I was looking at how to plug in our own externals, but that is so far non trivial.

On May 3, 2017, at 11:15 PM, davidlt notifications@github.com wrote:

I was thinking about tensorflow integration into CMSSW. I think, we could do a dirty and quick integration similar to what we do for Oracle Instant Client and CUDA.

We can grab official binary builds and put that into CMSSW.

$ tar xvzf libtensorflow-cpu-linux-x86_64-1.1.0.tar.gz ./ ./include/ ./include/tensorflow/ ./include/tensorflow/c/ ./include/tensorflow/c/c_api.h ./lib/ ./lib/libtensorflow.so ./include/tensorflow/c/LICENSE $ ldd ./lib/libtensorflow.so linux-vdso.so.1 (0x00007fff0f9ca000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f489ea2e000) libdl.so.2 => /lib64/libdl.so.2 (0x00007f489e82a000) libm.so.6 => /lib64/libm.so.6 (0x00007f489e521000) libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f489e199000) libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f489df82000) libc.so.6 => /lib64/libc.so.6 (0x00007f489dbba000) /lib64/ld-linux-x86-64.so.2 (0x000055dd55c99000)

This would provide you C API, but not C++.

The only issue I see is that it's a fat binary (incl. all needed externals at specific version), thus 90MB. Symbols are not hidden/internal, e.g. from jemalloc: je_arena_cleanup. Only C (and later C++ API) should be globally visible, the rest should be hidden.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

davidlt commented 7 years ago

It is an easy one and someone is already working on this (seems unsuccessfully for now). There are a couple of ways to do it. The easiest one and it's good for C libraries are symbol version scripts. That's what is being worked on. This also brings ability to add version symbols to tensorflow API.

It works less good in C++ application where compiler can generate gazillions of symbols (e.g. via templates). For that it would be better to use explicit attributes in code.

I think, it's much more easier to work with upstream community / Google to ensure that binary distributions are properly built and just take them.

IIUC, you don't want to use Eigen, jemalloc, etc from CMSSW while building tensorflow as it requires particular versions of these packages. It seems to be tightly integrated. Thus inside SPEC file you would be building the same binary distribution as you could get from upstream community / Google.

That binary distributions also by default works for everything (CMS, ATLAS, Ubuntu, etc.)

I found tensorflow SPEC in copr repository (third-party, random user): http://copr-dist-git.fedorainfracloud.org/cgit/alonid/tensorflow/tensorflow.git/tree/tensorflow.spec?id=89062de403c2264c08c3559597a2dd878516e456

davidlange6 commented 7 years ago

On May 4, 2017, at 1:18 PM, davidlt notifications@github.com wrote:

It is an easy one and someone is already working on this (seems unsuccessfully for now). There are a couple of ways to do it. The easiest one and it's good for C libraries are symbol version scripts. That's what is being worked on. This also brings ability to add version symbols to tensorflow API.

It works less good in C++ application where compiler can generate gazillions of symbols (e.g. via templates). For that it would be better to use explicit attributes in code.

I think, it's much more easier to work with upstream community / Google to ensure that binary distributions are properly built and just take them.

IIUC, you don't want to use Eigen, jemalloc, etc from CMSSW while building tensorflow as it requires particular versions of these packages. It seems to be tightly integrated. Thus inside SPEC file you would be building the same binary distribution as you could get from upstream community / Google.

That binary distributions also by default works for everything (CMS, ATLAS, Ubuntu, etc.)

I found tensorflow SPEC in copr repository (third-party, random user): http://copr-dist-git.fedorainfracloud.org/cgit/alonid/tensorflow/tensorflow.git/tree/tensorflow.spec?id=89062de403c2264c08c3559597a2dd878516e456

Thanks - I have basically this same recipe aside from having to change where tmp files go (defaulting to afs where quota fills quickly) but have not gotten to the end.

so ok, I was going the wrong direction with regards to the externals inside tensorflow. I'll stop heading in the direction of removing them.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

davidlange6 commented 7 years ago

Hi @davidlt - does libtensorflow.so end up being the same thing as

/cvmfs/cms.cern.ch/slc7_amd64_gcc630/external/py2-pippkgs_depscipy/3.0-mlhled3/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so

? (minus a python api perhaps)

On May 3, 2017, at 11:15 PM, davidlt notifications@github.com wrote:

I was thinking about tensorflow integration into CMSSW. I think, we could do a dirty and quick integration similar to what we do for Oracle Instant Client and CUDA.

We can grab official binary builds and put that into CMSSW.

$ tar xvzf libtensorflow-cpu-linux-x86_64-1.1.0.tar.gz ./ ./include/ ./include/tensorflow/ ./include/tensorflow/c/ ./include/tensorflow/c/c_api.h ./lib/ ./lib/libtensorflow.so ./include/tensorflow/c/LICENSE $ ldd ./lib/libtensorflow.so linux-vdso.so.1 (0x00007fff0f9ca000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f489ea2e000) libdl.so.2 => /lib64/libdl.so.2 (0x00007f489e82a000) libm.so.6 => /lib64/libm.so.6 (0x00007f489e521000) libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f489e199000) libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f489df82000) libc.so.6 => /lib64/libc.so.6 (0x00007f489dbba000) /lib64/ld-linux-x86-64.so.2 (0x000055dd55c99000)

This would provide you C API, but not C++.

The only issue I see is that it's a fat binary (incl. all needed externals at specific version), thus 90MB. Symbols are not hidden/internal, e.g. from jemalloc: je_arena_cleanup. Only C (and later C++ API) should be globally visible, the rest should be hidden.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

davidlange6 commented 7 years ago

I found tensorflow SPEC in copr repository (third-party, random user): http://copr-dist-git.fedorainfracloud.org/cgit/alonid/tensorflow/tensorflow.git/tree/tensorflow.spec?id=89062de403c2264c08c3559597a2dd878516e456

Thanks - I have basically this same recipe aside from having to change where tmp files go (defaulting to afs where quota fills quickly) but have not gotten to the end.

but to clarify, this is a spec for tensorflow+python api. we already have the equivalent in CMSSW (just not built with our compiler/python) - this doesn't bring the C (or C++) interface. But maybe that amounts to one file.

mharrend commented 7 years ago

By the way, here is the recipe how I built the C++ Tensorflow API for CMSSW8, just in case you would like to build the C++ API as a CMSSW module.

export VO_CMS_SW_DIR=/cvmfs/cms.cern.ch 
source $VO_CMS_SW_DIR/cmsset_default.sh
# setup environment
export SCRAM_ARCH="slc6_amd64_gcc530"
export CMSSW_VERSION="CMSSW_8_0_26_patch1"
# Variables
export JENKINSCMSSWFOLDER=$CMSSW_VERSION
echo $JENKINSCMSSWFOLDER
export JENKINSCMSSWINSTALLDIR=$PWD"/"$JENKINSCMSSWFOLDER
echo $JENKINSCMSSWINSTALLDIR
export JENKINSCMSSWSRCDIR=$JENKINSCMSSWINSTALLDIR"/src"
echo $JENKINSCMSSWSRCDIR
# create new CMSSW environment and delete old one
echo "Deleting old CMSSW install dir: " $JENKINSCMSSWINSTALLDIR
rm -rf $JENKINSCMSSWINSTALLDIR
echo "Creating new CMSSW install dir"
scram project $JENKINSCMSSWFOLDER
cd $JENKINSCMSSWSRCDIR
eval `scramv1 runtime -sh` 
# Setup c++ library paths and settings
scram setup gcc-cxxcompiler
export PATH="/cvmfs/cms.cern.ch/share/overrides/bin:/cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw-patch/CMSSW_8_0_26_patch1/bin/slc6_amd64_gcc530:/cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw-patch/CMSSW_8_0_26_patch1/external/slc6_amd64_gcc530/bin:/cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_26/bin/slc6_amd64_gcc530:/cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/llvm/3.8.0-ikhhed/bin:/cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/gcc/5.3.0/bin:/afs/cern.ch/cms/caf/scripts:/cvmfs/cms.cern.ch/common:/cvmfs/cms.cern.ch/bin:/usr/sue/sbin:/usr/sue/bin:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin":$PATH
export LD_LIBRARY_PATH=/cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw-patch/CMSSW_8_0_26_patch1/biglib/slc6_amd64_gcc530:/cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw-patch/CMSSW_8_0_26_patch1/lib/slc6_amd64_gcc530:/cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw-patch/CMSSW_8_0_26_patch1/external/slc6_amd64_gcc530/lib:/cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_26/biglib/slc6_amd64_gcc530:/cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_26/lib/slc6_amd64_gcc530:/cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/llvm/3.8.0-ikhhed/lib64:/cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/gcc/5.3.0/lib64:/cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/gcc/5.3.0/lib:$LD_LIBRARY_PATH
export LIBRARY_PATH=$LD_LIBRARY_PATH:$LIBRARY_PATH
export LDFLAGS="-Wl,-rpath,/cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/gcc/5.3.0/lib64 -L /cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/gcc/5.3.0/lib64 -Wl,-rpath,/cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/gcc/5.3.0/lib -L /cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/gcc/5.3.0/lib"
export CXX=/cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/gcc/5.3.0/bin/g++
# Compiling bazel
echo "Start compiling bazel"
wget https://github.com/bazelbuild/bazel/releases/download/0.4.5/bazel-0.4.5-dist.zip
unzip bazel-0.4.5-dist.zip 
chmod +x ./compile.sh
./compile.sh
echo "Finished compiling bazel"
# Compiling Tensorflow
echo "Start compiling Tensorflow"
git clone https://github.com/tensorflow/tensorflow
cd tensorflow
# Make sure Tensorflow configure runs non-interactive by setting variables
export PYTHON_BIN_PATH=/cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw-patch/CMSSW_8_0_26_patch1/external/slc6_amd64_gcc530/bin/python
export TF_NEED_MKL=0
export CC_OPT_FLAGS="-march=native"
export TF_NEED_JEMALLOC=0
export TF_NEED_GCP=0
export TF_NEED_HDFS=0
export TF_ENABLE_XLA=0
export TF_NEED_OPENCL=0
export TF_NEED_CUDA=0
# Configure Tensorflow
echo "Configure Tensorflow"
./configure 
# Patch Protobuf in tensorflow to make sure that non-default gcc installation is used
echo "Patching protobuf"
wget https://raw.githubusercontent.com/mharrend/tensorflow-c--api/master/protobuf-patch.bzl
patch -p1 -i protobuf-patch.bzl
# Compile Tensorflow C++ library
echo "Compiling Tensorflow C++ library"
$JENKINSCMSSWSRCDIR/output/bazel build -s -c opt //tensorflow:libtensorflow_cc.so
cp bazel-bin/tensorflow/libtensorflow_cc.so $JENKINSCMSSWSRCDIR/libtensorflow_cc.so
# Compile Tensorflow PIP package
echo "Compiling Tensorflow PIP package"
$JENKINSCMSSWSRCDIR/output/bazel build -s -c opt //tensorflow/tools/pip_package:build_pip_package
echo "Install PIP dependencies"
wget https://bootstrap.pypa.io/get-pip.py
python get-pip.py --user
echo "Build PIP package"
bazel-bin/tensorflow/tools/pip_package/build_pip_package $JENKINSCMSSWSRCDIR/tensorflow_pkg
davidlt commented 7 years ago

Using libtensorflow_cc.so (i.e. linking, loading into CMSSW) might not be secure (and it's not according to reports in tensorflow repository) until someone writes a proper version script, or adds visibility attributes.

riga commented 7 years ago

If, at some point, you want to test a possible solution, I'm glad to help out.

davidlt commented 7 years ago

At this point I am not sure if I want to jump into it. Seems that community already identified the same (or similar issues) and will try to resolve them, but it's not a priority for them at this point. I did start compiling tensorflow and playing around, but for now stopped. There are 3 main tasks:

  1. Identify all the needed headers for C++ interface (there are some discussions on that also on tensorflow issues)
  2. Make the library dual-abi to support GCC4-compatible and new C++11 ABI (default).
  3. Create a linker symbol version file to isolate interface functions (or change code with visibility attributes)

It is interesting, but would cost some time.

smuzaffar commented 7 years ago

@riga, can we close this issue?

riga commented 7 years ago

@smuzaffar Yep!