EBI-predocs / research-software

:computer: Instructions, tips and issue tracker for the software on the EBI computing cluster

Building tensorflow from source #55

Closed vals closed 8 years ago

vals commented 8 years ago

I want to be able to build TensorFlow from source on the cluster.

(Basically because I want to make models in GPflow, which requires the latest master of TF)

The first step for that is to install Bazel. This is described here: https://www.tensorflow.org/versions/master/get_started/os_setup.html#installing-from-sources

When I run the command to install Bazel

./bazel-0.2.2b-installer-linux-x86_64.sh --user

I end up with (after some version info)

## Build informations
   - [Build log](http://ci.bazel.io/job/Bazel/JAVA_VERSION=1.8,PLATFORM_NAME=linux-x86_64/492/)
   - [Commit](https://github.com/bazelbuild/bazel/commit/b74d9b5)
Uncompressing......
/homes/vale/.bazelrc already exists, moving it to /homes/vale/.bazelrc.bak.
/homes/vale/bin/bazel: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.14' not found (required by /homes/vale/bin/bazel)
/homes/vale/bin/bazel: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.15' not found (required by /homes/vale/bin/bazel)
/homes/vale/bin/bazel: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /homes/vale/bin/bazel)

How does one deal with these GLIBCXX issues?

Alternatively, does anyone know how to build the latest version of TensorFlow?

I'm 60% sure I haven't messed up some configuration from the recommended settings in this repo.

mschubert commented 8 years ago

I can only guess because I don't have access to your build env.

Is the bazel lib a binary install?

If so, it was most likely built against a newer glibc than the one on the system where you are trying to build TensorFlow.

An easy workaround might be to try on yoda instead of ebi. Otherwise, try building Bazel from source first.
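To see where the mismatch is, it can help to list the symbol versions the system libstdc++ actually provides and compare them with the ones named in the error. This is only a minimal sketch and was not run on the cluster: the helper names `glibcxx_versions` and `has_glibcxx` are made up for illustration, and the default path is simply the one from the error message.

```shell
# List the GLIBCXX_x.y.z symbol versions found in a libstdc++ binary.
# Default path: the library named in the error message (an assumption
# for any other system).
glibcxx_versions() {
    lib="${1:-/usr/lib64/libstdc++.so.6}"
    # -a: treat the binary as text; -o: print only the matching part.
    # Requiring a digit after the underscore skips names like GLIBCXX_FORCE_NEW.
    grep -ao 'GLIBCXX_[0-9][0-9.]*' "$lib" | sort -u
}

# Succeed if version $1 is provided by the library $2 (same default path).
has_glibcxx() {
    glibcxx_versions "${2:-/usr/lib64/libstdc++.so.6}" | grep -Fx -- "$1" >/dev/null
}
```

For example, `has_glibcxx GLIBCXX_3.4.15 || echo missing` checks one of the versions from the error above, and on glibc-based systems `getconf GNU_LIBC_VERSION` prints the installed glibc version for the `GLIBC_2.14` line.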

Edit: I just tried it, and the Bazel build hard-codes the system /usr/bin/gcc instead of using the one in $PATH. I installed a working copy in the prefix path; try building TF with that.

vals commented 8 years ago

Thanks,

When running the Bazel in the prefix path, I get these errors:

vale@ebi-001 /nfs/research2/teichmann/valentine/tensorflow
 $ bazel build -c opt //tensorflow/tools/pip_package:build_pip_package
WARNING: Output base '/nfs/gns/homes/vale/.cache/bazel/_bazel_vale/ce81b89c14bc8ee0a74df7f71e8253af' is on NFS. This may lead to surprising failures and undetermined behavior.
WARNING: Sandboxed execution is not supported on your system and thus hermeticity of actions cannot be guaranteed. See http://bazel.io/docs/bazel-user-manual.html#sandboxing for more information. You can turn off this warning via --ignore_unsupported_sandboxing.
ERROR: /nfs/research2/teichmann/valentine/tensorflow/WORKSPACE:16:6: First argument of load() is a path, not a label. It should start with a single slash if it is an absolute path..
ERROR: /nfs/research2/teichmann/valentine/tensorflow/WORKSPACE:20:6: First argument of load() is a path, not a label. It should start with a single slash if it is an absolute path..
ERROR: WORKSPACE file could not be parsed.
ERROR: no such package 'external': Package 'external' contains errors.
INFO: Elapsed time: 0.560s

mschubert commented 8 years ago

See https://github.com/bazelbuild/bazel/issues/846. Either use git HEAD for TF, or compile Bazel < 0.1.1 (before https://github.com/bazelbuild/bazel/commit/d21c2d6653a3d9bc3376bcb190ba0ac31f52195b) yourself with the patch in /nfs/research2/software/prefix/overlay/app-misc/bazel/bazel-0.1.1.ebuild

vals commented 8 years ago

It seems TensorFlow needs Bazel > 0.1.4 for this: http://stackoverflow.com/questions/34941620/unable-to-build-tensorflow-from-source-with-bazel-22nd-january-2016
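A quick way to check an installed Bazel against a minimum like this is to pull the build label out of `bazel version` and compare it. The sketch below assumes GNU `sort -V` is available; the helper names `bazel_label` and `version_gt` are hypothetical, and the comparison only means something when the label is a real version number rather than something like `head`.

```shell
# Read `bazel version` output on stdin and print only the build label.
bazel_label() {
    sed -n 's/^Build label: //p'
}

# Succeed if version $1 is strictly newer than version $2.
# Relies on GNU sort's -V (version sort) ordering.
version_gt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}
```

Usage would be along the lines of `version_gt "$(bazel version 2>/dev/null | bazel_label)" 0.1.4 && echo ok`.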

Which version did you build? The Bazel version info is not so informative:

vale@ebi-001 /nfs/research2/teichmann/valentine/tensorflow
 $ bazel version
WARNING: Output base '/nfs/gns/homes/vale/.cache/bazel/_bazel_vale/ce81b89c14bc8ee0a74df7f71e8253af' is on NFS. This may lead to surprising failures and undetermined behavior.
Build label: head (@non-git)
Build target: bazel-out/local_linux-fastbuild/bin/src/main/java/bazel-main_deploy.jar
Build time: Fri May 6 20:04:30 2016 (1462565070)
Build timestamp: 1462565070
Build timestamp as int: 1462565070

mschubert commented 8 years ago

upgraded to 0.1.4, try again

vals commented 8 years ago

I think something went wrong when building that version of Bazel. Building TF gives

$ bazel build -c opt //tensorflow/tools/pip_package:build_pip_package
WARNING: Output base '/nfs/gns/homes/vale/.cache/bazel/_bazel_vale/ce81b89c14bc8ee0a74df7f71e8253af' is on NFS. This may lead to surprising failures and undetermined behavior.
WARNING: Sandboxed execution is not supported on your system and thus hermeticity of actions cannot be guaranteed. See http://bazel.io/docs/bazel-user-manual.html#sandboxing for more information. You can turn off this warning via --ignore_unsupported_sandboxing.
ERROR: Loading of target '@bazel_tools//tools/jdk:ijar' failed; build aborted: no such target '@bazel_tools//tools/jdk:ijar': target 'ijar' not declared in package 'tools/jdk' defined by /nfs/gns/homes/vale/.cache/bazel/_bazel_vale/ce81b89c14bc8ee0a74df7f71e8253af/external/bazel_tools/tools/jdk/BUILD.
ERROR: Loading failed; build aborted.
INFO: Elapsed time: 3.307s

As pointed out in tensorflow/tensorflow#124, this can happen when the Bazel build is incomplete. Running the tests fails the same way:

$ bazel test
WARNING: Output base '/nfs/gns/homes/vale/.cache/bazel/_bazel_vale/ce81b89c14bc8ee0a74df7f71e8253af' is on NFS. This may lead to surprising failures and undetermined behavior.
WARNING: Sandboxed execution is not supported on your system and thus hermeticity of actions cannot be guaranteed. See http://bazel.io/docs/bazel-user-manual.html#sandboxing for more information. You can turn off this warning via --ignore_unsupported_sandboxing.
ERROR: Loading of target '@bazel_tools//tools/jdk:ijar' failed; build aborted: no such target '@bazel_tools//tools/jdk:ijar': target 'ijar' not declared in package 'tools/jdk' defined by /nfs/gns/homes/vale/.cache/bazel/_bazel_vale/ce81b89c14bc8ee0a74df7f71e8253af/external/bazel_tools/tools/jdk/BUILD.
ERROR: Loading failed; build aborted.
INFO: Elapsed time: 0.610s
ERROR: Couldn't start the build. Unable to run tests.

vals commented 8 years ago

I built my own Bazel, version 0.2.2b, and with that it looks like the build of TensorFlow succeeds.

However, building the pip package from the output of the TensorFlow build fails:

vale@ebi-001 /nfs/research2/teichmann/valentine/tensorflow
 $ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
Sun May 8 13:38:23 BST 2016 : === Using tmpdir: /tmp/tmp.JWJeeMSKK1
cp: cannot stat 'bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles/tensorflow': No such file or directory
cp: cannot stat 'bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles/external': No such file or directory

vals commented 8 years ago

It turns out I didn't actually build 0.2.2b, but rather a development version with some changes to the directory structure.
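One way to avoid this kind of mix-up is to ask git whether a source checkout sits exactly on a release tag before building. This is a rough sketch: `describe_checkout` is a made-up helper name, it needs a git new enough to support `-C`, and it assumes release tags in the repository are named after the versions.

```shell
# Report whether the checkout in directory $1 (default: current directory)
# is exactly at a tag, or at some untagged development commit.
describe_checkout() {
    repo="${1:-.}"
    if tag=$(git -C "$repo" describe --tags --exact-match 2>/dev/null); then
        echo "release $tag"
    else
        echo "development $(git -C "$repo" rev-parse --short HEAD)"
    fi
}
```

Running `describe_checkout bazel` from the parent directory of the clone would print either the tag name or the short commit hash that was actually built.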

In the end, to build tensorflow, I did this:

$ git clone https://github.com/bazelbuild/bazel.git
$ cd bazel
$ git checkout 759bbfe
$ ./compile.sh

$ conda install swig

$ cd ..
$ git clone --recurse-submodules https://github.com/tensorflow/tensorflow
$ cd tensorflow
$ ./configure

$ ../bazel/output/bazel build -c opt //tensorflow/tools/pip_package:build_pip_package
$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
$ pip install /tmp/tensorflow_pkg/tensorflow-0.8.0-py3-none-any.whl

Then tensorflow works!
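The wheel filename in the last step depends on the TensorFlow version and the Python tag, so it will differ for other checkouts. Here is a small sketch that locates the wheel instead of hard-coding the name. The helper name `find_tf_wheel` is made up, and the default directory is just the one used above.

```shell
# Print the path of the single TensorFlow wheel in directory $1
# (default: /tmp/tensorflow_pkg), or fail if there is none or several.
find_tf_wheel() {
    dir="${1:-/tmp/tensorflow_pkg}"
    # Expand the glob into the function's own positional parameters.
    set -- "$dir"/tensorflow-*.whl
    # If nothing matched, "$1" is the literal pattern and does not exist.
    if [ "$#" -ne 1 ] || [ ! -e "$1" ]; then
        echo "expected exactly one tensorflow wheel in $dir" >&2
        return 1
    fi
    printf '%s\n' "$1"
}
```

With that, the install step becomes `pip install "$(find_tf_wheel)"`, and `python -c 'import tensorflow'` is a quick check that the package can be imported afterwards.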