ctuning / ck-tensorflow

Collective Knowledge components for TensorFlow (code, data sets, models, packages, workflows):
http://cKnowledge.org
BSD 3-Clause "New" or "Revised" License

TensorFlow CPU and CPU-XLA map to the same environment on AArch64 platforms #49

Closed psyhtest closed 6 years ago

psyhtest commented 6 years ago

When installing TensorFlow 1.7.0 on two 64-bit Arm platforms (including Jetson TX1) with and without XLA, I noticed that CK registered both builds under the same environment UID (more precisely, the second installation updated the environment created by the first one):

anton@tegra-ubuntu:~$ ck install package:lib-tensorflow-1.7.0-src-cpu-xla --env.CK_HOST_CPU_NUMBER_OF_PROCESSORS=1
...
Installation time: 35946.103354 sec.
anton@tegra-ubuntu:~$ ck install package:lib-tensorflow-1.7.0-src-cpu --env.CK_HOST_CPU_NUMBER_OF_PROCESSORS=1 && date
...
Installation time: 21755.3534448 sec.
$ ck show env --tags=lib,tensorflow
Env UID:         Target OS: Bits: Name:                                  Version: Tags:

f17bebf9b6a4e759   linux-64    64 TensorFlow library (from sources, cpu) 1.7      64bits,bazel,channel-stable,host-os-linux-64,lib,needs-bazel,needs-bazel-0.11.1,target-os-linux-64,tensorflow,tensorflow-cpu,v1,v1.7,v1.7.0,vcpu,vsrc

$CK_TOOLS does contain both versions (disregard __init__.py being both under src and lib):

$ ck detect soft:lib.tensorflow

  Searching for TensorFlow library (tensorflow/__init__.py) to automatically register in the CK - it may take some time, please wait ...

    * Searching in /usr ...
    * Searching in /opt ...
    * Searching in /home/anton/CK_TOOLS ...
    * Searching in /home/anton ...
...

  Registering software installations found on your machine in the CK:

    (HINT: enter -1 to force CK package installation)

    0) Version 1.7 - /home/anton/CK_TOOLS/lib-tensorflow-src-cpu-xla-1.7-linux-64/src/bazel-src/tensorflow/__init__.py
    1) Version 1.7 - /home/anton/CK_TOOLS/lib-tensorflow-src-cpu-xla-1.7-linux-64/lib/tensorflow/__init__.py
    2) Version 1.7 - /home/anton/CK_TOOLS/lib-tensorflow-src-cpu-1.7-linux-64/src/bazel-src/tensorflow/__init__.py
    3) Version 1.7 - /home/anton/CK_TOOLS/lib-tensorflow-src-cpu-1.7-linux-64/lib/tensorflow/__init__.py

but selecting the other version overwrites the same environment:

...
Environment entry updated (f17bebf9b6a4e759)!
  Successfully registered with UID: f17bebf9b6a4e759

anton@tegra-ubuntu:~$ ck show env --tags=lib,tensorflow
Env UID:         Target OS: Bits: Name:                                       Version: Tags:

f17bebf9b6a4e759   linux-64    64 TensorFlow library (from sources, cpu, xla) 1.7      64bits,bazel,channel-stable,host-os-linux-64,lib,needs-bazel,needs-bazel-0.11.1,target-os-linux-64,tensorflow,tensorflow-cpu,v1,v1.7,v1.7.0,vcpu,vsrc,vxla

psyhtest commented 6 years ago

For some reason, it's only a problem on 64-bit Arm but not on Intel:

$ ck show env --tags=lib,tensorflow
Env UID:         Target OS: Bits: Name:                                        Version: Tags:

349151f1b286b2bd   linux-64    64 TensorFlow library (from sources, cuda, xla) 1.7      64bits,bazel,channel-stable,host-os-linux-64,lib,needs-bazel,needs-bazel-0.11.1,needs-nvcc,needs-nvcc-9.0.176,needs-unknown_cudnn,needs-unknown_cudnn-7.0.5,target-os-linux-64,tensorflow,tensorflow-cuda,v1,v1.7,v1.7.0,vcuda,vsrc,vxla
f44c75af6bf2eac6   linux-64    64 TensorFlow library (from sources, cuda)      1.7      64bits,bazel,channel-stable,host-os-linux-64,lib,needs-bazel,needs-bazel-0.11.1,needs-nvcc,needs-nvcc-9.0.176,needs-unknown_cudnn,needs-unknown_cudnn-7.0.5,target-os-linux-64,tensorflow,tensorflow-cuda,v1,v1.7,v1.7.0,vcuda,vsrc
aa786d72c6c86f8f   linux-64    64 TensorFlow library (from sources, cpu, xla)  1.7      64bits,bazel,channel-stable,host-os-linux-64,lib,needs-bazel,needs-bazel-0.11.1,target-os-linux-64,tensorflow,tensorflow-cpu,v1,v1.7,v1.7.0,vcpu,vsrc,vxla
4e4d828328344965   linux-64    64 TensorFlow library (from sources, cpu)       1.7      64bits,bazel,channel-stable,host-os-linux-64,lib,needs-bazel,needs-bazel-0.11.1,target-os-linux-64,tensorflow,tensorflow-cpu,v1,v1.7,v1.7.0,vcpu,vsrc

psyhtest commented 6 years ago

@ens-lg4 The TensorFlow 1.7 packages are simply copies of the TensorFlow 1.5 ones (with the version updated, obviously). I believe you have also experimented with TensorFlow 1.6 packages made in a similar way on the same TX1 platform. Could you please check how many TensorFlow variants got registered with CK there (ck show env --tags=lib,tensorflow)?

psyhtest commented 6 years ago

ck detect soft:lib.tensorflow --full_path=<...> is a fast track to observing the environment update:

$ ck detect soft:lib.tensorflow --full_path=/home/anton/CK_TOOLS/lib-tensorflow-src-cpu-xla-1.7-linux-64/lib/tensorflow/__init__.py
$ ck detect soft:lib.tensorflow --full_path=/home/anton/CK_TOOLS/lib-tensorflow-src-cpu-1.7-linux-64/lib/tensorflow/__init__.py

Maybe you could even try it from a different account on the same machine.

gfursin commented 6 years ago

CK tries to reinstall a package if it thinks that the installation is the same. Installations are distinguished by target/host OS and tags. Since XLA does not add a new tag, CK doesn't know that it's a different installation. One solution is to set XLA via --env. and then add a specific tag via custom.py (that's what we do in the ArmCL packages for OpenCL, vgraph, etc., if I am correct). Alternatively, I think there is also a way to provide --extra_tags from the command line (that's what Flavio was doing) or --extra_version.

gfursin commented 6 years ago

Yes, I figured out the problem with TF but added only a workaround, since there is no obvious solution (it is one of those things that requires rethinking and possibly updates in a future env/package manager, unless we find an easier solution). When installing a package, I search for a related environment using host/target OS/bits, all tags, and dependencies. However, the tag search is inclusive: if you first install the src,xla package and record the "vsrc,vxla" tags, and then install just the "src" version (that was your case), CK will search just for "vsrc" and will find both the "vsrc" and the "vsrc,vxla" entries.
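
The inclusive matching described above can be illustrated with a minimal Python sketch. It is not CK's actual code: the function name `find_envs`, the entry identifiers, and the abbreviated tag lists are assumptions made for the example. An entry matches when its tags contain every tag in the query, so a query built from the plain "src" package also matches the XLA entry.

```python
# Illustrative sketch only; not CK's actual implementation.
def find_envs(entries, query_tags):
    """Return IDs of entries whose tags contain every query tag (inclusive match)."""
    wanted = set(query_tags)
    return [uid for uid, tags in entries.items() if wanted <= set(tags)]


# Hypothetical registry: IDs are placeholders, tags are abbreviated from the listings above.
entries = {
    "env-xla": ["lib", "tensorflow", "vcpu", "vsrc", "vxla"],
    "env-plain": ["lib", "tensorflow", "vcpu", "vsrc"],
}

# The non-XLA package searches without "vxla", which matches both entries.
print(find_envs(entries, ["lib", "tensorflow", "vcpu", "vsrc"]))

# The XLA package searches with "vxla", which matches only the XLA entry.
print(find_envs(entries, ["lib", "tensorflow", "vcpu", "vsrc", "vxla"]))
```

If only the XLA entry exists when the plain package is installed, the first query returns that single entry, which is consistent with the overwritten environment reported at the top of this thread.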

Unfortunately, I was not checking whether there was more than one entry and was just using the first one returned, hence the ambiguity and the overwritten entries. My temporary solution is to print all related entries to the user and then stop with an error, asking the user to specify the correct one with the --env_data_uoa flag when installing a package. This is not a nice solution, but at least it will warn you instead of silently picking an entry. I committed it to ck-env (you need to update it to get this version).
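
A rough sketch of this "stop instead of guessing" behaviour, under stated assumptions: `select_env` and `AmbiguousEnvError` are hypothetical names, and only the `--env_data_uoa` flag comes from the comment above. The actual change in ck-env may be structured differently.

```python
class AmbiguousEnvError(Exception):
    """Raised when several environment entries match and none was chosen explicitly."""


def select_env(matching_uids, env_data_uoa=None):
    """Pick one environment UID, refusing to guess when several match.

    env_data_uoa mirrors the --env_data_uoa flag; everything else is a
    hypothetical stand-in for CK's internals.
    """
    if env_data_uoa is not None:
        if env_data_uoa not in matching_uids:
            raise KeyError(f"{env_data_uoa} is not among the matching entries")
        return env_data_uoa
    if len(matching_uids) > 1:
        listing = ", ".join(matching_uids)
        raise AmbiguousEnvError(
            f"several environments match ({listing}); rerun with --env_data_uoa"
        )
    # Zero or one match: nothing to disambiguate.
    return matching_uids[0] if matching_uids else None


# UIDs taken from the Intel listing earlier in this thread.
matches = ["aa786d72c6c86f8f", "4e4d828328344965"]
try:
    select_env(matches)
except AmbiguousEnvError as err:
    print(f"error: {err}")

print(select_env(matches, env_data_uoa="4e4d828328344965"))
```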

Another solution that should work is to add a "no-xla" tag to non-XLA packages. I can do it.
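
Why an extra tag on the non-XLA packages would help can be seen in a small sketch. It assumes the same inclusive matching rule as described earlier; the helper name `matches` and the abbreviated tag lists are made up for illustration.

```python
def matches(entry_tags, query_tags):
    """Inclusive match: every query tag must be present on the entry."""
    return set(query_tags) <= set(entry_tags)


xla_entry = ["vcpu", "vsrc", "vxla"]          # abbreviated tags of the XLA build
plain_query_old = ["vcpu", "vsrc"]            # search issued by the non-XLA package today
plain_query_new = ["vcpu", "vsrc", "no-xla"]  # same search once the proposed tag is added

print(matches(xla_entry, plain_query_old))  # True: the XLA entry is picked up by mistake
print(matches(xla_entry, plain_query_new))  # False: the XLA entry no longer matches
```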

I also think I found yet another solution for now: I have started recording the package UID in the env, so if I see ambiguity, I prune the selection by package UID, and only if there is still ambiguity after that (for example, if the package was detected via detect soft, so we do not know the flags) will I stop and ask for user input.
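
The pruning step could look roughly like the sketch below. It is an assumption-laden illustration, not the committed code: the function name `prune_by_package`, the `env_uid` and `package_uid` field names, and the package identifiers are hypothetical, and returning `None` when nothing can be decided is a choice made for this example.

```python
def prune_by_package(candidates, package_uid):
    """Narrow ambiguous candidates to the one recorded for the given package.

    candidates: list of dicts with 'env_uid' and 'package_uid' keys, where
    'package_uid' is None for entries registered via software detection.
    Returns the chosen 'env_uid', or None if the user has to decide.
    """
    if len(candidates) == 1:
        return candidates[0]["env_uid"]
    pruned = [c for c in candidates if c["package_uid"] == package_uid]
    if len(pruned) == 1:
        return pruned[0]["env_uid"]
    return None  # still ambiguous (or nothing matched): ask the user


# Env UIDs from the Intel listing above; package identifiers are placeholders.
installed = [
    {"env_uid": "aa786d72c6c86f8f", "package_uid": "pkg-cpu-xla"},
    {"env_uid": "4e4d828328344965", "package_uid": "pkg-cpu"},
]
print(prune_by_package(installed, "pkg-cpu"))

# Entries registered via detection carry no package UID, so pruning cannot decide.
detected = [
    {"env_uid": "aa786d72c6c86f8f", "package_uid": None},
    {"env_uid": "4e4d828328344965", "package_uid": None},
]
print(prune_by_package(detected, "pkg-cpu"))
```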

This should solve the current problem. I committed it and it seems to be working fine now, so we can postpone redesigning the env detection mechanism, but we should still think about better ways to detect soft/env/package using version ranges, tags and possibly a list of environment vars (ON/OFF, such as XLA, OpenCL, etc.). This may be a topic for discussion at the workshop on reproducible workflows at SC'18.