For some reason, it's only a problem on 64-bit Arm but not on Intel:
$ ck show env --tags=lib,tensorflow
Env UID: Target OS: Bits: Name: Version: Tags:
349151f1b286b2bd linux-64 64 TensorFlow library (from sources, cuda, xla) 1.7 64bits,bazel,channel-stable,host-os-linux-64,lib,needs-bazel,needs-bazel-0.11.1,needs-nvcc,needs-nvcc-9.0.176,needs-unknown_cudnn,needs-unknown_cudnn-7.0.5,target-os-linux-64,tensorflow,tensorflow-cuda,v1,v1.7,v1.7.0,vcuda,vsrc,vxla
f44c75af6bf2eac6 linux-64 64 TensorFlow library (from sources, cuda) 1.7 64bits,bazel,channel-stable,host-os-linux-64,lib,needs-bazel,needs-bazel-0.11.1,needs-nvcc,needs-nvcc-9.0.176,needs-unknown_cudnn,needs-unknown_cudnn-7.0.5,target-os-linux-64,tensorflow,tensorflow-cuda,v1,v1.7,v1.7.0,vcuda,vsrc
aa786d72c6c86f8f linux-64 64 TensorFlow library (from sources, cpu, xla) 1.7 64bits,bazel,channel-stable,host-os-linux-64,lib,needs-bazel,needs-bazel-0.11.1,target-os-linux-64,tensorflow,tensorflow-cpu,v1,v1.7,v1.7.0,vcpu,vsrc,vxla
4e4d828328344965 linux-64 64 TensorFlow library (from sources, cpu) 1.7 64bits,bazel,channel-stable,host-os-linux-64,lib,needs-bazel,needs-bazel-0.11.1,target-os-linux-64,tensorflow,tensorflow-cpu,v1,v1.7,v1.7.0,vcpu,vsrc
@ens-lg4 The TensorFlow 1.7 packages are simply copies of the TensorFlow 1.5 ones (with the version updated obviously). I believe you have also experimented with TensorFlow 1.6 packages made in a similar way on the same TX1 platform. Could you please check how many TensorFlow variants got registered with CK there (ck show env --tags=lib,tensorflow)?
ck detect soft:lib.tensorflow --full_path=<...> is a fast track to observing the environment update:
$ ck detect soft:lib.tensorflow --full_path=/home/anton/CK_TOOLS/lib-tensorflow-src-cpu-xla-1.7-linux-64/lib/tensorflow/__init__.py
$ ck detect soft:lib.tensorflow --full_path=/home/anton/CK_TOOLS/lib-tensorflow-src-cpu-1.7-linux-64/lib/tensorflow/__init__.py
Maybe you could even try it from a different account on the same machine.
CK reinstalls a package (reusing its environment entry) if it thinks that the installation is the same. Installations are distinguished by target/host OS and tags. Since XLA does not add a new tag, CK doesn't know that it's a different installation. The solution is to set XLA via --env. and then add a specific tag via custom.py (that's what we do in the armCL packages for OpenCL, vgraph, etc., if I am correct). Alternatively, I think there is also a way to provide --extra_tags from the command line (that's what flavio was doing) or --extra_version.
Yes, I figured out the problem with TF but have added only a workaround, since there is no obvious solution (it's one of those things that require rethinking and possibly updates in the future new env/package manager, unless we find an easier solution). When installing a package, I search for a related existing environment using the host/target OS and bits, all tags, and dependencies. However, the tag search is inclusive, i.e. if you first install the src,xla version and record the "vsrc,vxla" tags and then install just the "src" version (that was your case), CK will search just for "vsrc" and will find both the "vsrc" and the "vsrc,vxla" entries.
Unfortunately, I was not checking whether more than one entry was returned and was just using the first one, hence the ambiguity and the rewritten entries. My temporary solution is to print all related entries to the user and then stop with an error, asking the user to specify the correct one with the --env_data_uoa flag when installing a package. This is not a nice solution, but at least it will warn you instead of silently picking one of the ambiguous entries. I committed it to ck-env (you need to update it to get this version).
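To make the failure mode above concrete, here is a minimal Python sketch. It is not CK's actual code: the env entries, their UIDs and the `find_envs`/`pick_env_*` helpers are hypothetical, and each entry is reduced to a UID plus a set of tags (the OS/bits and dependency checks are left out). It shows how an inclusive (subset) tag match for the non-XLA install also returns the XLA entry, and how stopping on more than one match avoids silently reusing the wrong one.

```python
# Hypothetical model of registered env entries (not CK's real data layout).
# The XLA variant was installed first, so it is returned first.
ENVS = [
    {"uid": "env-cpu-xla", "tags": {"lib", "tensorflow", "vsrc", "vcpu", "vxla"}},
    {"uid": "env-cpu", "tags": {"lib", "tensorflow", "vsrc", "vcpu"}},
]


def find_envs(required_tags, envs=ENVS):
    """Inclusive search: an entry matches if it has at least the required tags."""
    required = set(required_tags)
    return [e for e in envs if required <= e["tags"]]


def pick_env_first(required_tags, envs=ENVS):
    """Old behaviour: take the first match, even if several entries match."""
    matches = find_envs(required_tags, envs)
    return matches[0]["uid"] if matches else None


def pick_env_strict(required_tags, envs=ENVS):
    """Workaround: list the candidates and stop instead of guessing."""
    matches = find_envs(required_tags, envs)
    if len(matches) > 1:
        uids = ", ".join(e["uid"] for e in matches)
        raise RuntimeError(
            f"several environments match ({uids}); "
            "please select one with --env_data_uoa"
        )
    return matches[0]["uid"] if matches else None


if __name__ == "__main__":
    non_xla = {"lib", "tensorflow", "vsrc", "vcpu"}
    # Both entries match, and the XLA one is silently reused.
    print(pick_env_first(non_xla))
    try:
        pick_env_strict(non_xla)
    except RuntimeError as err:
        print(err)
```

In this model, adding a "no-xla" tag to the non-XLA package (the alternative suggested in the thread) would also remove the ambiguity, because a search for {"vsrc", "no-xla"} no longer matches the XLA entry.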
Another solution that should work is to add a "no-xla" tag to the non-XLA packages. I can do it.
I think I have also found yet another solution for now: I have started recording the package UID in the env, so if I see ambiguity, I prune the selection by package UID, and only if there is still ambiguity after that (for example, if the package was detected via ck detect soft, so we do not know its flags) will I stop and ask for user input.
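The refinement described above (pruning ambiguous matches by the recorded package UID) can be sketched the same way. Again, this is only an illustration under assumptions: the `package_uid` field, the UIDs and `select_env` are hypothetical, entries registered via ck detect soft are assumed to carry no package UID (so pruning cannot rule them out), and returning None when nothing survives pruning stands in for "create a fresh env".

```python
# Hypothetical env entries that also record the UID of the package that
# created them (None if the env was registered via `ck detect soft`).
ENVS = [
    {"uid": "env-cpu-xla", "package_uid": "pkg-cpu-xla",
     "tags": {"lib", "tensorflow", "vsrc", "vcpu", "vxla"}},
    {"uid": "env-cpu", "package_uid": "pkg-cpu",
     "tags": {"lib", "tensorflow", "vsrc", "vcpu"}},
]
DETECTED = {"uid": "env-detected", "package_uid": None,
            "tags": {"lib", "tensorflow", "vsrc", "vcpu"}}


def select_env(required_tags, package_uid, envs=ENVS):
    """Return the env UID to update, None for a fresh env, or raise if ambiguous."""
    required = set(required_tags)
    matches = [e for e in envs if required <= e["tags"]]  # inclusive tag search
    if len(matches) > 1:
        # Drop entries known to come from a different package; entries with
        # no recorded package UID cannot be excluded this way.
        matches = [e for e in matches if e["package_uid"] in (package_uid, None)]
    if len(matches) > 1:
        uids = ", ".join(e["uid"] for e in matches)
        raise RuntimeError(f"still ambiguous ({uids}); please select one with --env_data_uoa")
    # Assumption: no surviving match means a new env entry should be created.
    return matches[0]["uid"] if matches else None


if __name__ == "__main__":
    non_xla = {"lib", "tensorflow", "vsrc", "vcpu"}
    print(select_env(non_xla, "pkg-cpu"))  # pruned down to env-cpu
    try:
        select_env(non_xla, "pkg-cpu", ENVS + [DETECTED])
    except RuntimeError as err:
        print(err)
```

The detected entry is what keeps the last call ambiguous: without a recorded package UID there is nothing to prune it by, which is exactly the case where user input is still required.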
This should solve the current problem. I committed it and it seems to be working fine now, so we can postpone redesigning the env detection mechanism. However, we should still think about better ways to detect soft/env/packages using version ranges, tags and possibly a list of environment variables (ON/OFF, such as XLA, OpenCL, etc.). This may be a topic for discussion at the workshop on reproducible workflows at SC'18.
When installing TensorFlow 1.7.0 on two 64-bit Arm platforms (including Jetson TX1) with and without XLA, I noticed that CK registered them under the same environment UID (more precisely, the second installation updated the environment created during the first installation):
$CK_TOOLS does contain both versions (disregard __init__.py being under both src and lib), but selecting the other version overwrites the same environment: