marcelotrevisani opened this issue 4 years ago
Friendly ping to @katietz, as he was the last person to modify the recipe :)
The CPU-only packages have been available for quite a while now. I wonder whether there is a problem with the GPU packages: are they still to come, or has GPU support been dropped?
Thanks
tensorflow 2.3.0 eigen_py37h189e6a2_0 pkgs/main
tensorflow 2.3.0 eigen_py38h71ff20e_0 pkgs/main
tensorflow 2.3.0 mkl_py37h0481017_0 pkgs/main
tensorflow 2.3.0 mkl_py38hd53216f_0 pkgs/main
Please have a look at this https://github.com/ContinuumIO/anaconda-issues/issues/11967#issuecomment-728692004.
Any updates on tensorflow 2.4? Is it also blocked on ContinuumIO/anaconda-issues#11967?
@0x1997 The project we are working on (Open-CE, as mentioned in one of the related threads by @jayfurmanek) is about to publish another release which includes a conda recipe for TF 2.4 (both GPU and CPU). For TF's conda recipe, you can refer to https://github.com/open-ce/tensorflow-feedstock.
I updated to tensorflow 2.4.1 for linux-64. The rc binaries can be found in my private channel 'ktietz' for testing. I will continue with Windows and macOS builds soon.
As a side note: the new version provides eigen, mkl, and gpu variants for linux-64.
I installed from your channel and this seems to work for me with python 3.7. For the moment I just loaded tensorflow and had it report the visible devices, which worked fine. I will put it into regular use over the coming days and let you know if I find anything.
Many thanks for the update!
It mostly works fine, but here is one issue:
WARNING:tensorflow:AutoGraph could not transform <bound method PulseWaveTable._linear_lookup of <tensorflow.python.eager.function.TfMethodTarget object at 0x7f1f4d18c610>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
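For reference, the verbosity variable from that warning has to be set before TensorFlow is imported, because AutoGraph reads it at import time. A minimal sketch (the TensorFlow import is left commented out, since it only applies on a machine where the package is installed):

```python
import os

# AutoGraph reads AUTOGRAPH_VERBOSITY once, at import time, so this must
# run before TensorFlow is imported (e.g. at the very top of the script).
os.environ["AUTOGRAPH_VERBOSITY"] = "10"

# import tensorflow as tf  # import only after the variable is set
```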
Cause: module 'gast' has no attribute 'Index'
I found this thread https://github.com/serge-sans-paille/gast/issues/53 explaining that the problem comes from using gast=0.4.0 while tensorflow requires gast=0.3.3.
Indeed, the gast dependency for tensorflow 2.4.1 is still 0.3.3 (https://libraries.io/pypi/tensorflow/2.4.1), while it appears you pinned it to:
tensorflow-base 2.4.1 gpu_py39h29c2da4_0
----------------------------------------
file name : tensorflow-base-2.4.1-gpu_py39h29c2da4_0.conda
name : tensorflow-base
version : 2.4.1
build : gpu_py39h29c2da4_0
build number: 0
size : 195.2 MB
license : Apache 2.0
subdir : linux-64
url : https://repo.anaconda.com/pkgs/main/linux-64/tensorflow-base-2.4.1-gpu_py39h29c2da4_0.conda
md5 : aec0b7780731b25ecff1e146c646b518
timestamp : 2021-03-01 09:39:26 UTC
dependencies:
...
- gast >=0.4.0,<0.4.1.0a0
...
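To illustrate why that pin conflicts with what TF 2.4.1 declares upstream, here is a tiny version matcher (a simplification of conda's real version-spec logic; the upper bound `0.4.1.0a0` is treated as `0.4.1` here):

```python
def parse(version):
    """Turn a dotted version string into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())

def satisfies(version, lower="0.4.0", upper="0.4.1"):
    """Check a version against the package's pin: >=0.4.0,<0.4.1 (simplified)."""
    return parse(lower) <= parse(version) < parse(upper)

print(satisfies("0.4.0"))  # True: the only gast the conda pin accepts
print(satisfies("0.3.3"))  # False: the gast that TF 2.4.1 actually requires
```

So the conda metadata accepts exactly the gast release that triggers the AutoGraph warning, and rejects the one TensorFlow asks for.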
Another problem here, and this is likely more of a problem with the Anaconda cudatoolkit package, is that XLA doesn't work on the GPU version.
A good test for this can be found here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/xla/g3doc/tutorials/jit_compile.ipynb
When running that, I get:
2021-03-11 21:50:04.976184: W tensorflow/compiler/xla/service/gpu/buffer_comparator.cc:592] Internal: ptxas exited with non-zero error code 256, output:
Relying on driver to perform ptx compilation.
Setting XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda or modifying $PATH can be used to set the location of ptxas
This message will only be logged once.
2021-03-11 21:50:06.579105: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:70] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
2021-03-11 21:50:06.579157: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:71] Searched for CUDA in the following directories:
2021-03-11 21:50:06.579168: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] ./cuda_sdk_lib
2021-03-11 21:50:06.579176: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] /usr/local/cuda-10.1
2021-03-11 21:50:06.579183: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] .
2021-03-11 21:50:06.579191: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:76] You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions. For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2021-03-11 21:50:06.582894: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:324] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2021-03-11 21:50:06.583354: I tensorflow/compiler/jit/xla_compilation_cache.cc:333] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
2021-03-11 21:50:06.583775: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at xla_ops.cc:238 : Internal: libdevice not found at ./libdevice.10.bc
Traceback (most recent call last):
File "jit_compile.py", line 42, in <module>
train_mnist(images, labels)
File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
result = self._call(*args, **kwds)
File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 888, in _call
return self._stateless_fn(*args, **kwds)
File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2942, in __call__
return graph_function._call_flat(
File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1918, in _call_flat
return self._build_call_outputs(self._inference_function.call(
File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 555, in call
outputs = execute.execute(
File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: libdevice not found at ./libdevice.10.bc [Op:__inference_train_mnist_204]
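As the warnings above suggest, one possible workaround is to point XLA at a full CUDA install via `XLA_FLAGS` before TensorFlow is imported. A minimal sketch; the path `/usr/local/cuda` is a placeholder and must be replaced with wherever a CUDA toolkit containing `nvvm/libdevice` and `bin/ptxas` actually lives:

```python
import os

# Set the flag the nvptx_compiler warning recommends. This must happen
# before TensorFlow is imported, since XLA reads it during initialization.
os.environ["XLA_FLAGS"] = "--xla_gpu_cuda_data_dir=/usr/local/cuda"

# import tensorflow as tf  # must come after XLA_FLAGS is set
```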
There are two changes that could be made to the cudatoolkit package to fix this:
1. The `libdevice.10.bc` file is in the wrong location. It's shipped in `$CONDA_HOME/lib` when it should probably be in `$CONDA_HOME/lib64`, `$CONDA_HOME/nvvm/libdevice/`, or `$CONDA_HOME/nvvmx/libdevice/` (or all three).
2. The `ptxas` binary, which is used by the XLA compiler, is not included in the package at all and could be dropped into `$CONDA_HOME/bin`.
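To sanity-check where (or whether) the file exists in a given environment, a small search helper can be used; the candidate directories mirror the locations discussed above, and `find_libdevice` is a hypothetical name for this sketch:

```python
import os

def find_libdevice(prefix):
    """Return the path to libdevice.10.bc under prefix, or None if absent."""
    candidates = [
        os.path.join(prefix, "lib"),                # where cudatoolkit ships it today
        os.path.join(prefix, "lib64"),
        os.path.join(prefix, "nvvm", "libdevice"),  # the layout XLA expects
    ]
    for directory in candidates:
        path = os.path.join(directory, "libdevice.10.bc")
        if os.path.exists(path):
            return path
    return None

print(find_libdevice(os.environ.get("CONDA_PREFIX", "/opt/conda")))
```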
I could put up a PR against your cudatoolkit
feedstock with these changes if it would be considered.
Sure, a PR would be welcome!
About the gast version: I added a hotfix for it, so that all tensorflow 2.4.1 builds will have gast 0.3.3 as a dependency. The hotfix just needs to be reviewed internally.
@katietz any update on the gast 0.3.3 issue? It still seems that 0.4.0 is the dependency for TF 2.4.1
I made a hotpatch for it, so gast should now resolve to 0.3.3. The recipe itself isn't touched for now.
Thanks @katietz , is there anything that needs to be done on the client (install side) to consume this repodata hotpatch?
Currently, when trying to install `tensorflow==2.4.1` and `gast==0.3.3` together, I get an error:
Package gast conflicts for:
gast==0.3.3
tensorflow==2.4.1 -> tensorflow-base==2.4.1=gpu_py37h29c2da4_0 -> gast[version='>=0.4.0,<0.4.1.0a0']
@katietz I don't quite know what to make of this. It still does not install correctly. I think the only way to handle this currently is to install tf 2.4 and then post-install gast 0.3.3 with pip and the --user flag. Is this the intended procedure?
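To verify which gast version an environment actually ended up with after such a workaround, the standard library can report installed package metadata; a small sketch:

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(package):
    """Return the installed version of package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

# After the pip post-install, this should report 0.3.3 rather than 0.4.0.
print(installed_version("gast"))
```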
Hello folks,
Do you have any news regarding tensorflow 2.3, or a perspective on when it might be available on the main channel?