AnacondaRecipes / tensorflow_recipes

Tensorflow conda recipes

Tensorflow 2.3 #24

Open marcelotrevisani opened 4 years ago

marcelotrevisani commented 4 years ago

Hello folks,

Do you have any news regarding tensorflow 2.3, or an estimate of when it might be available on the main channel?

marcelotrevisani commented 4 years ago

friendly ping to @katietz, who was the last person to modify the recipe :)

roebel commented 3 years ago

The CPU-only packages have been available for quite a while now, so I wonder whether there is a problem with the GPU packages. Are these still on the way, or has GPU support been dropped?

Thanks

tensorflow                     2.3.0 eigen_py37h189e6a2_0  pkgs/main           
tensorflow                     2.3.0 eigen_py38h71ff20e_0  pkgs/main           
tensorflow                     2.3.0 mkl_py37h0481017_0  pkgs/main           
tensorflow                     2.3.0 mkl_py38hd53216f_0  pkgs/main   
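For reference, a listing like the one above can be reproduced by querying the channel directly; a sketch (spec and channel taken from this thread, output will vary by platform):

```shell
# List the tensorflow 2.3.0 builds published on defaults (pkgs/main).
conda search 'tensorflow==2.3.0' -c defaults
```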
npanpaliya commented 3 years ago

Please have a look at this https://github.com/ContinuumIO/anaconda-issues/issues/11967#issuecomment-728692004.

0x1997 commented 3 years ago

Any updates on tensorflow 2.4? Is it also blocked on ContinuumIO/anaconda-issues#11967?

npanpaliya commented 3 years ago

@0x1997 The project we are working on (Open-CE as mentioned in one of the related threads by @jayfurmanek), is about to publish another release which includes conda recipe for TF 2.4 (both GPU and CPU). For TF's conda recipe, you can refer to https://github.com/open-ce/tensorflow-feedstock.

katietz commented 3 years ago

I updated tensorflow to 2.4.1 for linux-64. The RC binaries can be found in my private channel 'ktietz' for testing. I will continue with Windows and macOS builds soon.
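Presumably the RC builds can be pulled for testing with something like the following (channel name from the comment above; the exact spec is a guess):

```shell
# Install the 2.4.1 release-candidate build from the 'ktietz' test channel.
conda install -c ktietz 'tensorflow=2.4.1'
```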

katietz commented 3 years ago

As a side note: the new version supports eigen, mkl, and gpu variants for linux-64.

roebel commented 3 years ago

I installed from your channel, and this seems to work for me with python 3.7. For the moment I just loaded tensorflow and had it report the visible devices; that worked fine. I will put it into regular use over the coming days and let you know if I find anything.

Many thanks for the update!
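The device check mentioned above is roughly the following (a sketch; it of course requires the TF build under test, and a GPU build should list at least one device with `device_type='GPU'`):

```python
import tensorflow as tf

# Report the version and the physical devices TF can see.
print(tf.__version__)
print(tf.config.list_physical_devices())
```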

roebel commented 3 years ago

It mostly works fine, but here is one issue:

WARNING:tensorflow:AutoGraph could not transform <bound method PulseWaveTable._linear_lookup of <tensorflow.python.eager.function.TfMethodTarget object at 0x7f1f4d18c610>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module 'gast' has no attribute 'Index'

I found this thread https://github.com/serge-sans-paille/gast/issues/53 explaining that the problem is using gast 0.4.0 while tensorflow requires gast 0.3.3.

Indeed the gast dependency for tensorflow 2.4.1 is still 0.3.3 https://libraries.io/pypi/tensorflow/2.4.1

while it appears you pinned it to

tensorflow-base 2.4.1 gpu_py39h29c2da4_0
----------------------------------------
file name   : tensorflow-base-2.4.1-gpu_py39h29c2da4_0.conda
name        : tensorflow-base
version     : 2.4.1
build       : gpu_py39h29c2da4_0
build number: 0
size        : 195.2 MB
license     : Apache 2.0
subdir      : linux-64
url         : https://repo.anaconda.com/pkgs/main/linux-64/tensorflow-base-2.4.1-gpu_py39h29c2da4_0.conda
md5         : aec0b7780731b25ecff1e146c646b518
timestamp   : 2021-03-01 09:39:26 UTC
dependencies: 
...
  - gast >=0.4.0,<0.4.1.0a0
...
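The conflict between the two pins can be spelled out mechanically (version numbers taken from the metadata above; the helper functions are purely illustrative):

```python
# TF 2.4.1 on PyPI requires gast == 0.3.3, while the conda package
# above pins gast >=0.4.0,<0.4.1.0a0 -- no single version satisfies both.
def satisfies_pypi_pin(v):
    """gast == 0.3.3 (TF 2.4.1's declared dependency)."""
    return v == (0, 3, 3)

def satisfies_conda_pin(v):
    """gast >=0.4.0,<0.4.1 (the pin in tensorflow-base-2.4.1)."""
    return (0, 4, 0) <= v < (0, 4, 1)

candidates = [(0, 3, 3), (0, 4, 0)]
print([v for v in candidates
       if satisfies_pypi_pin(v) and satisfies_conda_pin(v)])  # []
```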
jayfurmanek commented 3 years ago

Another problem here, likely more of a problem with the Anaconda cudatoolkit package: XLA doesn't work in the GPU version.

A good test for this can be found here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/xla/g3doc/tutorials/jit_compile.ipynb

When running that, I get:

2021-03-11 21:50:04.976184: W tensorflow/compiler/xla/service/gpu/buffer_comparator.cc:592] Internal: ptxas exited with non-zero error code 256, output: 
Relying on driver to perform ptx compilation. 
Setting XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda  or modifying $PATH can be used to set the location of ptxas
This message will only be logged once.
2021-03-11 21:50:06.579105: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:70] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
2021-03-11 21:50:06.579157: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:71] Searched for CUDA in the following directories:
2021-03-11 21:50:06.579168: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74]   ./cuda_sdk_lib
2021-03-11 21:50:06.579176: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74]   /usr/local/cuda-10.1
2021-03-11 21:50:06.579183: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74]   .
2021-03-11 21:50:06.579191: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:76] You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2021-03-11 21:50:06.582894: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:324] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2021-03-11 21:50:06.583354: I tensorflow/compiler/jit/xla_compilation_cache.cc:333] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
2021-03-11 21:50:06.583775: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at xla_ops.cc:238 : Internal: libdevice not found at ./libdevice.10.bc
Traceback (most recent call last):
  File "jit_compile.py", line 42, in <module>
    train_mnist(images, labels)
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 888, in _call
    return self._stateless_fn(*args, **kwds)
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2942, in __call__
    return graph_function._call_flat(
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1918, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 555, in call
    outputs = execute.execute(
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: libdevice not found at ./libdevice.10.bc [Op:__inference_train_mnist_204]
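As the log itself hints, one possible workaround is pointing XLA at a full CUDA installation that ships ptxas and libdevice (the path below is a placeholder, not something from this thread):

```shell
# Placeholder path; the directory must contain bin/ptxas and
# nvvm/libdevice for XLA to find what it needs.
export XLA_FLAGS=--xla_gpu_cuda_data_dir=/usr/local/cuda
```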

There are two changes that could be made to the cudatoolkit package to fix this:

I could put up a PR against your cudatoolkit feedstock with these changes if it would be considered.
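A minimal stand-in for the XLA test linked above might look like this (a sketch, not the tutorial's exact code; note that in TF 2.4 the flag is spelled `experimental_compile`, renamed `jit_compile` in later releases):

```python
import tensorflow as tf

# Force XLA compilation of a tiny function. On a cudatoolkit missing
# ptxas/libdevice this reproduces the "libdevice not found" errors above.
@tf.function(experimental_compile=True)
def axpy(a, x, y):
    return a * x + y

print(axpy(tf.constant(2.0),
           tf.constant([1.0, 2.0]),
           tf.constant([3.0, 4.0])))
```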

katietz commented 3 years ago

Sure, a PR would be welcome!

About the gast version: I added a hotfix for it, so that all tensorflow 2.4.1 packages will have gast 0.3.3 as a dependency. The hotfix just needs to be reviewed internally.

andrewsali commented 3 years ago

@katietz any update on the gast 0.3.3 issue? It still seems that 0.4.0 is the dependency for TF 2.4.1

katietz commented 3 years ago

I made a hotpatch for it, so gast should now resolve to 0.3.3. The recipe itself isn't touched for now.

andrewsali commented 3 years ago

Thanks @katietz , is there anything that needs to be done on the client (install side) to consume this repodata hotpatch?

Currently, when trying to install tensorflow==2.4.1 and gast==0.3.3 together, I get an error:

Package gast conflicts for:
gast==0.3.3
tensorflow==2.4.1 -> tensorflow-base==2.4.1=gpu_py37h29c2da4_0 -> gast[version='>=0.4.0,<0.4.1.0a0']
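On the client side, repodata hotfixes take effect once the channel index is re-downloaded; clearing conda's index cache forces a refresh (a guess at the needed step, not confirmed in this thread):

```shell
conda clean --index-cache
conda install 'tensorflow==2.4.1' 'gast==0.3.3'
```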
roebel commented 3 years ago

@katietz I don't quite know what to make of this. It still does not install correctly. I think the only way to handle this currently is to install tf 2.4 and then post-install gast 0.3.3 with pip and the --user flag. Is this the intended procedure?
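The workaround described above would look something like this (untested sketch):

```shell
# Let conda resolve TF with its (incorrect) gast pin, then override
# gast in the user site-packages via pip.
conda install 'tensorflow==2.4.1'
pip install --user 'gast==0.3.3'
```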