allenai / unified-io-2

Tensorflow version conflict with array_ops.stack #8

Closed: hytel closed this issue 10 months ago

hytel commented 10 months ago

When running the provided Jupyter notebook, at some point it calls resize_and_pad_default from the t5x.examples.unified_io.data.data_utils module.

This in turn can invoke the pad_to_bounding_box_internal function, which uses TensorFlow's array_ops.stack as well as other operations. None of these seem to be supported in modern versions of TensorFlow. The setup.py mentions that TensorFlow 2.15.0 is desired, but this version doesn't support the array_ops.stack API.

How can I get past this?

chrisc36 commented 10 months ago

Hi, the tensorflow version should be tensorflow==2.11.1:

https://github.com/allenai/unified-io-2/blob/b29921082e684f53d858473c4aac3b25abe88bb2/setup.py#L59

Try downgrading the installation to that version.
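As a quick way to confirm which version is actually installed before downgrading, something like the following could be used. It is only a sketch: the helper names installed_version and matches_pin are made up for illustration, and the pin value is simply the version quoted in this thread.

```python
from importlib import metadata

# Version quoted in this thread as the one pinned in setup.py.
PINNED = "2.11.1"


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(found, pinned=PINNED):
    """True only when the installed version string equals the pinned one exactly."""
    return found == pinned


found = installed_version("tensorflow")
if found is None:
    print("tensorflow is not installed")
elif matches_pin(found):
    print(f"tensorflow {found} matches the pinned version")
else:
    # Hypothetical remediation hint; adjust to your own environment manager.
    print(f"tensorflow {found} found; try: pip install tensorflow=={PINNED}")
```

Running this inside the same environment (or notebook kernel) that runs the demo matters, since a different kernel may see a different TensorFlow install.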

hytel commented 10 months ago

I meant TensorFlow 2.11.1. Still, I believe the array_ops.stack function dates back to the earliest days of TensorFlow, and the API has changed dramatically since. The current documentation for the 2.x series doesn't mention this call as used here, as far as I can tell. Maybe this line doesn't execute for you?

chrisc36 commented 10 months ago

Hmm, for me array_ops.stack seems to be working fine. This code:

import tensorflow as tf
print(tf.__version__)
from tensorflow.python.ops import array_ops
print(array_ops.stack)
print(array_ops.stack([0, 0, 1, 1, 2, 2, 0, 0]))

prints:

2.11.1
<function stack at 0x7fdea023f160>
tf.Tensor([0 0 1 1 2 2 0 0], shape=(8,), dtype=int32)

Is that not working for you? If so, there must be some difference in the installed TensorFlow library.

hytel commented 10 months ago

Dang, you're right. But when I look online at the API docs for version 2.11.1:

https://www.tensorflow.org/versions/r2.11/api_docs/python/tf

All I see are raw_ops. I don't even see python/ops. I'm probably looking at the wrong API guide. Honestly, installing, upgrading, and getting GPU support is so much easier in PyTorch that I hardly ever use TensorFlow. I have replicated about 30-40 GitHub PyTorch projects effortlessly, and then I end up punting on the majority of the 3-5 TensorFlow projects I tried, because it's like pulling teeth to get a valid TensorFlow environment with GPU support and use it in conjunction with a particular GitHub project with its many dependencies. It's even worse in the Windows world. There are also fewer people to reach out to, because most projects use PyTorch these days.

In any event, 2.11.1 does let you import array_ops from tensorflow.python.ops, so it should work in the data_utils.py module. Thanks for setting me straight!
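A note on the docs confusion: modules under tensorflow.python are internal, so the public API reference does not list them, which is probably why python/ops does not show up there. The documented public function is tf.stack. The sketch below calls tf.stack on the same values used earlier in this thread; the wrapper name public_stack_demo is made up for illustration, and it returns None when TensorFlow is not installed.

```python
import importlib.util


def public_stack_demo():
    """Stack the thread's example values with the public tf.stack API.

    Returns a plain Python list, or None if TensorFlow is not installed.
    """
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf

    # tf.stack is the documented public entry point for stacking tensors.
    stacked = tf.stack([0, 0, 1, 1, 2, 2, 0, 0])
    return stacked.numpy().tolist()


result = public_stack_demo()
print(result)
```

This does not change anything in data_utils.py, which keeps importing array_ops from the internal module; it only shows where to look in the public documentation.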

chrisc36 commented 10 months ago

No problem. We generally use PyTorch as well, but we have found TensorFlow datasets to be very helpful when dealing with all these different large datasets.