Open sshleifer opened 4 years ago
Hi, I experienced a similar issue before. This is probably because tensorflow-gpu was installed (if you use pip install) before tensorflow-text (or another tensorflow-* package, sorry, I can't remember which), which depends on the CPU version of tensorflow. I fixed this by installing tensorflow-gpu last when creating the virtualenv. The requirements.txt also lists tensorflow-gpu last.
Hope this helps.
I will update the instructions regarding the ckpt path.
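The install-order advice can be checked mechanically. This is a minimal sketch, assuming a requirements file with one requirement per line; the helper names requirement_names and gpu_listed_last are hypothetical and not part of the PEGASUS repo. It only inspects the listing order, not what pip actually ends up installing.

```python
def requirement_names(lines):
    """Return package names from requirements-style lines, in file order."""
    names = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        # Cut the name off at the first version, extras, or marker character.
        for sep in ("==", ">=", "<=", "~=", "!=", ">", "<", "[", ";", " "):
            line = line.split(sep, 1)[0]
        names.append(line.strip().lower())
    return names


def gpu_listed_last(lines):
    """True if tensorflow-gpu is the final requirement listed."""
    names = requirement_names(lines)
    return bool(names) and names[-1] == "tensorflow-gpu"
```

Passing the open requirements.txt file object (or a list of its lines) to gpu_listed_last gives a quick yes/no before creating the virtualenv.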
Made a new venv, ran pip install -r requirements.txt, and unfortunately the behavior is identical.
What does your pip freeze | grep tensor look like?
From pip freeze | grep tensor:
mesh-tensorflow==0.1.13
tensor2tensor==1.15.0
tensorboard==1.15.0
tensorflow==1.15.2
tensorflow-datasets==3.0.0
tensorflow-estimator==1.15.1
tensorflow-gan==2.0.0
tensorflow-gpu==1.15.0
tensorflow-hub==0.8.0
tensorflow-metadata==0.21.2
tensorflow-probability==0.7.0
tensorflow-text==1.15.0rc0
Since both tensorflow and tensorflow-gpu are installed, you probably need to make sure python imports tensorflow-gpu instead of tensorflow.
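To spot this situation without reading the list by eye, the freeze output can be parsed. A minimal sketch, assuming plain name==version lines as in the output above; parse_freeze and tensorflow_conflict are hypothetical helper names. As far as I know, both distributions install into the same tensorflow import package, so which one Python picks up depends on what was installed last.

```python
def parse_freeze(text):
    """Map package name -> pinned version from `pip freeze` style text."""
    pins = {}
    for line in text.splitlines():
        name, sep, version = line.strip().partition("==")
        if sep:  # ignore lines that are not simple name==version pins
            pins[name.lower()] = version
    return pins


def tensorflow_conflict(pins):
    """Return (cpu_version, gpu_version) if both distributions are pinned."""
    cpu, gpu = pins.get("tensorflow"), pins.get("tensorflow-gpu")
    if cpu is not None and gpu is not None:
        return cpu, gpu
    return None


# Three of the lines from the freeze output quoted in this thread.
FREEZE = """\
tensorflow==1.15.2
tensorflow-estimator==1.15.1
tensorflow-gpu==1.15.0
"""
print(tensorflow_conflict(parse_freeze(FREEZE)))  # ('1.15.2', '1.15.0')
```

A non-None result means both the CPU and GPU distributions are present in the environment.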
Switched machines, because I think tensorflow-gpu==1.15.0 requires CUDA 10.0. That got me to a new error:
What is your pip freeze | grep tfds?
I'm at tfds-nightly==1.0.1.dev201903050105
Preparing to unpack .../zsh_5.3.1-4+b3_amd64.deb ...
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "pegasus/bin/evaluate.py", line 144, in main
FLAGS.enable_logging)
File "/home/shleifer/pegasus/pegasus/eval/text_eval.py", line 153, in text_eval
for i, features in enumerate(features_iter):
File "/opt/conda/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3078, in predict
rendezvous.raise_errors()
File "/opt/conda/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 136, in raise_errors
six.reraise(typ, value, traceback)
File "/opt/conda/lib/python3.7/site-packages/six.py", line 703, in reraise
raise value
File "/opt/conda/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3072, in predict
yield_single_examples=yield_single_examples):
File "/opt/conda/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 620, in predict
input_fn, ModeKeys.PREDICT)
File "/opt/conda/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 996, in _get_features_from_input_fn
result = self._call_input_fn(input_fn, mode)
File "/opt/conda/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2987, in _call_input_fn
return input_fn(**kwargs)
File "/home/shleifer/pegasus/pegasus/data/infeed.py", line 41, in input_fn
dataset = all_datasets.get_dataset(input_pattern, training)
File "/home/shleifer/pegasus/pegasus/data/all_datasets.py", line 52, in get_dataset
dataset, _ = builder.build(input_pattern, shuffle_files)
File "/home/shleifer/pegasus/pegasus/data/datasets.py", line 200, in build
dataset, num_examples = self.load(build_name, split, shuffle_files)
File "/home/shleifer/pegasus/pegasus/data/datasets.py", line 158, in load
data_dir=self.data_dir)
File "/opt/conda/lib/python3.7/site-packages/tensorflow_datasets/core/api_utils.py", line 52, in disallow_positional_args_dec
return fn(*args, **kwargs)
TypeError: load() got an unexpected keyword argument 'shuffle_files'
My output: tfds-nightly==3.0.0.dev202004160105
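The two tfds-nightly versions in this thread differ by two major releases, which would explain the unexpected shuffle_files keyword. A minimal sketch for flagging an install older than the one reported as working here; release_tuple and older_than_known_good are hypothetical helpers, the comparison ignores the .dev suffix, and it makes no claim about which release first accepted shuffle_files.

```python
import re


def release_tuple(version):
    """Leading numeric release of a version string, as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("no numeric release in %r" % version)
    return tuple(int(part) for part in match.group().split("."))


# Version reported as working in this thread (assumption: anything at or
# above this release behaves the same way).
KNOWN_GOOD = release_tuple("3.0.0.dev202004160105")


def older_than_known_good(installed):
    """True if the installed release predates the known-good one."""
    return release_tuple(installed) < KNOWN_GOOD
```

For the version that raised the TypeError, older_than_known_good("1.0.1.dev201903050105") is True, since (1, 0, 1) sorts before (3, 0, 0).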
BTW: the code was initially developed with Python 3.6, but I think 3.7 should be fine.
I ran the setup instructions on a preexisting GCP machine with CUDA 10.1 and one modification:
(The instructions don't work as written because they don't acknowledge the pegasus_ckpt subdirectory, or that you need to point --model_dir to a specific checkpoint file, which is the only way I got evaluate.py to run.)
Then, I ran
and it is running on 8 CPU cores; nvidia-smi similarly shows 0 GPU utilization. How can I fix that?
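One way to narrow this down is to ask TensorFlow itself whether it was built with CUDA and whether it can see a GPU. A minimal sketch, assuming the TensorFlow 1.15 API, where tf.test.is_built_with_cuda and tf.test.is_gpu_available exist; gpu_report is a hypothetical helper name.

```python
def gpu_report():
    """Summarize TensorFlow's view of the GPU, or None if it is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return {
        "version": tf.__version__,
        # False here suggests the CPU-only build is the one being imported.
        "built_with_cuda": tf.test.is_built_with_cuda(),
        # False with a CUDA build often points at a driver or CUDA mismatch.
        "gpu_available": tf.test.is_gpu_available(),
    }


print(gpu_report())
```

If built_with_cuda comes back False, the CPU-only package is probably shadowing tensorflow-gpu; if it is True but gpu_available is False, the CUDA version on the machine is the more likely suspect.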
Env: