Open jsnowacki opened 5 years ago
Changing the versions with the below values seems to fix the error:
pip install tensor2tensor==1.13.4 tensorflow==1.14 tensorflow-serving-api==1.14.0rc0 gutenberg numpy==1.14.6
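To confirm that the environment actually ended up on these versions, something like the following could be used. It is only a sketch: `normalize` and `find_mismatches` are hypothetical helpers (not part of pip or T2T), and the trailing-zero handling is a simplification of real version matching, so that a pin of 1.14 accepts an installed 1.14.0.

```python
def normalize(version):
    """Drop trailing '.0' segments so '1.14.0' compares equal to '1.14'."""
    while version.endswith(".0"):
        version = version[:-2]
    return version


def find_mismatches(pins, installed):
    """Return {name: (wanted, found)} for each pin that is not satisfied."""
    mismatches = {}
    for name, wanted in pins.items():
        found = installed.get(name)  # None means the package is missing
        if found is None or normalize(found) != normalize(wanted):
            mismatches[name] = (wanted, found)
    return mismatches


# Pins taken from the pip command above.
PINS = {
    "tensor2tensor": "1.13.4",
    "tensorflow": "1.14",
    "tensorflow-serving-api": "1.14.0rc0",
    "numpy": "1.14.6",
}

# Example input only; in practice this would be built from `pip freeze`.
example_installed = {
    "tensor2tensor": "1.13.4",
    "tensorflow": "1.14.0",
    "tensorflow-serving-api": "1.14.0rc0",
    "numpy": "1.16.4",
}

print(find_mismatches(PINS, example_installed))
# {'numpy': ('1.14.6', '1.16.4')}
```

An empty result means every pin is satisfied under this simplified comparison.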
OK correction, it fixed just the local run; if you try to run Cloud ML Engine training via the command:
%%bash
GPU="--train_steps=7500 --cloud_mlengine --worker_gpu=1 --hparams_set=transformer_poetry"
DATADIR=gs://${BUCKET}/poetry/data
OUTDIR=gs://${BUCKET}/poetry/model
JOBNAME=poetry_$(date -u +%y%m%d_%H%M%S)
echo $OUTDIR $REGION $JOBNAME
gsutil -m rm -rf $OUTDIR
echo "'Y'" | t2t-trainer \
--data_dir=gs://${BUCKET}/poetry/subset \
--t2t_usr_dir=./poetry/trainer \
--problem=$PROBLEM \
--model=transformer \
--output_dir=$OUTDIR \
${GPU}
the same error gets thrown by the worker.
Unfortunately, the gutenberg package doesn't support Python 3. To fix the ML Engine run, could you try modifying setup.py to pin the library versions as above? If that works, please submit a pull request with your changes.
I've changed the setup.py section to:
%%writefile poetry/setup.py
from setuptools import find_packages
from setuptools import setup
REQUIRED_PACKAGES = [
    'tensor2tensor==1.13.4',
    'tensorflow==1.14',
    'tensorflow-serving-api==1.14.0rc0',
    'numpy==1.14.6'
]
setup(
    name='poetry',
    version='0.1',
    author='Google',
    author_email='training-feedback@cloud.google.com',
    install_requires=REQUIRED_PACKAGES,
    packages=find_packages(),
    include_package_data=True,
    description='Poetry Line Problem',
    requires=[]
)
But I still get the following error on AI Platform:
2019-07-12 10:15:55.415651: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
WARNING: Logging before flag parsing goes to stderr.
W0712 10:15:57.498725 139815414183680 deprecation_wrapper.py:119] From /usr/local/lib/python3.5/dist-packages/tensor2tensor/utils/expert_utils.py:68: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
W0712 10:15:58.259337 139815414183680 lazy_loader.py:50]
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
* https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
* https://github.com/tensorflow/addons
* https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
W0712 10:16:00.313907 139815414183680 deprecation_wrapper.py:119] From /usr/local/lib/python3.5/dist-packages/tensor2tensor/utils/adafactor.py:27: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.
W0712 10:16:00.314401 139815414183680 deprecation_wrapper.py:119] From /usr/local/lib/python3.5/dist-packages/tensor2tensor/utils/multistep_optimizer.py:32: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.
W0712 10:16:00.326944 139815414183680 deprecation_wrapper.py:119] From /usr/local/lib/python3.5/dist-packages/mesh_tensorflow/ops.py:4237: The name tf.train.CheckpointSaverListener is deprecated. Please use tf.estimator.CheckpointSaverListener instead.
W0712 10:16:00.327136 139815414183680 deprecation_wrapper.py:119] From /usr/local/lib/python3.5/dist-packages/mesh_tensorflow/ops.py:4260: The name tf.train.SessionRunHook is deprecated. Please use tf.estimator.SessionRunHook instead.
W0712 10:16:00.361661 139815414183680 deprecation_wrapper.py:119] From /usr/local/lib/python3.5/dist-packages/tensor2tensor/rl/gym_utils.py:219: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.
W0712 10:16:00.398224 139815414183680 deprecation_wrapper.py:119] From /usr/local/lib/python3.5/dist-packages/tensor2tensor/utils/trainer_lib.py:109: The name tf.OptimizerOptions is deprecated. Please use tf.compat.v1.OptimizerOptions instead.
W0712 10:16:01.014122 139815414183680 deprecation_wrapper.py:119] From /usr/local/bin/t2t-trainer:32: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.
W0712 10:16:01.014338 139815414183680 deprecation_wrapper.py:119] From /usr/local/bin/t2t-trainer:32: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.
W0712 10:16:01.014474 139815414183680 deprecation_wrapper.py:119] From /usr/local/bin/t2t-trainer:33: The name tf.app.run is deprecated. Please use tf.compat.v1.app.run instead.
I0712 10:16:01.014980 139815414183680 usr_dir.py:43] Importing user module trainer from path /home/jupyter/training-data-analyst/courses/machine_learning/deepdive/09_sequence/poetry
W0712 10:16:01.015603 139815414183680 deprecation_wrapper.py:119] From /home/jupyter/training-data-analyst/courses/machine_learning/deepdive/09_sequence/poetry/trainer/problem.py:10: The name tf.summary.FileWriterCache is deprecated. Please use tf.compat.v1.summary.FileWriterCache instead.
W0712 10:16:01.016242 139815414183680 deprecation_wrapper.py:119] From /usr/local/lib/python3.5/dist-packages/tensor2tensor/utils/hparams_lib.py:49: The name tf.gfile.Exists is deprecated. Please use tf.io.gfile.exists instead.
W0712 10:16:01.199728 139815414183680 deprecation_wrapper.py:119] From /usr/local/lib/python3.5/dist-packages/tensor2tensor/utils/trainer_lib.py:780: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.
I0712 10:16:01.452857 139815414183680 cloud_mlengine.py:337] Launching job transformer_poetry_line_problem_t2t_20190712_101601 with ML Engine spec:
{'jobId': 'transformer_poetry_line_problem_t2t_20190712_101601',
'labels': {'hparams': 'transformer_poetry',
'model': 'transformer',
'problem': 'poetry_line_problem'},
'trainingInput': {'args': ['--problem=poetry_line_problem',
'--log_step_count_steps=100',
'--worker_gpu=1',
'--dbgprofile=False',
'--worker_id=0',
'--registry_help=False',
'--xla_jit_level=-1',
'--decode_to_file=',
'--save_checkpoints_secs=0',
'--decode_hparams=',
'--wiki_revision_percent_identical_examples=0.04',
'--eval_early_stopping_metric_minimize=True',
'--use_cprofile_for_profiling=True',
'--log_dir=',
'--alsologtostderr=False',
'--logtostderr=False',
'--run_with_pdb=False',
'--tmp_dir=/tmp/t2t_datagen',
'--profile=False',
'--?=False',
'--run_with_profiling=False',
'--decode_from_file=',
'--output_dir=gs://sotrender-rd-cloud-training-demos-ml/poetry/model',
'--op_conversion_fallback_to_while_loop=False',
'--use_tpu_estimator=False',
'--worker_gpu_memory_fraction=0.95',
'--train_steps=7500',
'--ps_replicas=0',
'--pdb_post_mortem=False',
'--parsing_path=',
'--keep_checkpoint_every_n_hours=10000',
'--std_server_protocol=grpc',
'--v=0',
'--eval_early_stopping_metric=loss',
'--wiki_revision_num_train_shards=50',
'--wiki_revision_vocab_file=',
'--optionally_use_dist_strat=False',
'--intra_op_parallelism_threads=0',
'--gpu_order=',
'--showprefixforinfo=True',
'--tpu_num_shards=8',
'--test_srcdir=',
'--eval_throttle_seconds=600',
'--use_tpu=False',
'--eval_early_stopping_metric_delta=0.1',
'--worker_replicas=1',
'--eval_run_autoregressive=False',
'--ps_gpu=0',
'--hparams_set=transformer_poetry',
'--wiki_revision_introduce_errors=True',
'--local_eval_frequency=1000',
'--xla_compile=False',
'--only_check_args=False',
'--eval_timeout_mins=240',
'--inter_op_parallelism_threads=0',
'--generate_data=False',
'--wiki_revision_max_page_size_exp=26',
'--test_tmpdir=/tmp/absl_testing',
'--worker_job=/job:localhost',
'--wiki_revision_num_dev_shards=1',
'--eval_use_test_set=False',
'--iterations_per_loop=100',
'--test_random_seed=301',
'--model=transformer',
'--enable_graph_rewriter=False',
'--log_device_placement=False',
'--data_dir=gs://sotrender-rd-cloud-training-demos-ml/poetry/subset',
'--disable_ffmpeg=False',
'--sync=False',
'--keep_checkpoint_max=20',
'--xml_output_file=',
'--tfdbg=False',
'--wiki_revision_max_equal_to_diff_ratio=0.0',
'--master=',
'--wiki_revision_max_examples_per_shard=0',
'--wiki_revision_data_prefix=',
'--stderrthreshold=fatal',
'--verbosity=0',
'--timit_paths=',
'--eval_steps=100',
'--schedule=continuous_train_and_eval',
'--export_saved_model=False',
'--cloud_tpu_name=jupyter-tpu',
'--ps_job=/job:ps',
'--wiki_revision_revision_skip_factor=1.5',
'--helpxml=False',
'--decode_reference=',
'--hparams='],
'jobDir': 'gs://sotrender-rd-cloud-training-demos-ml/poetry/model',
'masterType': 'standard_p100',
'pythonModule': 'tensor2tensor.bin.t2t_trainer',
'pythonVersion': '3.5',
'region': 'us-central1',
'runtimeVersion': '1.13',
'scaleTier': 'CUSTOM'}}
Traceback (most recent call last):
File "/usr/local/bin/t2t-trainer", line 33, in <module>
tf.app.run()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.5/dist-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/usr/local/lib/python3.5/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "/usr/local/bin/t2t-trainer", line 28, in main
t2t_trainer.main(argv)
File "/usr/local/lib/python3.5/dist-packages/tensor2tensor/bin/t2t_trainer.py", line 388, in main
cloud_mlengine.launch()
File "/usr/local/lib/python3.5/dist-packages/tensor2tensor/utils/cloud_mlengine.py", line 338, in launch
assert confirm()
AssertionError
I'm not sure what the cause is, to be honest, but it may have something to do with the option 'runtimeVersion': '1.13' in the job start command.
I've checked T2T and the version is hard-coded there: https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/utils/cloud_mlengine.py
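To make the suspected mismatch concrete, here is a small sketch that compares the runtimeVersion in the printed job spec with the TensorFlow version pinned in setup.py. `runtime_matches_pin` is a hypothetical helper, not a T2T or AI Platform function, and whether this mismatch is what actually triggers the AssertionError is not established.

```python
def runtime_matches_pin(job_spec, pinned_tf):
    """True if the job's runtimeVersion has the same major.minor as the pin."""
    runtime = job_spec["trainingInput"]["runtimeVersion"]
    return runtime.split(".")[:2] == pinned_tf.split(".")[:2]


# Trimmed copy of the spec printed in the log above.
job_spec = {
    "jobId": "transformer_poetry_line_problem_t2t_20190712_101601",
    "trainingInput": {"pythonVersion": "3.5", "runtimeVersion": "1.13"},
}

print(runtime_matches_pin(job_spec, "1.14"))  # False
```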
Local training works fine.
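Since the traceback ends at `assert confirm()`, another thing that might be worth ruling out is how the piped answer is read. The sketch below assumes, without having verified it against the T2T source, that the confirmation step compares the raw answer to a bare Y. `confirm_like` is a hypothetical stand-in, not T2T's actual code. Under Python 3, `input()` returns the typed text as-is, so the quotes written by `echo "'Y'"` would be part of the answer.

```python
def confirm_like(raw_answer):
    """Hypothetical stand-in for a confirmation check; not T2T's actual code."""
    # Assumption: the check compares the raw answer to the bare letter Y.
    return raw_answer == "Y"


# `echo "'Y'"` writes the three characters 'Y' (quotes included) to stdin.
piped_with_quotes = "'Y'"
# `echo "Y"` would write just the letter.
piped_bare = "Y"

print(confirm_like(piped_with_quotes))  # False
print(confirm_like(piped_bare))         # True
```

If that assumption holds, piping a bare Y instead of the quoted one would be a cheap experiment to try on a Python 3 image.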
The problem is related to courses/machine_learning/deepdive/09_sequence, which is used in Coursera's Sequence Models for Time Series and Natural Language Processing, part of Advanced Machine Learning with TensorFlow on Google Cloud Platform. When one reaches the "Train model locally on subset of data" part, the command below throws an error:
On the other hand, when I try to update tensorflow in the top setup cells, as the exception suggests, another issue arises:
Also, while you're at it, it'd be good IMO to bump the notebook's Python version up to 3; currently it's 2.